From the `factor` help page:

> In earlier versions of R, storing character data as a factor was more space efficient if there is even a small proportion of repeats. However, identical character strings now share storage, so the difference is small in most cases. (Integer values are stored in 4 bytes whereas each reference to a character string needs a pointer of 4 or 8 bytes.)
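The string sharing described there can be checked with a rough sketch. It compares a vector that repeats one long string with a vector of distinct long strings. As far as I know, `object.size()` counts each distinct string only once per vector, so the repeated version should cost roughly one pointer per element plus a single copy of the string. The string length and vector length (1000 each) are arbitrary choices for illustration.

```r
# One 1000-character string (length chosen arbitrarily for illustration).
s <- strrep("x", 1000)

# 1000 references to the same string: the characters are stored once,
# and the vector only holds pointers to that single copy.
repeated <- rep(s, 1000)

# 1000 distinct strings of similar length: each one needs its own copy.
distinct <- paste0(s, seq_len(1000))

# The first size is roughly 1000 pointers plus one string;
# the second is roughly 1000 full strings.
print(object.size(repeated), units = "Kb")
print(object.size(distinct), units = "Kb")
```

This is why a character vector with many repeats is already compact. The remaining difference from a factor is the 4-byte integer code versus the 4- or 8-byte pointer mentioned in the quote.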
So using factors instead of characters won't noticeably reduce memory use or improve speed:

```r
print(object.size(c("a", rep("b", 100))), units = "b")
# 968 bytes
print(object.size(as.factor(c("a", rep("b", 100)))), units = "b")
# 968 bytes
```

The same happens with data.frames. So it would be simpler to just go back to using characters instead of factors in BaseSet.
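Going back to characters would mean converting any existing factor columns. Below is a minimal sketch that assumes a plain data.frame. The `rel` table and the `factors_to_character()` helper are hypothetical and not part of BaseSet. The helper converts every factor column to character and leaves other columns untouched. The script then prints the size of both versions so the data.frame case can be checked directly.

```r
# Hypothetical relations table; the column names are only illustrative.
rel <- data.frame(
  elements = factor(c("a", "b", "b", "c")),
  sets = factor(c("set1", "set1", "set2", "set2")),
  fuzzy = c(1, 0.5, 0.8, 1)
)

# Hypothetical helper: convert every factor column to character,
# leaving non-factor columns as they are.
factors_to_character <- function(df) {
  is_fct <- vapply(df, is.factor, logical(1))
  df[is_fct] <- lapply(df[is_fct], as.character)
  df
}

rel_chr <- factors_to_character(rel)
str(rel_chr)

# Compare the memory used by both versions of the table.
print(object.size(rel), units = "b")
print(object.size(rel_chr), units = "b")
```

Since R 4.0.0, `data.frame()` and `read.csv()` default to `stringsAsFactors = FALSE`. Newly created string columns therefore stay characters unless they are converted explicitly.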