ropensci / BaseSet

Provides classes for working with sets
https://docs.ropensci.org/BaseSet

About using factors #22

Closed by llrs 5 years ago

llrs commented 5 years ago

From the `factor` help page (`?factor`):

> In earlier versions of R, storing character data as a factor was more space efficient if there is even a small proportion of repeats. However, identical character strings now share storage, so the difference is small in most cases. (Integer values are stored in 4 bytes whereas each reference to a character string needs a pointer of 4 or 8 bytes.)

So using factors instead of characters in a data.frame won't meaningfully reduce memory usage or improve speed:

```r
print(object.size(c("a", rep("b", 100))), units = "b")
# 968 bytes
print(object.size(as.factor(c("a", rep("b", 100)))), units = "b")
# 968 bytes
```

The same happens with data.frames. So it should be simpler to go back to using characters instead of factors in BaseSet.
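To check the data.frame case on a given machine, a minimal sketch like the one below can be used. The column names (`elements`, `sets`), the number of rows and the pool of repeated values are hypothetical choices for illustration, not BaseSet's actual data. Exact byte counts depend on the R version and platform, so no output is shown.

```r
# Minimal sketch: compare a data.frame with character columns against the
# same data stored as factors. Column names and sizes are hypothetical.
set.seed(1)
n <- 10000
elements <- sample(paste0("gene", 1:50), n, replace = TRUE)
sets <- sample(c("setA", "setB", "setC"), n, replace = TRUE)

# stringsAsFactors = FALSE keeps the columns as character
# (this is the default since R 4.0.0)
df_chr <- data.frame(elements = elements, sets = sets, stringsAsFactors = FALSE)
df_fct <- data.frame(elements = factor(elements), sets = factor(sets))

print(object.size(df_chr), units = "b")
print(object.size(df_fct), units = "b")

# Ratio of factor size to character size; the value depends on the R version
# and on whether R is 32- or 64-bit (4- vs 8-byte pointers)
size_ratio <- as.numeric(object.size(df_fct)) / as.numeric(object.size(df_chr))
cat(sprintf("factor / character size ratio: %.2f\n", size_ratio))
```

Any per-element difference comes from what the help page describes: a factor column stores a 4-byte integer code per row, while a character column stores one pointer (4 or 8 bytes) per row, with identical strings sharing storage.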