My main motivation here is that, in the future, this should let us use some kind of enumerated representation of feature names that serializes compactly. Once we have that, it probably makes sense to optionally (or always) forceToDisk the trainingData, so we get a compact representation we can scan over N times when expanding.
cc @snoble @colinmarc @danielhfrank