The effectiveness of interning and compression needs to be monitored.
Trimming has to be done incrementally - no "stop the world" garbage collection.
Suggestion: When trimming is needed, set up a second string interner. Use the sign of StringIDs to distinguish between the two interners, so that both can coexist while the trim is in progress.
Incrementally re-intern all properties of a column, then discard the old string interner.
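The suggestion above can be sketched roughly as follows. This is a minimal illustration and not the design itself: the names Interner, Column, begin_trim and trim_step are hypothetical, a column is modelled as a plain list of StringIDs, and the sign used for new IDs is assumed to flip on every trim.

```python
class Interner:
    """Append-only string table. Local indices start at 1 so every ID has a sign."""

    def __init__(self):
        self._index = {}
        self._strings = [None]  # slot 0 is unused

    def intern(self, s):
        idx = self._index.get(s)
        if idx is None:
            idx = len(self._strings)
            self._strings.append(s)
            self._index[s] = idx
        return idx

    def lookup(self, idx):
        return self._strings[idx]

    def __len__(self):
        return len(self._strings) - 1


class Column:
    """Hypothetical property column storing signed StringIDs."""

    def __init__(self):
        self._interners = {1: Interner(), -1: None}
        self._active = 1     # sign given to newly interned strings
        self._ids = []       # one StringID per row
        self._cursor = None  # next row to re-intern; None when no trim is running

    def _new_id(self, s):
        return self._active * self._interners[self._active].intern(s)

    def append(self, s):
        self._ids.append(self._new_id(s))

    def set(self, row, s):
        # Overwriting a row can leave the old string unreferenced (garbage).
        self._ids[row] = self._new_id(s)

    def get(self, row):
        sid = self._ids[row]
        sign = 1 if sid > 0 else -1  # the sign selects the interner
        return self._interners[sign].lookup(abs(sid))

    def interned_count(self):
        return sum(len(i) for i in self._interners.values() if i is not None)

    def begin_trim(self):
        assert self._cursor is None, "a trim is already running"
        self._active = -self._active
        self._interners[self._active] = Interner()
        self._cursor = 0

    def trim_step(self, max_rows):
        """Re-intern at most max_rows rows. Returns True once the trim is complete."""
        if self._cursor is None:
            return True
        end = min(self._cursor + max_rows, len(self._ids))
        for row in range(self._cursor, end):
            if (self._ids[row] > 0) != (self._active > 0):  # row still uses the old interner
                self._ids[row] = self._new_id(self.get(row))
        self._cursor = end
        if end == len(self._ids):
            self._interners[-self._active] = None  # discard the old interner
            self._cursor = None
            return True
        return False
```

Reads and writes keep working between calls to trim_step, which is what avoids a "stop the world" pause. Rows written during a trim already carry the new sign, so the cursor skips them when it reaches them.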
Since trimming will take place after ingesting data, we can likely optimize compression by training the compressor on available data before interning.
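One possible reading of "training the compressor" is to build a preset dictionary from the strings that are already ingested and hand it to the compressor before re-interning. The sketch below assumes that reading and uses Python's zlib with its zdict parameter as a stand-in; the actual compressor and training method are not specified here, and build_dictionary is a hypothetical helper with a deliberately naive strategy.

```python
from collections import Counter
import zlib


def build_dictionary(samples, max_size=32 * 1024):
    """Naive 'training': concatenate distinct samples, most frequent last.

    zlib recommends placing the most common sequences near the end of the
    dictionary; only the last max_size bytes are kept.
    """
    counts = Counter(samples)
    ordered = sorted(counts, key=lambda s: counts[s])  # least frequent first
    blob = b"".join(s.encode("utf-8") for s in ordered)
    return blob[-max_size:]


def compress(s, zdict):
    c = zlib.compressobj(zdict=zdict)
    return c.compress(s.encode("utf-8")) + c.flush()


def decompress(data, zdict):
    # The same dictionary must be supplied again for decompression.
    d = zlib.decompressobj(zdict=zdict)
    return (d.decompress(data) + d.flush()).decode("utf-8")
```

The dictionary has to be kept for as long as any string compressed with it is alive, so in this sketch it would naturally be owned by the new interner and discarded together with it.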
See design document.