man-group / ArcticDB

ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.
http://arcticdb.io

Reduce memory footprint of `prune_previous_versions` #1643

Closed alexowens90 closed 2 months ago

alexowens90 commented 3 months ago

https://github.com/man-group/ArcticDB/blob/master/cpp/arcticdb/util/key_utils.hpp#L88-L115 currently generates a vector of all data keys that could potentially be deleted, then collapses that vector into a hash set. When a symbol is being appended to constantly, this results in a large amount of duplication in the vector, so peak memory is driven by the duplicated entries rather than by the number of distinct keys. Materialising AtomKeys is also generally memory hungry, and is not needed in this case.
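To illustrate the duplication, here is a minimal sketch and not the ArcticDB code. `DataKeyId`, `IndexSegment`, `collect_via_vector`, `collect_direct` and `make_append_history` are hypothetical names, and the model assumes that each appended version's index references every data key of the versions before it plus one new key. It compares collecting into a vector and collapsing afterwards with inserting straight into a hash set of a compact identifier:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical compact stand-in for a data key; the real AtomKey has more fields.
struct DataKeyId {
    std::uint64_t version_id;
    std::uint64_t content_hash;
    bool operator==(const DataKeyId& other) const {
        return version_id == other.version_id && content_hash == other.content_hash;
    }
};

struct DataKeyIdHash {
    std::size_t operator()(const DataKeyId& k) const {
        const std::size_t h1 = std::hash<std::uint64_t>{}(k.version_id);
        const std::size_t h2 = std::hash<std::uint64_t>{}(k.content_hash);
        return h1 ^ (h2 << 1);
    }
};

// Assumption: one entry per version, listing the data keys its index references.
using IndexSegment = std::vector<DataKeyId>;
using KeySet = std::unordered_set<DataKeyId, DataKeyIdHash>;

// Pattern described in the issue: materialise every key, then collapse to a set.
// peak_entries (optional) reports how large the intermediate vector became.
KeySet collect_via_vector(const std::vector<IndexSegment>& versions,
                          std::size_t* peak_entries = nullptr) {
    std::vector<DataKeyId> all_keys;
    for (const auto& segment : versions) {
        all_keys.insert(all_keys.end(), segment.begin(), segment.end());
    }
    if (peak_entries != nullptr) {
        *peak_entries = all_keys.size();
    }
    KeySet unique(all_keys.begin(), all_keys.end());
    assert(unique.size() <= all_keys.size());
    return unique;
}

// Alternative: insert into the set while iterating, so duplicates are never stored.
KeySet collect_direct(const std::vector<IndexSegment>& versions) {
    KeySet unique;
    for (const auto& segment : versions) {
        unique.insert(segment.begin(), segment.end());
    }
    return unique;
}

// Builds an append-only history: version v references keys 0..v.
std::vector<IndexSegment> make_append_history(std::uint64_t n_versions) {
    std::vector<IndexSegment> versions;
    IndexSegment running;
    for (std::uint64_t v = 0; v < n_versions; ++v) {
        running.push_back(DataKeyId{v, 1000 + v});
        versions.push_back(running);
    }
    return versions;
}
```

Under this model, 100 appended versions put 5050 entries into the intermediate vector, while the set only ever holds 100. The intermediate storage grows quadratically with the number of versions, and the direct approach grows linearly.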