questdb / questdb

QuestDB is an open source time-series database for fast ingest and SQL queries
https://questdb.io
Apache License 2.0

Spill data to disk (disk spill) support for FastMap #3848

Closed puzpuzpuz closed 4 months ago

puzpuzpuz commented 1 year ago

Is your feature request related to a problem?

Disclaimer: while contributions on this one are very much appreciated, this is not an easy frag.

FastMap is QuestDB's native memory hash table used for hash joins, as well as for storing analytic function state in GROUP BY/SAMPLE BY queries. Refer to the class's Javadoc for a high-level design overview of the hash table.

Among other things, FastMap uses a chunk of native memory as a grow-only heap that stores key-value pairs. This design has a few advantages, one of which is that it lends itself well to disk spill support. Disk spill is nothing more than the ability to store a data structure on disk when it grows too large to fit into RAM. It's not unique to QuestDB and can be found in some other databases.
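To make the "grow-only heap" idea concrete, here is a minimal sketch. It is not FastMap's actual code: the class name `GrowOnlyHeap`, the fixed-size (long key, long value) entry layout, and the use of a direct `ByteBuffer` in place of QuestDB's own native memory primitives are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of a grow-only heap; not QuestDB's FastMap.
final class GrowOnlyHeap {
    private static final int ENTRY_SIZE = 2 * Long.BYTES; // assumed fixed-size entries
    private ByteBuffer heap;

    GrowOnlyHeap(int initialCapacity) {
        // Direct buffers live outside the Java heap, similar in spirit to native memory.
        this.heap = ByteBuffer.allocateDirect(initialCapacity);
    }

    // Appends a key-value pair at the end of the heap and returns its byte offset.
    // Entries are never moved within the heap or removed from it.
    int append(long key, long value) {
        if (heap.remaining() < ENTRY_SIZE) {
            grow();
        }
        int offset = heap.position();
        heap.putLong(key).putLong(value);
        return offset;
    }

    long keyAt(int offset) {
        return heap.getLong(offset);
    }

    long valueAt(int offset) {
        return heap.getLong(offset + Long.BYTES);
    }

    int capacity() {
        return heap.capacity();
    }

    // Doubles the capacity and copies the used part into the new buffer.
    // Offsets handed out earlier stay valid because they are relative to the heap start.
    private void grow() {
        ByteBuffer bigger = ByteBuffer.allocateDirect(Math.max(heap.capacity() * 2, ENTRY_SIZE));
        heap.flip();
        bigger.put(heap);
        heap = bigger;
    }
}
```

Since entries are only ever appended and are addressed by offset, the backing memory can be swapped for something else (for example, a memory-mapped file) without touching whatever index points into the heap.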

In our case, FastMap could be modified to migrate from "ordinary" native memory (i.e. anonymous mmapped memory) to file-backed mmapped memory (see the io.questdb.cairo.vm package) when it reaches a certain heap size threshold. Other than that, we should make sure to clean up the temporary files both when we no longer need them and on start-up.
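A rough sketch of that migration, using only the JDK rather than the io.questdb.cairo.vm classes. Everything here is hypothetical: the `SpillableHeap` name, the `spillThreshold` parameter, the temp file prefix, and the choice of `FileChannel.map` are assumptions for illustration, not a proposal for the actual implementation.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: a grow-only heap that moves to a memory-mapped temp file
// once its capacity would exceed a threshold. Not QuestDB code.
final class SpillableHeap implements AutoCloseable {
    private static final String PREFIX = "heap-spill-"; // assumed naming scheme
    private static final String SUFFIX = ".tmp";
    private final Path spillDir;
    private final int spillThreshold; // bytes
    private ByteBuffer heap;
    private Path spillFile; // null while the heap is still in RAM

    SpillableHeap(Path spillDir, int initialCapacity, int spillThreshold) {
        this.spillDir = spillDir;
        this.spillThreshold = spillThreshold;
        this.heap = ByteBuffer.allocateDirect(initialCapacity);
    }

    // Appends a value and returns its byte offset within the heap.
    int append(long value) throws IOException {
        if (heap.remaining() < Long.BYTES) {
            grow(Math.max(heap.capacity() * 2, Long.BYTES));
        }
        int offset = heap.position();
        heap.putLong(value);
        return offset;
    }

    long get(int offset) {
        return heap.getLong(offset);
    }

    boolean isSpilled() {
        return spillFile != null;
    }

    private void grow(int newCapacity) throws IOException {
        if (newCapacity <= spillThreshold) {
            // Still small enough: stay in RAM and copy the used bytes over.
            ByteBuffer bigger = ByteBuffer.allocateDirect(newCapacity);
            heap.flip();
            bigger.put(heap);
            heap = bigger;
            return;
        }
        boolean firstSpill = spillFile == null;
        if (firstSpill) {
            spillFile = Files.createTempFile(spillDir, PREFIX, SUFFIX);
        }
        ByteBuffer mapped;
        try (FileChannel ch = FileChannel.open(spillFile,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // A READ_WRITE mapping larger than the file extends the file.
            mapped = ch.map(FileChannel.MapMode.READ_WRITE, 0, newCapacity);
        }
        if (firstSpill) {
            heap.flip();
            mapped.put(heap); // migrate RAM contents into the file
        } else {
            mapped.position(heap.position()); // the file already holds the data
        }
        heap = mapped;
    }

    // Deletes the temp file once the heap is no longer needed.
    @Override
    public void close() throws IOException {
        heap = null;
        if (spillFile != null) {
            Files.deleteIfExists(spillFile);
            spillFile = null;
        }
    }

    // Start-up cleanup: removes spill files left behind by a crashed process.
    static void deleteLeftovers(Path dir) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, PREFIX + "*" + SUFFIX)) {
            for (Path f : files) {
                Files.deleteIfExists(f);
            }
        }
    }
}
```

A caller would run `SpillableHeap.deleteLeftovers(dir)` once at start-up and wrap each heap in try-with-resources. One caveat of this JDK-only sketch: the old mapping is released only when it is garbage collected, and on some platforms (notably Windows) a file cannot be deleted while it is still mapped, so real code would need explicit unmapping.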

Describe the solution you'd like.

No response

Describe alternatives you've considered.

No response

Additional context.

No response

SaiSohith commented 1 year ago

Hey, I don't know a single thing about this but want to work on it. Can you provide any additional links from which I can get an idea and get started?

puzpuzpuz commented 1 year ago

FastMap's source code would be the first link - already in the issue description. Besides that, you should check our contributor's guide.

SaiSohith commented 1 year ago

Thank you. Will check it out.

fool1280 commented 1 year ago

Would love to work on this :) Let me read up on the documentation, and I’ll reach out right away with any questions.

hilmialf commented 1 year ago

Hi, is there still a chance to contribute to this? I would love to work on it, so let me quickly check the documentation first :)

puzpuzpuz commented 1 year ago

@hilmialf we don't have an open PR yet, but someone may have started working on this one considering the previous messages here.

puzpuzpuz commented 4 months ago

This feature has become more challenging since we recently introduced many specialized hash table implementations. So, I'm going to close it.