jankotek / mapdb

MapDB provides concurrent Maps, Sets and Queues backed by disk storage or off-heap-memory. It is a fast and easy to use embedded Java database engine.
https://mapdb.org
Apache License 2.0

large, high-throughput, memory-mapped queue #798

Open domenkosir opened 7 years ago

domenkosir commented 7 years ago

This is my use case:

I'm reading ~1kB items from several sources and putting them into a MapDb queue (I'm using mapdb-2.0-beta13). In the same process, a different thread takes items from the queue and sends them to a service via HTTP.

Sometimes, the HTTP service becomes unresponsive. When this happens, no items are dequeued for a while and the size of the queue increases. It's a high-throughput scenario, so the queue size can easily exceed the amount of RAM available.

I've experimented with fileDB & fileMmapEnable but I'm having problems setting up MapDb according to my needs:

  1. I don't want the queue size to be limited by RAM but only by disk space. That's why I'm using fileDB.
  2. When many items are enqueued and then dequeued, I saw that the file size does not decrease. When I add a periodic call to db.compact, the space is reclaimed but I see a HUGE performance drop. Is there another way to reclaim the disk space of dequeued items?
  3. I want to avoid persisting data to the disk when the queue is relatively small, e.g. up to 1GB or 1M items. On the other hand, when the queue size increases dramatically, I want it to be partially persisted. I feel that memory-mapped files can be very useful in this scenario. However, when I put several million 1kB items into the queue I get OutOfMemory exceptions. Is MapDb able to map only the front and the end of the queue to memory? If not, is there another way to efficiently handle queues larger than available RAM?
  4. I feel that the MapDb lib should be able to handle commits better than me, so I'd like to avoid manually calling db.commit, if that's possible. Using transactionDisable I sometimes get data corruption when my process exits unexpectedly. In your experience, what is the best way to avoid data corruption?

Also, is queue support coming to MapDb 3.0 and when can we expect it?

jankotek commented 7 years ago

Space can be reclaimed by deleting the old file and creating a new one; in many cases this is better than compaction.
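A minimal sketch of that idea in plain Java, to make the approach concrete. It does not use the MapDB API: `SegmentedFileQueue`, `itemsPerSegment`, and the `segment-N.bin` file names are hypothetical names chosen for illustration. The queue appends items to small segment files and deletes each file as soon as its last item has been read, so disk space is returned without any compaction pass. Segment bookkeeping is kept in memory only, so this sketch assumes the process does not need to recover the queue after a crash.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical FIFO queue: items go into segment files, and a drained file is deleted at once. */
final class SegmentedFileQueue implements AutoCloseable {
    /** A closed segment file together with the number of items it holds. */
    private static final class Segment {
        final Path path;
        final int count;
        Segment(Path path, int count) { this.path = path; this.count = count; }
    }

    private final Path dir;
    private final int itemsPerSegment;                        // assumed tuning knob
    private final Deque<Segment> sealed = new ArrayDeque<>(); // full segments waiting to be read
    private Path writePath;
    private DataOutputStream out;                             // null when no segment is open for writing
    private int written;                                      // items in the open write segment
    private Path readPath;
    private DataInputStream in;                               // null when no segment is open for reading
    private int remaining;                                    // unread items in the open read segment
    private long nextId;
    private long size;

    SegmentedFileQueue(Path dir, int itemsPerSegment) {
        this.dir = dir;
        this.itemsPerSegment = itemsPerSegment;
    }

    synchronized void offer(byte[] item) throws IOException {
        if (out == null) {                                    // start a fresh segment file
            writePath = dir.resolve("segment-" + (nextId++) + ".bin");
            out = new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(writePath)));
            written = 0;
        }
        out.writeInt(item.length);                            // length-prefixed record
        out.write(item);
        size++;
        if (++written >= itemsPerSegment) seal();
    }

    /** Returns the oldest item, or null if the queue is empty. */
    synchronized byte[] poll() throws IOException {
        if (size == 0) return null;
        if (in == null) {
            if (sealed.isEmpty()) seal();                     // all data sits in the open write segment
            Segment s = sealed.removeFirst();
            readPath = s.path;
            remaining = s.count;
            in = new DataInputStream(new BufferedInputStream(Files.newInputStream(readPath)));
        }
        byte[] item = new byte[in.readInt()];
        in.readFully(item);
        size--;
        if (--remaining == 0) {                               // segment drained: drop the whole file
            in.close();
            in = null;
            Files.delete(readPath);
        }
        return item;
    }

    synchronized long size() { return size; }

    /** Closes the write segment so it becomes readable. */
    private void seal() throws IOException {
        out.close();
        out = null;
        sealed.addLast(new Segment(writePath, written));
    }

    @Override
    public synchronized void close() throws IOException {
        if (out != null) out.close();
        if (in != null) in.close();
    }
}
```

With roughly 1kB items, a segment size in the thousands keeps each file at a few megabytes, so at most one partly read segment and one partly written segment hold space that cannot be freed yet. Whether the same rotation pays off on top of MapDB files depends on how expensive opening and closing a `DB` is in your setup.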

Queues should be added in a few months. I will keep this issue open, as it describes the required features well.

doggie1989 commented 5 years ago

+1, ConcurrentQueue!!!