peter-lawrey / Java-Chronicle

Java Indexed Record Chronicle

Holding buffers in list #16

Closed jknehr closed 11 years ago

jknehr commented 11 years ago

Hi -

Does the chronicle keep references to all of its mapped buffers for as long as it is open, releasing them only when the chronicle is closed?

I'm referring specifically to these Lists.

    private final List<MappedByteBuffer> indexBuffers = new ArrayList<MappedByteBuffer>();
    private final List<MappedByteBuffer> dataBuffers = new ArrayList<MappedByteBuffer>();

The reason I ask is that I'm curious whether this could cause problems if a very large file is read with, for example, a small configured block size. This would result in a large number of buffers being allocated. If the underlying OS has an upper limit on how many of these can be allocated, an "out of memory" exception would be thrown once that limit was reached.
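The pattern being asked about can be sketched as follows. This is a hypothetical illustration, not Chronicle's actual code: the class `BlockMappedFile`, its `blockSize` parameter and its methods are invented for this example. Each block of the file gets its own `MappedByteBuffer` (via `FileChannel.map`), and every buffer stays in a list, so the number of live mappings grows roughly as file size divided by block size.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: maps a file block by block and keeps every block referenced in a list. */
final class BlockMappedFile implements AutoCloseable {
    private final RandomAccessFile file;
    private final FileChannel channel;
    private final long blockSize;
    // Every block mapped so far stays reachable from this list, so none of them can be
    // garbage-collected (and hence unmapped) while the file is open.
    private final List<MappedByteBuffer> buffers = new ArrayList<>();

    BlockMappedFile(RandomAccessFile file, long blockSize) {
        this.file = file;
        this.channel = file.getChannel();
        this.blockSize = blockSize;
    }

    /** Returns the buffer for the block containing {@code position}, mapping any missing blocks up to it. */
    MappedByteBuffer bufferFor(long position) throws IOException {
        int index = (int) (position / blockSize);
        while (buffers.size() <= index) {
            long start = (long) buffers.size() * blockSize;
            // Grow the file first so the mapped region lies entirely within it.
            if (file.length() < start + blockSize) {
                file.setLength(start + blockSize);
            }
            buffers.add(channel.map(FileChannel.MapMode.READ_WRITE, start, blockSize));
        }
        return buffers.get(index);
    }

    /** Number of buffers currently held; grows roughly as fileSize / blockSize. */
    int mappedBlocks() {
        return buffers.size();
    }

    @Override
    public void close() throws IOException {
        buffers.clear(); // drops the references only; the mappings last at least until the buffers are GC'd
        channel.close();
    }
}
```

With a 4 KB block size, touching offset 10,000 forces three blocks (0, 1 and 2) to be mapped and retained; a smaller block size means proportionally more retained mappings for the same file.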

My question is: by holding onto these buffers in the list, does this prevent the OS from cleaning up these memory mappings?

Thanks for the help!

peter-lawrey commented 11 years ago

Unfortunately, Java doesn't unmap a MappedByteBuffer, so discarding it isn't useful. There is a limit on the number of mappings you can have; I have seen it be around 32K on some systems. I imagine it's OS-dependent.

If you have 64 MB blocks and 32K of them, you can map 2 TB; if you use 1 GB buffers, you can map up to 32 TB.
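The arithmetic above is simply the mapping limit multiplied by the block size. A minimal sketch, assuming a limit of 32K = 32,768 mappings and binary units (1 MB = 2^20 bytes, 1 TB = 2^40 bytes); the class `MappingBudget` and its methods are hypothetical:

```java
/** Hypothetical helper: how much of a file can be mapped before hitting an OS mapping limit. */
final class MappingBudget {
    static final long KB = 1L << 10, MB = 1L << 20, GB = 1L << 30, TB = 1L << 40;

    /** Total bytes addressable when each of {@code maxMappings} mappings covers {@code blockSize} bytes. */
    static long maxMappableBytes(long maxMappings, long blockSize) {
        return Math.multiplyExact(maxMappings, blockSize); // throws on overflow rather than wrapping
    }

    /** Number of fixed-size blocks (and therefore mappings) needed to cover a file, rounding up. */
    static long blocksNeeded(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        long limit = 32 * 1024; // assumed limit of roughly 32K mappings
        System.out.println(maxMappableBytes(limit, 64 * MB) / TB + " TB with 64 MB blocks"); // 2 TB
        System.out.println(maxMappableBytes(limit, GB) / TB + " TB with 1 GB blocks");       // 32 TB
        System.out.println(maxMappableBytes(limit, 32 * KB) / GB + " GB with 32 KB blocks"); // 1 GB
    }
}
```

By the same arithmetic, 32 KB blocks would exhaust a 32K-mapping limit after only 1 GB of file, which may help explain the trouble with a 32 KB buffer size reported later in this thread.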

jknehr commented 11 years ago

I find it difficult to understand why Java would hold onto these buffers until the file channel is closed, since that just prevents the OS from cleaning them up. If my process weren't restarted every day, it would hold onto these forever, mapping files indefinitely.

Anyway, I originally had my buffer size set to 32 KB, and running alongside an embedded Cassandra instance made everything blow up pretty quickly; it took me a while to figure out why. I've since set my buffer size to 1 GB and things have been running much better.

jknehr commented 11 years ago

FWIW: http://docs.oracle.com/javase/6/docs/api/java/nio/MappedByteBuffer.html

"A mapped byte buffer and the file mapping that it represents remain valid until the buffer itself is garbage-collected."

mingfang commented 11 years ago

Peter, I found a Stack Overflow answer you wrote that suggests mmapped files can be cleaned up: http://stackoverflow.com/questions/8553158/prevent-outofmemory-when-using-java-nio-mappedbytebuffer

Is this a technique that Chronicle can use too?

peter-lawrey commented 11 years ago

This technique is used in Java Chronicle. Unfortunately, when I wrote both the library and that answer, I didn't know that while the "unmapper" Cleaner is called, it doesn't do anything, because it's never safe to unmap a memory region; AFAIK this is an OS limitation. That is, the heap object for a MappedByteBuffer will be cleaned up, but the mapped data associated with it never will. There is a limit to how many mappings a JVM can have (set by the OS), so you want to keep them to a reasonable minimum. On some CentOS systems I tested, this was about 32K mappings.
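On Linux, the per-process cap on mappings is exposed as the `vm.max_map_count` sysctl, and each line of `/proc/self/maps` describes one mapping, so you can watch how close a JVM is to the limit. A minimal diagnostic sketch, assuming Linux (it returns -1 on systems without these `/proc` files); the class `MapLimits` and its methods are hypothetical:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical Linux-only diagnostics for memory-mapping limits. */
final class MapLimits {
    /** The kernel's per-process cap on memory mappings, or -1 if it cannot be read. */
    static long maxMapCount() {
        try {
            byte[] raw = Files.readAllBytes(Paths.get("/proc/sys/vm/max_map_count"));
            return Long.parseLong(new String(raw, StandardCharsets.US_ASCII).trim());
        } catch (IOException | NumberFormatException e) {
            return -1; // not Linux, or the file is unreadable
        }
    }

    /** Number of mappings this JVM currently has (one line per mapping), or -1 if unavailable. */
    static long currentMapCount() {
        try {
            // ISO-8859-1 accepts any byte, so odd file names in the mapping list cannot break decoding.
            return Files.readAllLines(Paths.get("/proc/self/maps"), StandardCharsets.ISO_8859_1).size();
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("mappings in use: " + currentMapCount() + " of " + maxMapCount());
    }
}
```

Logging these two numbers periodically would show whether a small block size is steadily consuming the mapping budget.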

kaxu commented 11 years ago

Republic

mingfang commented 11 years ago

Peter, do you have any plans to change this? We currently maintain an internal port of Java-Chronicle just to work around this problem. Our port simply removes the list of buffers and keeps only one buffer, which effectively acts as a sliding window into the large file. It has worked well for us, but I don't want to have to maintain it. It would be great if your official release supported this.

Btw, the reason we have to do this is that on old Windows and small Linux VMs, the memory-map address space is very small. Without our modification, our system crashes with an out-of-address-space error.
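A single-buffer sliding window like the one described might look roughly like this. It is a sketch under assumptions, not the actual port: the class `SlidingWindow`, the `windowSize` parameter, and the choice to align windows to multiples of `windowSize` are all hypothetical. Only one `MappedByteBuffer` is referenced at a time; when a read falls outside it, the window is remapped around the new position.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical sketch: one MappedByteBuffer acting as a sliding window over a large file. */
final class SlidingWindow {
    private final FileChannel channel;
    private final long windowSize;
    private MappedByteBuffer window; // the only mapped buffer this reader references
    private long windowStart;        // file offset at which the current window begins
    private int remaps;              // how many times a new window has been mapped

    SlidingWindow(FileChannel channel, long windowSize) {
        if (windowSize <= 0 || windowSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("windowSize must be in 1..Integer.MAX_VALUE");
        }
        this.channel = channel;
        this.windowSize = windowSize;
    }

    /** Reads the byte at absolute file offset {@code position}, remapping the window if needed. */
    byte get(long position) throws IOException {
        long fileSize = channel.size();
        if (position < 0 || position >= fileSize) {
            throw new IndexOutOfBoundsException("position " + position + " outside file of size " + fileSize);
        }
        if (window == null || position < windowStart || position >= windowStart + window.capacity()) {
            windowStart = (position / windowSize) * windowSize;         // align to a window boundary
            long length = Math.min(windowSize, fileSize - windowStart); // the last window may be shorter
            // Replacing the reference drops the old buffer; its mapping stays valid at least until it is GC'd.
            window = channel.map(FileChannel.MapMode.READ_ONLY, windowStart, length);
            remaps++;
        }
        return window.get((int) (position - windowStart));
    }

    int remapCount() {
        return remaps;
    }
}
```

Per the `MappedByteBuffer` javadoc quoted earlier in the thread, a replaced mapping remains valid until the old buffer is garbage-collected, so this bounds how many buffers the reader holds rather than guaranteeing how many OS mappings exist at any instant. A window that is large relative to typical jumps between reads keeps remapping rare.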

peter-lawrey commented 11 years ago

If you can send me the modifications I am more than happy to have a look at them.

(Replying by email to mingfang's comment of 2 February 2013, 22:34.)

peter-lawrey commented 11 years ago

I have changed the packages to support publishing to Maven Central, but if you can update a recent fork of the code, you can push the changes to me for review/acceptance.

peter-lawrey commented 11 years ago

Fixed. See Issue #18 for more details.