tjake / Jlama

Jlama is a modern LLM inference engine for Java
Apache License 2.0

Bug: Resource leak (java.io.RandomAccessFile) in AbstractModel.embed #97

Open Jozurf opened 2 hours ago

Jozurf commented 2 hours ago

Problem

When loading an embedding model, I noticed that calling model.embed repeatedly eventually fails with a java.io.FileNotFoundException ("too many open files").

The embed method in AbstractModel generates a new random UUID as a session ID on every call, which creates a new entry in kvBufferCache backed by a new KvBuffer. Each KvBuffer is created inside a try-with-resources statement, but the close() method of KvBuffer is empty, so nothing allocated by that KvBuffer is ever released. In particular, a KvBuffer holds a 2D array of KvBufferPage instances that are created during the embedding (in KvBuffer.getTensorForPosition) but never closed, and each KvBufferPage keeps a RandomAccessFile open. Here is the stack trace that I encountered.

... java.io.IOError: java.io.FileNotFoundException: /var/folders/7c/gjm0m1gs58s1xbw0j0896cb00000gp/T/11320173981426479037/1eb869a2-3b91-4aa4-846b-8d57e3cbc0e7-L1C19.page (Too many open files)
        at com.github.tjake.jlama.tensor.KvBufferCache$KvBufferPage.<init>(KvBufferCache.java:153)
        at com.github.tjake.jlama.tensor.KvBufferCache$KvBuffer.getTensorForPosition(KvBufferCache.java:284)
        at com.github.tjake.jlama.tensor.KvBufferCache$KvBuffer.getKeyTensorForPosition(KvBufferCache.java:268)
        at com.github.tjake.jlama.model.CausalSelfAttention.forward(CausalSelfAttention.java:196)
        at com.github.tjake.jlama.model.TransformerBlock.forward(TransformerBlock.java:173)
        at com.github.tjake.jlama.model.AbstractModel.forward(AbstractModel.java:293)
        at com.github.tjake.jlama.model.AbstractModel.batchForward(AbstractModel.java:280)
        at com.github.tjake.jlama.model.AbstractModel.batchForward(AbstractModel.java:270)
        at com.github.tjake.jlama.model.AbstractModel.embed(AbstractModel.java:311)
        ...
Caused by: java.io.FileNotFoundException: /var/folders/7c/gjm0m1gs58s1xbw0j0896cb00000gp/T/11320173981426479037/1eb869a2-3b91-4aa4-846b-8d57e3cbc0e7-L1C19.page (Too many open files)
        at java.base/java.io.RandomAccessFile.open0(Native Method)
        at java.base/java.io.RandomAccessFile.open(RandomAccessFile.java:356)
        at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:273)
        at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:223)
        at com.github.tjake.jlama.tensor.KvBufferCache$KvBufferPage.<init>(KvBufferCache.java:125)
        ... 17 more
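
For context, nothing exotic is needed to trigger this; repeated embed calls are enough. A minimal sketch of the failing call pattern (model is assumed to be an embedding model already loaded as an AbstractModel, and the exact embed signature may differ from what is shown here):

// Minimal sketch of the call pattern that exhausts file descriptors.
// `model` is assumed to be an already-loaded embedding AbstractModel;
// the embed(...) signature shown is illustrative only.
for (int i = 0; i < 100_000; i++) {
    float[] vector = model.embed("some input text");
    // Each call registers a fresh session UUID in kvBufferCache, and the
    // KvBufferPage files it creates are never closed, so open handles grow
    // until the OS limit is hit.
}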

Solution

I suggest closing each KvBufferPage in the close() method of KvBuffer.

@Override
public void close() {
    // Close every page so the RandomAccessFile each one holds is released.
    for (KvBufferPage[] layerPages : pages) {
        if (layerPages != null) {
            for (KvBufferPage page : layerPages) {
                if (page != null) {
                    try {
                        page.close();
                    } catch (IOException e) {
                        // log the failure and keep closing the remaining pages
                    }
                }
            }
        }
    }
}
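
For this to actually free the descriptors, KvBufferPage.close() itself has to close the RandomAccessFile opened in its constructor. If it does not already do so, something along these lines would be needed (the field name raf below is a guess for illustration, not the actual field in KvBufferCache):

// Hypothetical sketch of KvBufferPage.close(); `raf` is an assumed field
// name for the RandomAccessFile opened in the constructor.
@Override
public void close() throws IOException {
    if (raf != null) {
        raf.close(); // releases the descriptor backing the .page file
        raf = null;
    }
}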
Jozurf commented 2 hours ago

On the topic of KvBufferCache: is there a limit on how large kvBufferCache can grow? With the current implementation, each call to AbstractModel.embed creates a new entry in its map, and I can't find anywhere that entries are removed. An application that calls AbstractModel.embed() a very large number of times (say, a billion) could therefore slow down just from the number of entries kvBufferCache accumulates. Is there a quick solution for that?
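
One low-effort option, if embed-only sessions never need to outlive the call, would be to have embed drop its own session entry when it returns, or to bound the map with LRU eviction. Purely to illustrate the latter idea (this is not the actual kvBufferCache structure; MAX_SESSIONS and the close-on-evict hook are assumptions):

import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a size-bounded, access-ordered map that closes the
// evicted KvBuffer. The real kvBufferCache is a different structure, so
// this is just one shape a fix could take.
class BoundedKvCache<K, V extends AutoCloseable> extends LinkedHashMap<K, V> {
    private static final int MAX_SESSIONS = 1024; // assumed limit

    BoundedKvCache() {
        super(16, 0.75f, true); // access-order so the least recently used entry is evicted
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() > MAX_SESSIONS) {
            try {
                eldest.getValue().close(); // release pages before dropping the entry
            } catch (Exception e) {
                // log and continue; eviction should not fail the caller
            }
            return true;
        }
        return false;
    }
}

Dropping the entry eagerly at the end of embed would be even simpler, since the per-call session ID is never reused.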