wg / scrypt

Java implementation of scrypt
Apache License 2.0
Memory is not released after hashing, but only when JVM exits #54

Open oehme opened 3 years ago

oehme commented 3 years ago

I noticed that a customer's application increased in memory usage every time some users logged in and the memory usage would not go down again even when everything was quiet. I profiled using async-profiler and found that the memory was being allocated by Scrypt. I managed to reduce it to the following reproducer:

import com.lambdaworks.crypto.SCryptUtil;

public class ScryptMemoryRepro {
    public static void main(String[] args) throws InterruptedException {
        int NUM_THREADS = 10;
        int HASHES_PER_THREAD = 10;
        for (int i = 0; i < NUM_THREADS; i++) {
            new Thread(() -> {
                for (int j = 0; j < HASHES_PER_THREAD; j++) {
                    // High memory usage during this loop is understandable
                    // since we're actively using the native buffers
                    SCryptUtil.scrypt("pwd", 16384, 8, 1);
                }
                // Once we're done, memory should be freed.
                // However, it stays allocated until the JVM exits.
                try {
                    Thread.sleep(30000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }
        // Keep the main thread alive so the process can be observed
        Thread.sleep(30000);
    }
}

Just run this and watch the process's memory usage. It spikes when the threads invoke scrypt and never goes down again. It gets worse the more threads you have. The number of hashes per thread also seems to play a role, but only to a certain extent. If I set that parameter to 1, I get ~50 MB memory usage. If I set it to 10, I get 500 MB, but setting it to 100 does not increase it any further.
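For anyone who wants to log the resident set size from inside the JVM instead of watching an external tool, here is a minimal Linux-only sketch that reads the `VmRSS` field from `/proc/self/status`. The class and method names (`RssProbe`, `parseVmRssKb`, `currentRssKb`) are hypothetical helpers for illustration and are not part of the scrypt library.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class RssProbe {
    /** Extracts the VmRSS value (in kB) from the lines of a /proc/<pid>/status file, or -1 if absent. */
    static long parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                // The line looks like "VmRSS:     51234 kB"
                String[] parts = line.substring("VmRSS:".length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    /** Reads the resident set size of the current process (assumes a Linux /proc filesystem). */
    static long currentRssKb() throws IOException {
        return parseVmRssKb(Files.readAllLines(Paths.get("/proc/self/status")));
    }

    public static void main(String[] args) throws IOException {
        System.out.println("RSS: " + currentRssKb() + " kB");
    }
}
```

Calling `RssProbe.currentRssKb()` before the hashing threads start, right after they finish hashing, and again near the end of the sleep would give three numbers to compare across different values of N.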

I played around with the parameters a bit, and if I increase N, the problem goes away. E.g. with N = 2^15, the memory usage spikes and then goes down again. I'm afraid I don't know enough C to be of assistance in figuring out why :)

I ran the reproducer on Linux 5.4.0-72 if that helps.

oehme commented 3 years ago

I'd also be interested in why N is called the "CPU cost parameter" when it affects memory just as much.
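To put numbers on that: in the scrypt specification (RFC 7914), the ROMix step fills a working array V of N blocks, each 128 * r bytes, so the dominant buffer grows linearly with N. Below is a rough sketch of that arithmetic. The class name `ScryptMemoryEstimate` is hypothetical, and the estimate deliberately ignores the smaller scratch buffers and any allocator overhead, so actual process usage will differ.

```java
public class ScryptMemoryEstimate {
    /** Size in bytes of the large scrypt working array V: 128 * r * N. */
    static long vBytes(int n, int r) {
        return 128L * r * n;
    }

    public static void main(String[] args) {
        int r = 8; // block size parameter used in the reproducer
        for (int n : new int[] {16384, 32768}) {
            System.out.printf("N=%d r=%d -> %d MiB%n", n, r, vBytes(n, r) / (1024 * 1024));
        }
        // prints:
        // N=16384 r=8 -> 16 MiB
        // N=32768 r=8 -> 32 MiB
    }
}
```

So each hash with the reproducer's parameters (N = 16384, r = 8) needs roughly 16 MiB for V alone, and doubling N to 2^15 doubles that to roughly 32 MiB.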