Closed jneira closed 5 years ago
Hi, I've collected some snapshots made with jvisualvm: dhall.eta.tasty.profiles.zip

- dhall.eta.tasty.nps: with output in the console
- dhall.eta.tasty.pipe.to.file.nps: with the output redirected to a file
- dhall.eta.tasty.heap.nps: memory profiling

Taking a quick look: eta.runtime.thunk.SelectorPUpd
So passing --hide-successes to the tasty execution makes the test suite run way faster: https://circleci.com/gh/eta-lang/dhall-eta/75
Thanks for the observation. That surely means there's a memory leak to investigate here since holding on to less info made it run faster.
Yeah, although memory usage and GC overhead are similar between writing to the console and redirecting to a file. Not sure if there is a memory leak, because once the process takes the maximum memory possible the usage is pretty stable.
I have a suspicion that this has to do with native memory allocation and not the heap memory. Can you check that as well? I think the MemoryManager isn't freeing as often or as well as it should leading to native heap growing endlessly.
In fact, not all executions to the console are equal; I've taken another snapshot and it was similar to the redirected one :thinking:
I've monitored native memory, taking some samples as suggested in https://stackoverflow.com/a/30941584/49554.
Here is another file with native memory samples including timestamps: native.txt
Hmm well I was wrong about that - it looks like the native memory usage increases very gradually and in amounts < 1MB. Btw you can view native memory in VisualVM by enabling the "VisualVM-BufferMonitor" plugin.
Wow, thanks for the tip
Another thing that might be interesting is if the JVM is taking the memory just because it can, or if it really needs it. You could test that by pressing the "Perform GC" button when it reaches the peak and check how much it drops.
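For what it's worth, the same kind of check can also be triggered from inside the program: `base` exposes `System.Mem.performGC`, which explicitly requests a collection much like the "Perform GC" button (whether Eta maps this to a JVM GC request is an assumption here). A minimal sketch:

```haskell
import System.Mem (performGC)

main :: IO ()
main = do
  -- allocate and consume some memory
  print (sum [1 .. 1000000 :: Int])
  -- explicitly request a collection, analogous to pressing
  -- "Perform GC" in VisualVM; afterwards one can observe in the
  -- monitoring tool whether used heap actually drops
  performGC
  putStrLn "GC requested"
```

If used heap stays near the peak after the explicit collection, the memory really is live; if it drops sharply, the JVM was just holding on to it because it could.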
Another helpful tool is Eclipse MAT. MAT operates on JVM memory dumps and you can do all sorts of analyses, e.g. find out which object types consume how much memory, find out by which instances another instance is referenced, etc.
@jneira If the class with the largest number of instances you see is eta.runtime.thunk.SelectorPUpd, then this could mean it's an issue of the Eta runtime's lack of selector thunk optimization.
We probably need to implement this: https://github.com/typelead/eta/issues/517
A simple way to implement it is to spawn a thread when the runtime system initializes and just have it traverse the weak references to the selector thunks periodically to see if they can be reduced.
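That idea can be modelled in a few lines of Haskell using `System.Mem.Weak` (the names `pruneDead` and `scanner` and the 100 ms interval are made up for illustration; the real implementation would live in the Java runtime and would additionally try to reduce each live selector thunk it visits):

```haskell
import Control.Concurrent (ThreadId, forkIO, threadDelay)
import Control.Concurrent.MVar (MVar, modifyMVar_)
import Control.Monad (filterM, forever)
import Data.Maybe (isJust)
import System.Mem.Weak (Weak, deRefWeak)

-- Drop registry entries whose referents have already been collected.
pruneDead :: [Weak a] -> IO [Weak a]
pruneDead = filterM (fmap isJust . deRefWeak)

-- Background thread that periodically walks a registry of weak
-- references; a real version would also inspect each live selector
-- thunk and reduce it in place once its argument is evaluated.
scanner :: MVar [Weak a] -> IO ThreadId
scanner registry = forkIO . forever $ do
  threadDelay 100000  -- wake up every 100 ms
  modifyMVar_ registry pruneDead
```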
More details on how this leak occurs here: https://homepages.inf.ed.ac.uk/wadler/papers/leak/leak.ps.gz
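The essence of the leak described in the paper can be shown in a few lines of Haskell (an illustration, not Eta-specific code): a lazy pattern binding on a pair compiles to one selector thunk per component, and an unevaluated selector thunk retains the whole pair, and hence the whole input, until it is forced or reduced by a selector optimization.

```haskell
import Data.List (partition)

-- The pattern binding below compiles to two selector thunks
-- (fst-of-pair and snd-of-pair). If only `evens` were forced,
-- the unevaluated thunk for `odds` would still point at the
-- pair, keeping the entire input list live.
stats :: [Int] -> (Int, Int)
stats xs = (sum evens, length odds)
  where (evens, odds) = partition even xs

main :: IO ()
main = print (stats [1 .. 100000])
```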
@nightscape Thanks for the tip! I'm afraid that doing a GC does not free any significant memory, so the 1500 MB max seems to be needed.
Some progress updates:
I've implemented a basic form of selector thunk optimization via StgContext-local weak references. The solution doesn't involve multiple threads and automatically bounds the number of weak references created to avoid causing extra GC overhead. It appears to show better memory characteristics than before, but it can still be better. The next step is to short out thunk indirections to let go of even more memory.
I've been using this code to test the optimization (inspired by the Wadler paper):
```haskell
import System.Directory (removeFile)

-- Copy a file, inserting a 'b' before the first 'b'
-- (or at the end, if the file contains none).
insertb :: String -> String
insertb xs = before ++ "b" ++ after
  where (before, after) = break (== 'b') xs

main :: IO ()
main = do
  let file  = "hello"
      file2 = "hello2"
  contents <- readFile file
  writeFile file2 (insertb contents)
  removeFile file2
```

where hello is a file with a large number of characters other than 'b'.
I'm also going to implement general thunk clearing that is thread-safe so that I can re-enable it by default. Without thunk clearing, severe space leaks can happen, so it is absolutely essential that it be done. It can be enabled even now with -Deta.rts.clearThunks=true, and in fact I had to do so to even verify that the selector thunk optimization was working.
It will probably take a couple more days to implement what I mentioned above.
@jneira I've implemented selector thunk optimization and also re-enabled thunk clearing because it is now thread-safe (verified by running eta-benchmarks, which failed with spurious NPEs before because of thunk clearing and now runs smoothly). Wait until the Docker image for the current master is ready, then go ahead and re-run the CircleCI build for dhall-eta and see how it fares.
On the bright side, the local execution had a simply amazing improvement in both memory and time. Fantastic work @rahulmutt:
So the main goal has been achieved!
But I'm afraid the build in CircleCI hangs anyway, so maybe it is caused by another reason. In my Windows test the openBinaryFile: resource busy (file is locked) error persists.
I'm going to close this one because the memory allocation issue is resolved.
When running tests or benchmarks, Eta executables take excessive heap memory.
Description
Detected when running the dhall-eta test suite in Windows and CircleCI (see https://github.com/typelead/eta/issues/915#issuecomment-448935276).
Expected Behavior
The process should use less memory (not sure about how much).
Actual Behavior
The process takes up to 2.5 GB.
Possible Fix
See @rahulmutt's comment: https://github.com/typelead/eta/issues/915#issuecomment-448969337
Steps to Reproduce
Run the dhall-eta test suite locally or in CircleCI.
Context
Setting up the test suite for dhall-eta.
Your Environment