openrewrite / rewrite

Automated mass refactoring of source code.
https://docs.openrewrite.org
Apache License 2.0
2.15k stars, 321 forks

Extremely huge memory occupation (memory leak?) #3018

Closed mariofusco closed 1 year ago

mariofusco commented 1 year ago

We are using OpenRewrite on the Drools project, but it requires at least 16 GB of heap to run successfully, and I believe this is not acceptable. I tried to figure out where it consumes most of the memory and found an ArrayList that is also a GC root (I guess it is a static field somewhere) and that retains thousands of instances of the class org.openrewrite.java.tree.J$CompilationUnit, taking more than 65% of the whole heap.

[image: heap dump report]

timtebeek commented 1 year ago

Thanks again for reporting this here! More details can be found in the associated Slack thread.

Hi guys! We are using OpenRewrite to migrate our codebase to the jakarta namespace/quarkus-3 here. Until some time ago we used version 4.36.0 of the plugin: it worked, but some files (namely, *.java inside the "resources" folder) were not processed. Then we switched to 4.41.1 and 4.42.0, but we started noticing OutOfMemory issues. I had to set -Xmx16192m to get to the end. Our yml file contains a number of "nested" recipes. What is the best way to manage that (i.e., is it more efficient to create one single recipe that contains all the called ones)? Could you give some advice on that OOM issue?

The versions refer to the Maven plugin. Here's a diff of the changes since then: https://github.com/openrewrite/rewrite-maven-plugin/compare/v4.36.0...v4.42.0 And within OpenRewrite/rewrite itself: https://github.com/openrewrite/rewrite/compare/v7.32.0...v7.38.0

Hi! The migration is triggered by a shell command that, behind the scenes, executes this script: https://github.com/kiegroup/drools/blob/8c534fe8201ee0cd73fc6652e7fd000505675043/.ci/environments/quarkus-3/before.sh From the heap dump we noticed that there is an ArrayList of more than 9000 org.openrewrite.java.tree.J$CompilationUnit instances. It is a "static list", so the GC cannot reclaim it. This list occupies ~75% of the memory space.

mariofusco commented 1 year ago

I believe that the huge memory requirements that we are experiencing are actually caused by the Maven mojo, and in particular here. There it tries to load the ASTs of all the source files in the project into the same list, which clearly cannot work for large projects. I'm pretty sure that this is the ArrayList that I saw in the heap dump report I pasted above: it goes OOM while it is trying to fill it up, and it is a GC root at that moment because it is still a local variable on the stack.
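
To illustrate the retention pattern described in this comment, here is a minimal sketch in plain Java. It does not use the OpenRewrite or Maven plugin API: `ParsedUnit`, `parse`, and `visit` are hypothetical stand-ins for a parsed source file, the parser, and a recipe run. It only compares how many parsed units are reachable at once when they are all collected in one list versus handled one at a time.

```java
import java.util.ArrayList;
import java.util.List;

public class AstRetentionSketch {

    // Hypothetical stand-in for a parsed source file (not J.CompilationUnit).
    static final class ParsedUnit {
        final String path;
        final byte[] payload; // placeholder for the memory a real AST would hold

        ParsedUnit(String path) {
            this.path = path;
            this.payload = new byte[1024];
        }
    }

    // Hypothetical parser: one unit per source path.
    static ParsedUnit parse(String path) {
        return new ParsedUnit(path);
    }

    // Hypothetical recipe run over a single unit; a no-op in this sketch.
    static void visit(ParsedUnit unit) {
    }

    // Collect first, process later: every unit stays reachable from the
    // local list (a GC root while this frame is on the stack).
    public static int peakWhenCollected(List<String> paths) {
        List<ParsedUnit> all = new ArrayList<>();
        for (String path : paths) {
            all.add(parse(path));
        }
        int peak = all.size(); // all units are live at this point
        for (ParsedUnit unit : all) {
            visit(unit);
        }
        return peak;
    }

    // Parse and process one unit at a time: each unit becomes
    // unreachable as soon as its loop iteration ends.
    public static int peakWhenStreamed(List<String> paths) {
        int live = 0;
        int peak = 0;
        for (String path : paths) {
            ParsedUnit unit = parse(path);
            live++;
            peak = Math.max(peak, live);
            visit(unit);
            live--; // nothing references the unit after this iteration
        }
        return peak;
    }

    public static void main(String[] args) {
        List<String> paths = List.of("A.java", "B.java", "C.java");
        System.out.println("collected: " + peakWhenCollected(paths));
        System.out.println("streamed:  " + peakWhenStreamed(paths));
    }
}
```

Whether files can actually be processed one at a time depends on what the recipes need (for example, information that spans several files), so this is only a sketch of why a single list of all parsed units dominates the heap, not a proposed fix for the plugin.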

knutwannheden commented 1 year ago

@mariofusco Can you tell us exactly which recipes you are running in the Drools project, so that we can use them as a reproducer and analyze the heap? I suppose the easiest would be if you could share the exact command you are using to run OpenRewrite. Thanks!

knutwannheden commented 1 year ago

I think I may have figured it out. I saw that you have a rewrite profile in your pom.xml and I can reproduce the problem using mvn rewrite:dryRun -Prewrite.

rpau commented 1 year ago

@mariofusco We have improved the memory characteristics in rewrite 8.0 as much as possible, although some improvements are only available through the proprietary serialization offered by the platform, which is free to use for OSS. Those improvements are especially needed for monorepos.

The good news is that the Drools repository is one of the open source repositories that we automatically ingest and make available in the Moderne platform.

We will be very happy to support you in case you want to use the platform for the Drools repository.

Having said that, we have internally concluded that we can close this ticket because no special action is needed after rewrite 8 for now. We can reopen it at any time if needed.