rpau / javalang-compiler

Java compiler elements (symbol and type tables) to perform code semantic analysis
GNU Lesser General Public License v3.0

200% speedup for large project - cache class names of jar files - https://github.com/rpau/javalang-compiler/issues/25 #26

Closed: cal101 closed this 7 years ago

cal101 commented 7 years ago

Cache class names of jar files.

rpau commented 7 years ago

I get the idea.

Thank you so much, but there are tests that now fail, right?

There is a NullPointerException. Any idea? Do you need my help?

cal101 commented 7 years ago

I'll take a look. I didn't notice.

cal101 commented 7 years ago

Please take a look. I can reproduce the problem but don't understand why my simple caching leads to this. Does javalang-compiler create or change jar files within a single run? If so, the cache must check for changes to the underlying jar, but I didn't expect that. Maybe it should always do that, just in case?
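
To illustrate the "check for changes to the underlying jar" idea, here is a minimal sketch of a class name cache that rescans a jar only when its modification time or size has changed. `JarClassNameCache` and its methods are hypothetical names, not part of javalang-compiler, and the sketch is single-threaded.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper (not javalang-compiler API): caches class names per jar. */
final class JarClassNameCache {

    /** Cached names together with the jar state they were read from. */
    private static final class Entry {
        final long lastModified;
        final long length;
        final List<String> classNames;

        Entry(long lastModified, long length, List<String> classNames) {
            this.lastModified = lastModified;
            this.length = length;
            this.classNames = classNames;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private int scans = 0;

    /** Returns the class names of the jar, rescanning only if the file changed. */
    List<String> classNames(File jar) throws IOException {
        String key = jar.getCanonicalPath();
        long lastModified = jar.lastModified();
        long length = jar.length();
        Entry cached = cache.get(key);
        if (cached != null && cached.lastModified == lastModified && cached.length == length) {
            return cached.classNames;
        }
        List<String> names = scan(jar);
        cache.put(key, new Entry(lastModified, length, names));
        return names;
    }

    /** Reads all ".class" entries and converts them to dotted class names. */
    private List<String> scan(File jar) throws IOException {
        scans++;
        List<String> names = new ArrayList<>();
        try (JarFile jarFile = new JarFile(jar)) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".class")) {
                    String withoutSuffix = name.substring(0, name.length() - ".class".length());
                    names.add(withoutSuffix.replace('/', '.'));
                }
            }
        }
        return Collections.unmodifiableList(names);
    }

    /** Number of real jar scans so far; useful for measuring cache hits. */
    int scanCount() {
        return scans;
    }
}
```

The staleness check costs two cheap file system calls per lookup, which should be small compared with iterating over all jar entries.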

rpau commented 7 years ago

Hi @cal101

The attached patch solves the problem. The cause was that the cache did not take into account the folder it was asked to analyze. Apply it and commit again.
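
The contents of diff.txt are not shown here, so the following is only a sketch of the general idea: a cache whose key includes both the jar and the directory filter, so a result computed for one folder is never returned for another. `FilteredClassNameCache`, `Key`, and the `scanner` callback are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.BiFunction;

/** Hypothetical sketch: cache keyed by (jar path, directory filter). */
final class FilteredClassNameCache {

    /** Composite key; two lookups match only if jar AND directory match. */
    private static final class Key {
        final String jarPath;
        final String directory;

        Key(String jarPath, String directory) {
            this.jarPath = jarPath;
            this.directory = directory;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Key)) {
                return false;
            }
            Key that = (Key) other;
            return jarPath.equals(that.jarPath) && directory.equals(that.directory);
        }

        @Override
        public int hashCode() {
            return Objects.hash(jarPath, directory);
        }
    }

    private final Map<Key, List<String>> cache = new HashMap<>();
    // Assumed callback that performs the real (expensive) filtered jar scan.
    private final BiFunction<String, String, List<String>> scanner;

    FilteredClassNameCache(BiFunction<String, String, List<String>> scanner) {
        this.scanner = scanner;
    }

    /** Scans at most once per distinct (jar, directory) pair. */
    List<String> classNames(String jarPath, String directory) {
        return cache.computeIfAbsent(
                new Key(jarPath, directory),
                key -> scanner.apply(key.jarPath, key.directory));
    }
}
```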

diff.txt

cal101 commented 7 years ago

I missed that the directory was used as a filter. Thanks!

cal101 commented 7 years ago

But... the diff also means that the jar may be scanned more often than necessary. I will think about it.
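
One way to avoid the extra scans would be to scan each jar once and derive the per-directory lists from the full entry list in memory. The sketch below assumes the directory filter is a path prefix inside the jar and that nested packages are included; both are assumptions, and `ClassNameFilter` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: derive a filtered list from one full jar scan. */
final class ClassNameFilter {

    /**
     * Keeps the ".class" entries located under the given directory.
     * Entry names use '/' separators, as they do inside jar files.
     */
    static List<String> inDirectory(List<String> entryNames, String directory) {
        // Normalize so that "com/x" does not accidentally match "com/xy/...".
        String prefix = directory.isEmpty() || directory.endsWith("/")
                ? directory
                : directory + "/";
        List<String> result = new ArrayList<>();
        for (String name : entryNames) {
            if (name.startsWith(prefix) && name.endsWith(".class")) {
                result.add(name);
            }
        }
        return result;
    }
}
```

The trade-off is memory: the full entry list of every jar stays cached, not just the filtered subsets.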

rpau commented 7 years ago

Yes, but measure the amount of memory needed to store every single class name.

rpau commented 7 years ago

I have deployed a new version to the official Maven repo with these changes and the PMD fixes. It will be downloaded automatically by walkmod today or tomorrow.

cal101 commented 7 years ago

I checked out the PMD plugin code and will give it a try to see what's left.

Regarding class name memory.

My case has 282 jars with 77310 class names, consisting of 4336819 chars in total.

Let's say 80k class names and 5 million chars: 80k × 32 bytes + 5 million × 2 bytes ≈ 13 MB

I have to measure the speedup again to compare, but 13 MB is peanuts. And if the project is really big, the savings should be even bigger.
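
The estimate above can be reproduced with a small calculation. The sketch assumes the 32 stands for a fixed per-string overhead in bytes and the 2 for bytes per char; `ClassNameMemoryEstimate` is a hypothetical name, and real JVM string footprints vary with JVM version and settings.

```java
/** Hypothetical sketch: back-of-the-envelope memory estimate for cached names. */
final class ClassNameMemoryEstimate {

    /**
     * @param classNames        number of cached class names
     * @param totalChars        total number of chars across all names
     * @param overheadPerString assumed fixed cost per string, in bytes
     * @param bytesPerChar      assumed cost per char, in bytes
     * @return estimated memory use in bytes
     */
    static long estimateBytes(long classNames, long totalChars,
                              long overheadPerString, long bytesPerChar) {
        return classNames * overheadPerString + totalChars * bytesPerChar;
    }
}
```

With the rounded figures, `estimateBytes(80_000, 5_000_000, 32, 2)` gives 12,560,000 bytes, which rounds to the 13 MB quoted. With the measured figures, `estimateBytes(77_310, 4_336_819, 32, 2)` gives 11,147,558 bytes, about 11 MB.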

rpau commented 7 years ago

nice!

cal101 commented 7 years ago

~~Doing only full jar scans and caching full and filtered class name lists gives another 10% improvement with my test case. (about 6 minutes now with semantic analysis for all classes and the rawclasspath caching patch enabled) But another profiled value bothers me: more than 50% of runtime is used to read name entries from jar files now.~~

java.util.jar.JarFile$JarEntryIterator.nextElement() 53.89448 185,682 ms (53.9%) 185,682 ms

If that is cached it's another 100% boost. What do you think? Could ".walkmod" be used for this? Maybe it's better to put it in "target" so clean cleans it, too.

Something was wrong with my analysis. Have to re-check.
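
Whatever the re-check of the profile shows, the idea of persisting the class name list between runs could look roughly like the sketch below. It assumes one cache file per jar in a location such as "target", and treats the cache as stale when the jar is newer than the cache file. `ClassNameCacheFile` and the file layout are hypothetical; nothing here reflects how walkmod actually stores data.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch: persist class names of one jar in a text file. */
final class ClassNameCacheFile {

    /** Writes one class name per line, creating parent directories if needed. */
    static void save(Path cacheFile, List<String> classNames) throws IOException {
        Path parent = cacheFile.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Files.write(cacheFile, classNames, StandardCharsets.UTF_8);
    }

    /** Returns the cached names, or empty if the cache is missing or older than the jar. */
    static Optional<List<String>> loadIfFresh(Path cacheFile, Path jar) throws IOException {
        if (!Files.exists(cacheFile)) {
            return Optional.empty();
        }
        boolean stale = Files.getLastModifiedTime(cacheFile)
                .compareTo(Files.getLastModifiedTime(jar)) < 0;
        if (stale) {
            return Optional.empty();
        }
        return Optional.of(Files.readAllLines(cacheFile, StandardCharsets.UTF_8));
    }
}
```

Placing the file under "target" would let a regular clean remove it, as suggested above; a location such as ".walkmod" would survive cleans but would need its own invalidation.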