piranhacloud / piranha

Piranha - a modern cloud runtime
https://piranha.cloud
BSD 3-Clause "New" or "Revised" License

Optimize loading of multirelease JAR files in a WAR #3669

Open OndroMih opened 6 months ago

OndroMih commented 6 months ago

Hi, I noticed that when a WAR file contains a multirelease JAR file, it takes an extremely long time to deploy even a simple application.

Here's an example application: myapp.war.zip (source code here: jsf-hello-world.zip)

To reproduce, just run the Piranha Web Profile distribution with the app, like this:

java -jar piranha-dist-webprofile-24.2.0.jar --war-file myapp.war

I took a few thread dumps during deployment: threaddumps.zip

On my computer, it took 96 seconds (about 1.6 minutes) to deploy the reproducer app. (Deployment actually failed, but that was caused by a classloading issue in my app with the javax/jakarta package prefix, which isn't relevant here.)

OndroMih commented 6 months ago

According to the thread dumps, the source of inefficiency seems to be the MultiReleaseResource class. In the versionedEntry method, it iterates over a list of Java versions, and for each version it tries to read a versioned resource. This happens for every class in any multirelease JAR, because this action is initiated by the Annotation Scanner, which attempts to load all classes in the WAR file.

Most of the time, only one or a few classes have a versioned resource. In my reproducer, the only multirelease JAR is bytebuddy, which only contains module-info.class for Java 9+; no other class is versioned. The current behavior tries to load a class for all Java versions from 9 to 21, and then from the usual location if it doesn't find a class for any specific version. This means that for every class in a multirelease JAR, it attempts to load the class 12 times, doesn't find it, and then loads it on the 13th attempt. On future Java versions, it will get worse and worse.

I suggest that MultiReleaseResource, on the first attempt to load a resource, scan the contents of the META-INF/versions folder and store in memory the list of resources available for each Java version. It would then attempt to load a versioned resource only if that resource exists in the JAR file. It would know exactly which resource to load, either for a specific version or at the default location, and would attempt each load only once.
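The indexing idea above can be sketched as follows. This is only an illustration under assumptions: `VersionedEntryIndex` and its methods are hypothetical names, not part of Piranha, and the sketch works on a plain collection of entry names rather than Piranha's Resource interface, so wiring it into MultiReleaseResource is left open.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

/**
 * Hypothetical helper (not a Piranha class): scans the entry names once and
 * remembers which base names have an override under META-INF/versions/N/.
 */
public class VersionedEntryIndex {

    private static final String PREFIX = "META-INF/versions/";

    // Base entry name -> Java versions that ship an override for it.
    private final Map<String, TreeSet<Integer>> versionsByName = new HashMap<>();

    public VersionedEntryIndex(Collection<String> entryNames) {
        for (String name : entryNames) {
            if (!name.startsWith(PREFIX) || name.endsWith("/")) {
                continue; // not a versioned file entry
            }
            int slash = name.indexOf('/', PREFIX.length());
            if (slash < 0) {
                continue; // nothing after the version segment
            }
            try {
                int version = Integer.parseInt(name.substring(PREFIX.length(), slash));
                if (version >= 9) { // versions below 9 are not valid overrides
                    String baseName = name.substring(slash + 1);
                    versionsByName.computeIfAbsent(baseName, k -> new TreeSet<>()).add(version);
                }
            } catch (NumberFormatException e) {
                // Version segment is not a number: ignore the entry.
            }
        }
    }

    /**
     * Returns the single entry name to load for baseName on the given
     * runtime version: the highest override that is not newer than the
     * runtime, or baseName itself if no such override exists.
     */
    public String resolve(String baseName, int runtimeVersion) {
        TreeSet<Integer> versions = versionsByName.get(baseName);
        if (versions != null) {
            Integer best = versions.floor(runtimeVersion);
            if (best != null) {
                return PREFIX + best + "/" + baseName;
            }
        }
        return baseName;
    }
}
```

With an index like this, a class that has no versioned copy costs one map lookup and one read, instead of one failed read per Java version. The runtime version could come from `Runtime.version().feature()` (available since Java 10).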

OndroMih commented 6 months ago

Another optimization is to unpack JAR files into a temp folder and then load resources from that folder. This would optimize time to load resources for any JAR file, even if it's not a multirelease JAR.

Overall, the deployment times are not great. Even after I converted the bytebuddy JAR file in my reproducer into a non-multirelease one, it still took about 30 seconds to deploy on my computer. (Again, deployment actually failed because of the javax/jakarta issue in my app.)
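A minimal sketch of the unpacking idea, using only standard JDK classes. `JarUnpacker` is a hypothetical name, not an existing Piranha class, and whether this actually speeds up deployment has not been measured here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper: extracts a JAR into a fresh temporary directory. */
public class JarUnpacker {

    public static Path unpack(Path jar) throws IOException {
        Path target = Files.createTempDirectory("unpacked-jar")
                .toAbsolutePath().normalize();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                Path out = target.resolve(entry.getName()).normalize();
                // Reject entries such as "../../x" that would land outside target.
                if (!out.startsWith(target)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    try (InputStream in = jarFile.getInputStream(entry)) {
                        Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                    }
                }
            }
        }
        return target;
    }
}
```

After unpacking, resources can be read with ordinary file access under the returned directory. The trade-off is extra disk usage and a one-time extraction cost per JAR, and the temporary directory would need to be cleaned up on undeploy.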

OndroMih commented 6 months ago

Another optimization, again for any JAR file, could be to use multiple (virtual) threads to read from a JAR file. Each JAR file can be read by a different thread, and even different resources from the same JAR file could be read by a different thread, if each thread opens its own JarFile pointing to the same JAR file (JarFile uses synchronized access so it doesn't help if multiple threads use the same JarFile instance, but opening multiple JarFile instances against the same file for reading is not a problem).

We could use virtual threads and create one virtual thread for each classpath resource. However, I'm not sure whether this would be an optimization or whether it would be slower, because it would open a JarFile for each classpath resource.
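A rough sketch of the coarser variant, one task per JAR file, where each task opens its own JarFile so that no instance is shared between threads. `ParallelJarReader` is a hypothetical name and nothing here is Piranha API. The sketch uses a fixed pool of platform threads so it runs on older Java versions; on Java 21 the pool could be swapped for `Executors.newVirtualThreadPerTaskExecutor()`, though whether either variant is faster is untested.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper: reads the .class entries of several JARs in parallel. */
public class ParallelJarReader {

    /** Reads every .class entry of one JAR through its own JarFile instance. */
    static Map<String, byte[]> readClasses(Path jar) throws IOException {
        Map<String, byte[]> classes = new HashMap<>();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.isDirectory() || !entry.getName().endsWith(".class")) {
                    continue;
                }
                try (InputStream in = jarFile.getInputStream(entry)) {
                    classes.put(entry.getName(), in.readAllBytes());
                }
            }
        }
        return classes;
    }

    /** Submits one task per JAR and collects the results in input order. */
    public static Map<Path, Map<String, byte[]>> readAll(List<Path> jars, int threads)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<Path, Future<Map<String, byte[]>>> pending = new LinkedHashMap<>();
            for (Path jar : jars) {
                Callable<Map<String, byte[]>> task = () -> readClasses(jar);
                pending.put(jar, pool.submit(task));
            }
            Map<Path, Map<String, byte[]>> result = new LinkedHashMap<>();
            for (Map.Entry<Path, Future<Map<String, byte[]>>> e : pending.entrySet()) {
                try {
                    result.put(e.getKey(), e.getValue().get());
                } catch (ExecutionException ex) {
                    throw new IOException("Failed to read " + e.getKey(), ex.getCause());
                }
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

Splitting by JAR keeps the number of open JarFile instances equal to the number of JARs, which avoids the concern about opening one JarFile per classpath resource.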

mnriem commented 6 months ago

@Thihup If you have bandwidth, please have a look. @OndroMih Feel free to come up with a fix also.

Thihup commented 6 months ago

@OndroMih, Thank you for the investigation!

When I implemented the feature, I was already aware of the performance issue (see https://github.com/piranhacloud/piranha/pull/1507#issuecomment-782631519).

If I recall correctly, fixing it was a bit more complex because we don't use a Jar file; instead, we utilize the Resource interface. This allows us to use the MultiReleaseResource as a wrapper around other Resource implementations.

A cache would probably suffice, since, most of the time, there are only a few multi-release classes in a Jar/Resource.

Regarding virtual threads, I believe reading a file pins the virtual thread to its carrier thread, so I'm not certain they would speed things up. However, my understanding of this topic could be incorrect.

@mnriem Currently, I'm using Windows, and I need to check if I can still compile the project. I recall encountering some failing tests. First, I need to fix my setup before fixing this issue.

github-actions[bot] commented 2 weeks ago

This issue is stale because it has been open 170 days with no activity. Remove stale label or comment or this will be closed in 10 days

OndroMih commented 2 weeks ago

I didn't have time to investigate how to fix this. I'd like to get to this when I have some time.