quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.4k stars 2.57k forks source link

Reading manifest from jar in jar-based `PathTreeWithManifest` is suboptimal #41395

Open gsmet opened 3 weeks ago

gsmet commented 3 weeks ago

I will probably have a look at this soon but I wanted to log my findings. /cc @aloubyansky

(This is part of my overall work on class loader leaking even if it has nothing to do with it)

in ArchivePathTree, we have this method:

    @Override
    protected <T> T apply(String relativePath, Function<PathVisit, T> func, boolean manifestEnabled) {
        ensureResourcePath(relativePath);
        if (!PathFilter.isVisible(pathFilter, relativePath)) {
            return func.apply(null);
        }
        if (manifestEnabled) {
            relativePath = toMultiReleaseRelativePath(relativePath);
        }
        try (FileSystem fs = openFs()) {
            for (Path root : fs.getRootDirectories()) {
                final Path path = root.resolve(relativePath);
                if (!Files.exists(path)) {
                    continue;
                }
                return PathTreeVisit.process(archive, root, path, pathFilter, func);
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to read " + archive, e);
        }
        return func.apply(null);
    }

toMultiReleaseRelativePath(relativePath); will actually read the Manifest (I have a patch to avoid doing that for META-INF entries but in the end we will read the Manifest at some point).

Reading the manifest is done by using an InputStream that will in the end use java.util.zip.Inflater to read the jar. From my reports, it actually takes quite a lot of memory to do that (the quarkus-ide-launcher jar probably doesn't help).

Given we actually open a zip fs at the line below, we could actually use this zip fs to read the manifest and my guess is that it would be a bit more optimal.

Not sure yet how to rework the code to be able to do that without messing things up too much in the hierarchy.

@aloubyansky thoughts?

See the stack here:

Screenshot from 2024-06-24 18-16-28

dmlloyd commented 3 weeks ago

I believe that the zipfs would also use an inflater, right? In fact, isn't that what the stack trace shows?

gsmet commented 3 weeks ago

No, the stracktrace is showing this call:

https://github.com/quarkusio/quarkus/blob/32d0c2dd4d935149c71f6e81524fb3a46fe89515/independent-projects/bootstrap/app-model/src/main/java/io/quarkus/paths/PathTreeWithManifest.java#L213-L214

gsmet commented 3 weeks ago

So we do this ^ to read the manifest (the manifest is cached but we have to read it once), and then we open the zipfs to read the actual file.

aloubyansky commented 3 weeks ago

@gsmet you want to read the manifest in a different way or avoid reading it? We have to read it to properly support multi-release content.

aloubyansky commented 3 weeks ago

There should also be API to not consult manifest entries.

dmlloyd commented 3 weeks ago

One simple way we could avoid inflating things is to suggest/encourage/provide a default for users to not use compression on their JARs.

gsmet commented 3 weeks ago

I should have been more precise: given we open a Zip FS to actually read the entry, I think we should use this same Zip FS to read the manifest and not open/read the jar twice as we do at the moment. The pattern is always this one: we check if multi-release so we read the manifest and then we open the fs to actually read the entry. Granted it only happens once but still it looks like something we could avoid. FWIW, Eclipse MAT was complaining about this particular call using quite a lot of memory.

I created a PR with a simple improvement than would avoid reading the manifest when we read from META-INF: https://github.com/quarkusio/quarkus/pull/41397 .

But for most jars, at some point, we will read some classes so I wonder if it would be worth trying to optimize this. Now it might just be an artifact of Eclipse MAT. I'm trying to open issues when I stumble upon them, otherwise it's hard to have the details later.

quarkus-bot[bot] commented 3 weeks ago

/cc @Sanne (core), @radcortez (core), @stuartwdouglas (core)