smallrye / jandex

Java Annotation Indexer
Apache License 2.0
398 stars 94 forks source link

Indexing takes over 100x longer on big file #322

Closed reinhapa closed 5 months ago

reinhapa commented 1 year ago

When indexing a big file that contains around 16000 generated classes, the indexing process takes around 100 more time.

Jandex 3.1.0 ~ 5 seconds Jandex 3.1.1 ~ 490 seconds

Ladicek commented 1 year ago

OK, that is weird. The only change in 3.1.1 that could realistically cause this is #308. Could it be that you have a lot of recursive type parameters (T extends Something<T>) that have the same name? That would lead to a lot of hash table collisions, which could explain the issue.

reinhapa commented 1 year ago

In our test case reverting the internHashCode did fix the performance loss when indexing our JAR file

reinhapa commented 1 year ago

OK, that is weird. The only change in 3.1.1 that could realistically cause this is #308. Could it be that you have a lot of recursive type parameters (T extends Something<T>) that have the same name? That would lead to a lot of hash table collisions, which could explain the issue.

Actually #308 has caused this issue. In our case we need to revert the version delivered in WildFly 29 back to 3.1.0 due to fact, that the deployment will take 15 instead of around 1 minute.

I wonder, that no benchmark test seem to have caught this problem unfortunately...

Ladicek commented 1 year ago

This seems to be a reasonable approach: https://github.com/Ladicek/jandex/commits/type-variable-reference-intern-hash-collisions Could you please build Jandex from that branch and test with it? Thank you!

Ladicek commented 1 year ago

Note that the branch actually changes the persistent index format, so it will require releasing Jandex 3.2. Also older versions of Jandex won't be able to read an index produced by Jandex built from that branch, so take care.

reinhapa commented 1 year ago

This seems to be a reasonable approach: https://github.com/Ladicek/jandex/commits/type-variable-reference-intern-hash-collisions Could you please build Jandex from that branch and test with it? Thank you!

@Ladicek I will run a check tomorrow and give you feedback

reinhapa commented 1 year ago

@Ladicek The proposed fix seems to have more ore less the same performance as the version 3.1.0 taking slightly longer:

Version 3.1.0 JUnit_3 1 0

Proposed fix JUnit_3 x-SNAPSHOT

Ladicek commented 1 year ago

Yeah, it does take a little more time -- the JAR from your repo requires cca 8 seconds to index with Jandex 3.1.0 on my machine, and roughly 10 seconds with the suggested patch. I would think that's acceptable, but I can try to reduce it further.

I was more interested in actual usage. If you replace Jandex in WildFly with the version from my branch, does that keep working for you? (I don't see why it shouldn't, all the existing Jandex tests pass, but real world feedback is always best.)

reinhapa commented 1 year ago

Hi @Ladicek I did some measurements for comparison as follows:

Jandex 3.1.0 without index file in jar (17s) Rolled back library version on WildFly 29.

2023-08-25 10:00:57,318 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 71) [Usr#] WFLYCLINF0002: Started http-remoting-connector cache from ejb container
2023-08-25 10:01:14,771 WARN  [org.jboss.as.server.deployment] (MSC service thread 1-4) [Usr#] WFLYSRV0059: Class Path entry mchange-commons-java-0.2.15.jar in /C:/bison/base/20230824_131254/wildfly-29.0.0.Final/standalone/deployments/BisonProcess.ear/lib/c3p0.jar  does not point to a valid jar for a Class-Path reference.

Jandex 3.1.2 without index file in jar (12min 17s) Current state using WildFly 29 without any changes.

2023-08-21 10:29:27,960 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 71) [Usr#] WFLYCLINF0002: Started http-remoting-connector cache from ejb container
2023-08-21 10:41:44,325 WARN  [org.jboss.as.server.deployment] (MSC service thread 1-2) [Usr#] WFLYSRV0059: Class Path entry mchange-commons-java-0.2.15.jar in /C:/bison/base/20230821_093101/wildfly-29.0.0.Final/standalone/deployments/BisonProcess.ear/lib/c3p0.jar  does not point to a valid jar for a Class-Path reference.

Jandex 3.1.3-SNAPSHOT with index file in (22s) Proposed library version on WildFly 29 accessing the big file with an index generated using Jandex 3.0.5.

2023-08-25 09:33:29,978 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 71) [Usr#] WFLYCLINF0002: Started http-remoting-connector cache from ejb container
2023-08-25 09:33:34,381 WARN  [org.jboss.as.server.deployment] (MSC service thread 1-2) [Usr#] WFLYSRV0002: Loading failed for the annotation index "/C:/bison/base/20230824_131254/wildfly-29.0.0.Final/standalone/deployments/BisonProcess.ear/lib/CH.obj.Application.generated.jar/META-INF/jandex.idx" with the following exception: java.lang.IllegalStateException: Invalid tag: 14
2023-08-25 09:33:51,020 WARN  [org.jboss.as.server.deployment] (MSC service thread 1-2) [Usr#] WFLYSRV0059: Class Path entry mchange-commons-java-0.2.15.jar in /C:/bison/base/20230824_131254/wildfly-29.0.0.Final/standalone/deployments/BisonProcess.ear/lib/c3p0.jar  does not point to a valid jar for a Class-Path reference.

Jandex 3.1.3-SNAPSHOT without index file in jar (20s) Proposed library version on WildFly 29.

2023-08-25 09:37:44,489 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 71) [Usr#] WFLYCLINF0002: Started http-remoting-connector cache from ejb container
2023-08-25 09:38:04,063 WARN  [org.jboss.as.server.deployment] (MSC service thread 1-3) [Usr#] WFLYSRV0059: Class Path entry mchange-commons-java-0.2.15.jar in /C:/bison/base/20230824_131254/wildfly-29.0.0.Final/standalone/deployments/BisonProcess.ear/lib/c3p0.jar  does not point to a valid jar for a Class-Path reference.
Ladicek commented 1 year ago

OK, thanks. I was more interested in hearing about any possible issues when running the application, but I realized that there's most likely nothing in WildFly bothering about type variable references, so we should be good there.

The 3rd experiment is interesting, Jandex 3.1.3-SNAPSHOT should very much be able to read an index generated by 3.0.5. There must be a bug somewhere, likely similar to #320, which I need to investigate.

In any case, as I mentioned, this will need Jandex 3.2, because of the change of persistent index format.

Ladicek commented 1 year ago

OK, I fixed the case when Jandex 3.1.3-SNAPSHOT couldn't read an index generated by 3.0.5, that was some poor handling of versioning on my side. Thanks for highlighting that.

reinhapa commented 1 year ago

In any case, as I mentioned, this will need Jandex 3.2, because of the change of persistent index format.

@Ladicek thanks for your fast feedback. For providing an index within our big libraries can mitigate the issue and we would patch our WildFly instance if needed. I guess that as long as we keep generating the index using a Jandex version that aligns the one for WildFly we should be save. Having a fix in the future will be perfectly ok for us.

Ladicek commented 1 year ago

If indexing takes longer time, pregenerating the index (using the Maven plugin for example) is a good choice indeed.

reinhapa commented 5 months ago

@Ladicek does this mean, that you create a version 3.2.0 now?

Ladicek commented 5 months ago

There's one more PR that needs some more work (#361), so it will take a bit more time, but 3.2.0 is near indeed.