jpmml / jpmml-transpiler

Java Transpiler (Translator + Compiler) API for PMML
GNU Affero General Public License v3.0

Got repeated datainputStream directory name. #27

Open jye0829 opened 2 weeks ago

jye0829 commented 2 weeks ago

I have two different regressor models, each converted to a .jar file using pmml-transpiler, and I import both JAR files into the same project. The build fails with this error message: 2 files found with path 'com/zhenzhe/javacv/PMML$1266265220.data'. Adding a packagingOptions block may help, please refer to https://developer.android.com/reference/tools/gradle-api/8.2/com/android/build/api/dsl/ResourcesPackagingOptions for more information. It looks like the transpiler generated the same data file path for both models. Do you have any idea why they are the same?

vruusmann commented 2 weeks ago

2 files found with path 'com/zhenzhe/javacv/PMML$1266265220.data'.

The identifiers are generated using the IdentifierUtil#create(String, PMMLObject) utility method: https://github.com/jpmml/jpmml-transpiler/blob/1.3.5/pmml-transpiler/src/main/java/org/jpmml/translator/IdentifierUtil.java#L112-L115

You get two identical identifiers if the second argument (a PMMLObject object) yields the same "system identity hashcode" (SIH).

IIRC, the SIH relates to the location of the Java object in JVM memory space, and it is set automatically by the JVM. I cannot provide any "location hints" in advance, or move the Java object to a different location afterwards.
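A minimal sketch of how such an identifier could be derived, for illustration only. The `createIdentifier` helper is hypothetical and is not the actual `IdentifierUtil#create(String, PMMLObject)` code; the `prefix$number` shape is an assumption inferred from the `PMML$1266265220.data` resource name in the error message. The only real API used is `System.identityHashCode(Object)`.

```java
class IdentityHashSketch {

    // Hypothetical stand-in for IdentifierUtil#create(String, PMMLObject).
    // Assumption: the identifier is the prefix, a "$" separator, and the SIH.
    static String createIdentifier(String prefix, Object object) {
        return prefix + "$" + System.identityHashCode(object);
    }

    public static void main(String[] args) {
        Object first = new Object();
        Object second = new Object();

        // The SIH is stable for the lifetime of a single object,
        // so these two lines print the same identifier.
        System.out.println(createIdentifier("PMML", first));
        System.out.println(createIdentifier("PMML", first));

        // A different object usually gets a different SIH,
        // but the JVM does not guarantee uniqueness.
        System.out.println(createIdentifier("PMML", second));
    }
}
```

The printed numbers depend on the JVM and on the run, which is why two independent transpiler sessions can end up with the same value.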

vruusmann commented 2 weeks ago

What's your JVM vendor/version?

Again, IIRC, SIHs are often semi-deterministic, meaning that if you play out the same sequence of events (on a newly created JVM), you are likely to get the same SIHs for your Java objects.

My guess is that you were generating these JAR files using the JPMML-Transpiler command-line application in two consecutive but independent sessions. Hence, the JVM is placing the org.dmg.pmml.PMMLObject instance at the same memory address, which yields the same SIH.

Try changing something about your JVM configuration between these two sessions. For example, change the size of the JVM memory, so that Java objects get deposited at a different memory location.

vruusmann commented 2 weeks ago

All things considered, there's not enough information here for me to suggest a definite workaround/fix.

I don't see this kind of resource identifier collision as a major bug. In fact, I'd consider it more like a feature, meaning that it's possible to get reproducible JAR builds!

vruusmann commented 2 weeks ago

The alternative to SIHs would be random number generation. Feel free to implement this code change locally.

Perhaps there could be a configuration option for choosing the identifier style.
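A rough sketch of what such a local change might look like, assuming a selectable identifier style. Everything here is hypothetical: the `IdentifierStyle` enum, the `createIdentifier` signature, and the `prefix$number` shape are illustration-only names and are not part of JPMML-Transpiler.

```java
import java.security.SecureRandom;

class IdentifierStyleSketch {

    // Hypothetical configuration option; not an existing JPMML-Transpiler setting.
    enum IdentifierStyle { IDENTITY_HASH, RANDOM }

    private static final SecureRandom RANDOM = new SecureRandom();

    // Hypothetical replacement for the identifier factory method.
    static String createIdentifier(IdentifierStyle style, String prefix, Object object) {
        switch (style) {
            case IDENTITY_HASH:
                // Semi-deterministic: repeated runs may reproduce the same value.
                return prefix + "$" + System.identityHashCode(object);
            case RANDOM:
                // Non-negative random int in [0, Integer.MAX_VALUE);
                // collisions become unlikely, but builds are no longer reproducible.
                return prefix + "$" + RANDOM.nextInt(Integer.MAX_VALUE);
            default:
                throw new IllegalArgumentException("Unknown style: " + style);
        }
    }

    public static void main(String[] args) {
        Object model = new Object();
        System.out.println(createIdentifier(IdentifierStyle.IDENTITY_HASH, "PMML", model));
        System.out.println(createIdentifier(IdentifierStyle.RANDOM, "PMML", model));
    }
}
```

The trade-off is the one noted above: the SIH style keeps JAR builds reproducible, while the random style gives that up in exchange for making resource name collisions between independently generated JAR files much less likely.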