Closed ziyilin closed 6 months ago
Update:
I found many differences are related with java security. I tracked one of the unstable class org.xml.sax.helpers.DefaultHandler
and found the cause is the typeflow state is changing from builds.
Adding the following debug code:
diff --git a/substratevm/src/com.oracle.graal.pointsto/src/com/oracle/graal/pointsto/flow/TypeFlow.java b/substratevm/src/com.oracle.graal.pointsto/src/com/oracle/graal/pointsto/flow/TypeFlow.java
index 1e6061b47d1..06216fa7d19 100644
--- a/substratevm/src/com.oracle.graal.pointsto/src/com/oracle/graal/pointsto/flow/TypeFlow.java
+++ b/substratevm/src/com.oracle.graal.pointsto/src/com/oracle/graal/pointsto/flow/TypeFlow.java
@@ -466,6 +466,9 @@ public abstract class TypeFlow<T> {
/** Let the observers know that the state has changed. */
protected void notifyObservers(BigBang bb) {
+ if(this instanceof NullCheckTypeFlow && source != null && source.toString().equals("at java.security.SecureRandom.getDefaultPRNG(SecureRandom.java:290) [bci: 125]")){
+ System.out.println("states is " + state);
+ }
for (TypeFlow<?> observer : getObservers()) {
observer.onObservedUpdate(bb);
}
I can see the following printed out messages for different builds: Build1:
states is 1TypeMObject<[0x000002DA000002DB:!S:Provider$Service]>
states is MTypeMObject<2:[0x000002DA000002DB:!S:Provider$Service, 0x00000CA000000CA1:!S:SunEC$ProviderService]>
states is MTypeMObject<3:[0x000002DA000002DB:!S:Provider$Service, 0x00000CA000000CA1:!S:SunEC$ProviderService, 0x00000CD200000CD3:!S:SunPCSC$ProviderService]>
Build Time: 3421 classes Run Time: 17 classes
Build2:
states is MTypeMObject<6:[0x000002DA000002DB:!S:Provider$Service, 0x00000C9300000C94:!S:SunEC$ProviderService, 0x00000CF200000CF3:!S:JdkSASL$ProviderService, 0x00000CF300000CF4:!S:XMLDSigRI$ProviderService, 0x00000CF500000CF6:!S:SunProvider$ProviderService, 0x00000D6200000D63:!S:SunPKCS11$P11Service]>
Build Time: 3411 classes Run Time: 17 classes
Build3:
states is MTypeMObject<2:[0x000002DA000002DB:!S:Provider$Service, 0x00000C9300000C94:!S:SunEC$ProviderService]>
states is MTypeMObject<5:[0x000002DA000002DB:!S:Provider$Service, 0x00000C9300000C94:!S:SunEC$ProviderService, 0x00000CF200000CF3:!S:JdkSASL$ProviderService, 0x00000CF300000CF4:!S:XMLDSigRI$ProviderService, 0x00000CF500000CF6:!S:SunProvider$ProviderService]>
states is MTypeMObject<6:[0x000002DA000002DB:!S:Provider$Service, 0x00000C9300000C94:!S:SunEC$ProviderService, 0x00000CF200000CF3:!S:JdkSASL$ProviderService, 0x00000CF300000CF4:!S:XMLDSigRI$ProviderService, 0x00000CF500000CF6:!S:SunProvider$ProviderService, 0x00000D6200000D63:!S:SunPKCS11$P11Service]>
Build Time: 4912 classes Run Time: 190 classes
But still don't know why TypeFlow.state
is variant.
I have found the reason. It's caused by PlatformClassLoader
introduced since JDK9. When the demo is built with GraalVM compiled with JDK8, the initialized class numbers are stable.
BUILD_TIME RUN_TIME
2644 16
2644 16
2644 16
2644 16
2644 16
2644 16
2644 16
2644 16
2644 16
2644 16
The following discussion is based on GraalVM compiled with JDK11. The version information is
openjdk version "11.0.9" 2020-10-20
OpenJDK Runtime Environment GraalVM 21.1.0-dev (build 11.0.9+10-jvmci-21.0-b03)
OpenJDK 64-Bit Server VM GraalVM 21.1.0-dev (build 11.0.9+10-jvmci-21.0-b03, mixed mode, sharing)
Native-image reflectively calls sun.security.ssl.TrustStoreManager.getTrustedCerts
in com.oracle.svm.core.jdk.TrustStoreManagerFeature.afterRegistration
. It comes to load the "SunEC" provider with the followings stack trace:
load:338, ProviderConfig$ProviderLoader (sun.security.jca)
run:248, ProviderConfig$3 (sun.security.jca)
run:242, ProviderConfig$3 (sun.security.jca)
doPrivileged:-1, AccessController (java.security)
doLoadProvider:242, ProviderConfig (sun.security.jca)
getProvider:222, ProviderConfig (sun.security.jca)
getProvider:266, ProviderList (sun.security.jca)
getService:379, ProviderList (sun.security.jca)
getInstance:157, GetInstance (sun.security.jca)
getImpl:700, Security (java.security)
getInstance:158, AlgorithmParameters (java.security)
decodeParams:126, AlgorithmId (sun.security.x509)
<init>:119, AlgorithmId (sun.security.x509)
parse:393, AlgorithmId (sun.security.x509)
parse:168, X509Key (sun.security.x509)
<init>:75, CertificateX509Key (sun.security.x509)
parse:674, X509CertInfo (sun.security.x509)
<init>:169, X509CertInfo (sun.security.x509)
parse:1842, X509CertImpl (sun.security.x509)
<init>:194, X509CertImpl (sun.security.x509)
engineGenerateCertificate:105, X509Factory (sun.security.provider)
generateCertificate:355, CertificateFactory (java.security.cert)
engineLoad:764, JavaKeyStore (sun.security.provider)
engineLoad:243, KeyStoreDelegator (sun.security.util)
load:1479, KeyStore (java.security)
loadKeyStore:365, TrustStoreManager$TrustAnchorManager (sun.security.ssl)
getTrustedCerts:313, TrustStoreManager$TrustAnchorManager (sun.security.ssl)
getTrustedCerts:55, TrustStoreManager (sun.security.ssl)
invoke0:-1, NativeMethodAccessorImpl (jdk.internal.reflect)
invoke:62, NativeMethodAccessorImpl (jdk.internal.reflect)
invoke:43, DelegatingMethodAccessorImpl (jdk.internal.reflect)
invoke:566, Method (java.lang.reflect)
afterRegistration:65, TrustStoreManagerFeature (com.oracle.svm.core.jdk)
The Provider.load
method looks like:
public Provider load(String pn) {
if (debug != null) {
debug.println("Attempt to load " + pn + " using SL");
}
Iterator<Provider> iter = services.iterator();
while (iter.hasNext()) {
try {
Provider p = iter.next();
String pName = p.getName();
if (debug != null) {
debug.println("Found SL Provider named " + pName);
}
if (pName.equals(pn)) {
return p;
}
}
...
pn
here is "SunEC", so the iterator continues to get the next provider until "SunEC". In other word, the iteration stops whenever "SunEC" is found. iter.hasNext()
not only checks the next item in iterator, but also initializes and loads the corresponding Provider
class. See the source code of actual hasNext
method:
public boolean hasNext() {
while (nextProvider == null && nextError == null) {
// get next provider to load
while (!iterator.hasNext()) {
if (currentLoader == null) {
return false;
} else {
currentLoader = currentLoader.getParent();
iterator = iteratorFor(currentLoader);
}
}
// attempt to load provider
ServiceProvider provider = iterator.next();
try {
@SuppressWarnings("unchecked")
Provider<T> next = (Provider<T>) loadProvider(provider);
nextProvider = next;
} catch (ServiceConfigurationError e) {
nextError = e;
}
}
return true;
}
It is called with the following stack trace since the top frame of the previous stack trace:
hasNext:1076, ServiceLoader$ModuleServicesLookupIterator (java.util)
hasNext:1299, ServiceLoader$2 (java.util)
hasNext:1384, ServiceLoader$3 (java.util)
load:338, ProviderConfig$ProviderLoader (sun.security.jca)
The iterator
is an instance of java.util.concurrent.CopyOnWriteArrayList.COWIterator
with contents of a list of 6 java security providers:
sun.security.jgss.SunProvider
org.jcp.xml.dsig.internal.dom.XMLDSigRI
com.sun.security.sasl.gsskerb.JdkSASL
sun.security.pkcs11.SunPKCS11
sun.security.smartcardio.SunPCSC
sun.security.ec.SunEC
This list is set at PlatformClassloader
initialization time and stored in ClassLoader.classLoaderValueMap
field. But the list sequence is variant. I didn't find the initialization code and don't know the exact reason. But the variant of the sequence leading to the variant of reached security classes. When sun.security.ec.SunEC
is at the head of this list, no other providers will be reached. But when it is at the tail of the list, all the other 5 providers are reached. The actually affected classes are not only these 5 Providers
. Many other classes are reached by initializing and loading each provider.
Get back to Substrate VM's method com.oracle.svm.core.jdk.TrustStoreManagerFeature.afterRegistration
.
https://github.com/oracle/graal/blob/cf4cf49c7858b367cadfe08b534eccfd08d6b197/substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jdk/Target_sun_security_ssl_TrustStoreManager.java#L80
Although XMLDSigRI
is explicitly initialize at build time, the class may not actually be reached and got initialized.
Issue tracked internally on GR-28835
I'm currently investigating this. I noticed the same problem when I run benchmarks that are using security providers. For now, if we want to use JDK11 and have stable results, we can add native-image flag -H:+EnableAllSecurityServices. This is certainly not a permanent solution, but it works.
@ziyilin I ran your demo with this and got stable results for every run, 5388 for BUILD_TIME and 196 for RUN_TIME.
Thanks, good to know that. I think -H:+EnableAllSecurityServices
option initializes all security providers so it becomes stable, and the initialized classes number also grows.
Yes, that's the downside of using this flag, but at least we have stable numbers if we are testing something and need to compare between runs.
This should be fixed
Describe the issue Class initialization results printed during native image generation are unstable. Not sure if it's initialization issue or reporting issue.
Steps to reproduce the issue Please include both build steps as well as run steps
GRAALVM_HOME
is set to a graalvm compiled with JDK11 and theJAVA_HOME
is JDK11.The script statically analyzes the demo with reporting the class initialization 10 times. The result is printed out like:
This result shows the
BUILD_TIME
andRUN_TIME
initialization classes are not stable.Describe GraalVM and your environment: