xlab-uiuc / IDoCT

Illinois Dataset of Configuration Tests

Research Evaluation Task PR #6

Closed monilnarang1 closed 2 years ago

monilnarang1 commented 2 years ago

Not all five of the tests below were selected randomly; I have taken care to include the following:

Test 1. org.apache.hadoop.security.TestGroupsCaching#testGroupsCaching

configParams = ["hadoop.security.groups.cache.background.reload.threads", "hadoop.security.groups.cache.secs", "hadoop.user.group.static.mapping.overrides", "hadoop.security.groups.cache.background.reload", "hadoop.security.groups.cache.warn.after.ms", "hadoop.security.groups.shell.command.timeout"]

Parameter 1: `hadoop.security.groups.cache.background.reload.threads`

- `hadoop.security.groups.cache.background.reload.threads=3` (default) PASS
- `hadoop.security.groups.cache.background.reload.threads=0` PASS
- `hadoop.security.groups.cache.background.reload.threads=-1` PASS
- `hadoop.security.groups.cache.background.reload.threads=100` PASS
- `hadoop.security.groups.cache.background.reload.threads=10000000000` FAIL java.lang.NumberFormatException
- `hadoop.security.groups.cache.background.reload.threads=-10000000000` FAIL java.lang.NumberFormatException

*The Ctest passes for 0 and -1 because this configuration is not considered unless `hadoop.security.groups.cache.background.reload` (default value: false) is set to true. Both 0 and -1 fail with java.lang.IllegalArgumentException when the following is passed:

- `hadoop.security.groups.cache.background.reload=true` with `hadoop.security.groups.cache.background.reload.threads=-1` FAIL
- `hadoop.security.groups.cache.background.reload=true` with `hadoop.security.groups.cache.background.reload.threads=0` FAIL

A sketch of this dependency is shown below.
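To make the dependency concrete, here is a minimal sketch (my own illustration, not the test's code) that constructs `org.apache.hadoop.security.Groups` directly with both params set; the wrapper class name is made up:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Groups;

public class BackgroundReloadDependencySketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // The thread-count param is only read once background reload is enabled,
    // so both must be set together to reach the thread-pool validation.
    conf.setBoolean("hadoop.security.groups.cache.background.reload", true);
    conf.setInt("hadoop.security.groups.cache.background.reload.threads", -1);
    new Groups(conf); // expected: java.lang.IllegalArgumentException
  }
}
```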

Parameter 2: `hadoop.security.groups.cache.secs`

- `hadoop.security.groups.cache.secs=300` (default) PASS
- `hadoop.security.groups.cache.secs=100` PASS
- `hadoop.security.groups.cache.secs=10000000000` PASS
- `hadoop.security.groups.cache.secs=0` FAIL java.lang.IllegalArgumentException
- `hadoop.security.groups.cache.secs=-1` FAIL java.lang.IllegalArgumentException
- `hadoop.security.groups.cache.secs=100000000000000000000000000000000` FAIL java.lang.NumberFormatException

Parameter 3: `hadoop.user.group.static.mapping.overrides`

- `hadoop.user.group.static.mapping.overrides=dr.who=;` (default) PASS
- `hadoop.user.group.static.mapping.overrides=user2` PASS
- `hadoop.user.group.static.mapping.overrides=user3` PASS
- `hadoop.user.group.static.mapping.overrides=` PASS
- `hadoop.user.group.static.mapping.overrides=user1` FAIL java.lang.AssertionError
- `hadoop.user.group.static.mapping.overrides=me` FAIL java.lang.AssertionError

*The Ctest fails for "user1" and "me" because the test explicitly sets and verifies groups for these users in the cache, and changing this configuration alters (overrides) the group mappings for them. The test fails, but we are still able to correctly verify the effectiveness of the configuration.

Parameter 4: `hadoop.security.groups.cache.background.reload`

- `hadoop.security.groups.cache.background.reload=false` (default) PASS
- `hadoop.security.groups.cache.background.reload=true` PASS

Parameter 5: `hadoop.security.groups.cache.warn.after.ms`

- `hadoop.security.groups.cache.warn.after.ms=5000` (default) PASS
- `hadoop.security.groups.cache.warn.after.ms=0` PASS
- `hadoop.security.groups.cache.warn.after.ms=-1` PASS
- `hadoop.security.groups.cache.warn.after.ms=100` PASS
- `hadoop.security.groups.cache.warn.after.ms=-10000000000000000000` FAIL java.lang.NumberFormatException

*The Ctests pass for 0 and -1 because the param's value is only used in a comparison against the delta time: `(deltaMs > warningDeltaMs)`.

Parameter 6: `hadoop.security.groups.shell.command.timeout`

- `hadoop.security.groups.shell.command.timeout=0s` (default) PASS
- `hadoop.security.groups.shell.command.timeout=5s` PASS
- `hadoop.security.groups.shell.command.timeout=2m` PASS
- `hadoop.security.groups.shell.command.timeout=2j` FAIL java.lang.NumberFormatException
- `hadoop.security.groups.shell.command.timeout=30x` FAIL java.lang.NumberFormatException

(How the duration suffixes are parsed is sketched below.)
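For reference, here is a minimal sketch (my own illustration; the wrapper class name is made up) of how Hadoop parses such duration-suffixed values via `Configuration.getTimeDuration`:

```java
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;

public class ShellTimeoutParsingSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("hadoop.security.groups.shell.command.timeout", "2m");
    // Known suffixes ("5s", "2m") parse cleanly; a value with an unknown
    // suffix such as "2j" ends up in Long.parseLong and raises
    // java.lang.NumberFormatException, matching the results above.
    long timeoutMs = conf.getTimeDuration(
        "hadoop.security.groups.shell.command.timeout", 0L,
        TimeUnit.MILLISECONDS);
    System.out.println(timeoutMs); // 120000
  }
}
```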

Test 2. org.apache.hadoop.security.TestSecurityUtil#testGetHostFromPrincipal

configParams = ["hadoop.security.dns.log-slow-lookups.threshold.ms", "hadoop.security.dns.log-slow-lookups.enabled"]

Parameter 1: `hadoop.security.dns.log-slow-lookups.threshold.ms`

- `hadoop.security.dns.log-slow-lookups.threshold.ms=1000` (default) PASS
- `hadoop.security.dns.log-slow-lookups.threshold.ms=0` PASS
- `hadoop.security.dns.log-slow-lookups.threshold.ms=-1` PASS
- `hadoop.security.dns.log-slow-lookups.threshold.ms=-9999999999` FAIL java.lang.ExceptionInInitializerError caused by java.lang.NumberFormatException

*The Ctests pass for 0 and -1 because the config value is only used in a comparison against the elapsed time in ms: `(elapsedMs >= slowLookupThresholdMs)`.

Parameter 2: `hadoop.security.dns.log-slow-lookups.enabled`

- `hadoop.security.dns.log-slow-lookups.enabled=false` (default) PASS
- `hadoop.security.dns.log-slow-lookups.enabled=true` PASS

Test 3. org.apache.hadoop.security.ssl.TestSSLFactory#testServerDifferentPasswordAndKeyPassword

configParams = ["hadoop.security.credential.provider.path", "hadoop.ssl.enabled.protocols", "hadoop.ssl.hostname.verifier", "hadoop.ssl.keystores.factory.class", "hadoop.ssl.server.conf", "hadoop.security.credential.clear-text-fallback"]

Parameter 1: `hadoop.security.credential.provider.path`

- `hadoop.security.credential.provider.path=` (default) PASS
- `hadoop.security.credential.provider.path=jceks://file/$hadoop_dir/hadoop-common-project/hadoop-common/target/test/data/test.jks` PASS
- `hadoop.security.credential.provider.path=jceks://file/$hadoop_dir/hadoop-common-project/hadoop-common/target/test/data/keystore.jks` PASS
- `hadoop.security.credential.provider.path=randomPath` PASS
- `hadoop.security.credential.provider.path=./` PASS

*All provided paths pass the Ctest because this configuration is read from ssl-server.xml instead of the core-ctest.xml file: while the test creates a new SSLFactory object, the method readSSLConfiguration of class SSLFactory.java changes which configuration file is read. The code takes the name of the ssl config file from the param `hadoop.ssl.server.conf`, whose default value is ssl-server.xml.

Solutions:

- (i) Add the line `conf.set(SSLFactory.SSL_SERVER_CONF_KEY, "core-ctest.xml");` in the test, after the file has been switched, to change it back.
- (ii) Pass the param `hadoop.ssl.server.conf=core-ctest.xml` along with `hadoop.security.credential.provider.path`, which would also ensure we read from our .xml file.
- (iii) Add the line `CredentialProviderFactory.getProviders(conf);` in the test to fetch and evaluate the provider path again from the configuration, which points to the default config file, i.e. core-ctest.xml in our case.

=> Both solution i and solution ii would have worked, but in the test's code, while setting the SSL-factory-specific configuration parameters, the configuration file name (stored in the variable sslConfFileName) is hard-coded to ssl-server.xml. Hence, the ssl-specific params would be missing from our changed configuration file. (I had to fix this as well in order to test the correctness of the Ctest with parameter `hadoop.ssl.server.conf`; more on this below.)
=> Solution iii was selected because it only adds new code without changing the existing code. A minimal sketch of it follows.
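A minimal sketch of solution iii (my own condensation; the wrapper class and helper name are made up, and only the `CredentialProviderFactory.getProviders(conf)` call is the actual addition):

```java
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.alias.CredentialProvider;
import org.apache.hadoop.security.alias.CredentialProviderFactory;

public class ProviderPathProbeSketch {
  // Re-resolve hadoop.security.credential.provider.path from the test's
  // Configuration (backed by core-ctest.xml in our setup); an unresolvable
  // scheme such as "randomPath" now surfaces as an IOException instead of
  // being silently ignored.
  static List<CredentialProvider> probe(Configuration conf) throws IOException {
    return CredentialProviderFactory.getProviders(conf);
  }
}
```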

After adding the patch of solution iii, the Ctest results are:

- `hadoop.security.credential.provider.path=` (default) PASS
- `hadoop.security.credential.provider.path=jceks://file/$hadoop_dir/hadoop-common-project/hadoop-common/target/test/data/test.jks` PASS
- `hadoop.security.credential.provider.path=jceks://file/$hadoop_dir/hadoop-common-project/hadoop-common/target/test/data/keystore.jks` PASS
- `hadoop.security.credential.provider.path=randomPath` FAIL java.io.IOException: No CredentialProviderFactory for randomPath in hadoop.security.credential.provider.path
- `hadoop.security.credential.provider.path=./` FAIL java.io.IOException: No CredentialProviderFactory for ./ in hadoop.security.credential.provider.path

Parameter 2: `hadoop.ssl.enabled.protocols`

- `hadoop.ssl.enabled.protocols=TLSv1.2` (default) PASS
- `hadoop.ssl.enabled.protocols=TLSv1.3` PASS
- `hadoop.ssl.enabled.protocols=TLSv99.99` PASS
- `hadoop.ssl.enabled.protocols=FakeVersion` PASS

*Any version, and indeed any random value, passes the Ctest because the sslFactory.init() call (inside method checkSSLFactoryInitWithPasswords in TestSSLFactory.java) only assigns the protocols array; it does not verify it.

I have committed a change that adds a call to the function setEnabledProtocols (inside the sslFactory.createSSLEngine() call in method checkSSLFactoryInitWithPasswords of class TestSSLFactory.java), which first verifies the protocols and then assigns their values in the object; see the sketch after these results. Results after the patch:

- `hadoop.ssl.enabled.protocols=TLSv1.2` (default) PASS
- `hadoop.ssl.enabled.protocols=TLSv1.3` PASS
- `hadoop.ssl.enabled.protocols=TLSv1.1` PASS
- `hadoop.ssl.enabled.protocols=TLSv99.99` FAIL java.lang.IllegalArgumentException: Unsupported protocolTLSv99.99
- `hadoop.ssl.enabled.protocols=FakeVersion` FAIL java.lang.IllegalArgumentException: Unsupported protocolFakeVersion
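A hedged sketch of the idea behind that patch (illustrative, not the exact committed diff; the host/port arguments are made up): unlike init(), `SSLEngine.setEnabledProtocols` validates every protocol name, so bogus values fail fast.

```java
// Fragment assumed to sit inside checkSSLFactoryInitWithPasswords, where
// sslFactory and conf already exist.
SSLEngine engine = sslFactory.createSSLEngine("localhost", 9000);
// setEnabledProtocols rejects unknown names with
// java.lang.IllegalArgumentException ("Unsupported protocol..."), whereas
// init() merely stored the configured array without checking it.
engine.setEnabledProtocols(conf.getStrings("hadoop.ssl.enabled.protocols"));
```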

Parameter 3: `hadoop.ssl.hostname.verifier`

- `hadoop.ssl.hostname.verifier=DEFAULT` (default) PASS
- `hadoop.ssl.hostname.verifier=STRICT` PASS
- `hadoop.ssl.hostname.verifier=ALLOW_ALL` PASS
- `hadoop.ssl.hostname.verifier=INVALID_VALUE` FAIL java.security.GeneralSecurityException: Invalid hostname verifier
- `hadoop.ssl.hostname.verifier=randomString` FAIL java.security.GeneralSecurityException: Invalid hostname verifier

Parameter 4: `hadoop.ssl.keystores.factory.class`

- `hadoop.ssl.keystores.factory.class=org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory` (default) PASS
- `hadoop.ssl.keystores.factory.class=org.apache.hadoop.security.ssl.DelegatingSSLSocketFactory` FAIL java.lang.RuntimeException: class org.apache.hadoop.security.ssl.DelegatingSSLSocketFactory not org.apache.hadoop.security.ssl.KeyStoresFactory
- `hadoop.ssl.keystores.factory.class=org.apache.hadoop.security.ssl.KeyStoresFactory` FAIL java.lang.NoSuchMethodException: org.apache.hadoop.security.ssl.KeyStoresFactory.()
- `hadoop.ssl.keystores.factory.class=randomString` FAIL java.lang.ClassNotFoundException: Class randomString not found

*Currently FileBasedKeyStoresFactory is the only KeyStoresFactory implementation (source: https://hadoop.apache.org/docs/r2.4.1/hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html).

Parameter 5: `hadoop.ssl.server.conf`

- `hadoop.ssl.server.conf=ssl-server.xml` (default) PASS
- `hadoop.ssl.server.conf=ssl-client.xml` FAIL java.security.GeneralSecurityException: The property 'ssl.server.keystore.location' has not been set in the ssl configuration file.
- `hadoop.ssl.server.conf=core-site.xml` FAIL (same exception)
- `hadoop.ssl.server.conf=core-ctest.xml` FAIL (same exception)
- `hadoop.ssl.server.conf=random-file.xml` FAIL (same exception)

*The Ctest fails for every xml file except the default because the test's code hard-codes setting the ssl-specific configurations below in the ssl-server.xml file. These configurations are not included in the param_unset_gettermap, as they are being set (in method createSSLConfig of class KeyStoreTestUtil.java): { ssl.server.keystore.location, ssl.server.keystore.type, ssl.server.keystore.password, ssl.server.keystore.keypassword }

Changing the hard-coded sslConfFileName to read from the configuration produced the desirable results below:

- `hadoop.ssl.server.conf=ssl-server.xml` (default) PASS
- `hadoop.ssl.server.conf=core-ctest.xml` PASS
- `hadoop.ssl.server.conf=core-site.xml` PASS
- `hadoop.ssl.server.conf=random-file.xml` PASS

*All the Ctests pass because the test first creates a new file every time, via `KeyStoreTestUtil.saveConfig(new File(sslConfsDir, sslConfFileName), sslConf);`, and then sets all the needed configuration values in it. A sketch of the change follows.
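A minimal sketch of that change (my own condensation; it assumes the test's existing conf, sslConfsDir, and sslConf variables):

```java
// Resolve the file name from the Configuration instead of the hard-coded
// "ssl-server.xml" literal, so hadoop.ssl.server.conf is honoured.
String sslConfFileName = conf.get("hadoop.ssl.server.conf", "ssl-server.xml");
// The test then recreates that file and fills in the keystore properties,
// which is why every file name passes after the change.
KeyStoreTestUtil.saveConfig(new File(sslConfsDir, sslConfFileName), sslConf);
```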

Parameter 6: `hadoop.security.credential.clear-text-fallback`

- `hadoop.security.credential.clear-text-fallback=true` (default) PASS
- `hadoop.security.credential.clear-text-fallback=false` PASS

Test 4. org.apache.hadoop.io.serializer.avro.TestAvroSerialization#testReflect

configParams = ["io.serializations"]

Parameter 1: `io.serializations`

- `io.serializations=org.apache.hadoop.io.serializer.WritableSerialization, org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization, org.apache.hadoop.io.serializer.avro.AvroReflectSerialization` (default) PASS
- `io.serializations=org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization, org.apache.hadoop.io.serializer.avro.AvroReflectSerialization` PASS
- `io.serializations=org.apache.hadoop.io.serializer.avro.AvroReflectSerialization` PASS
- `io.serializations=org.apache.hadoop.io.serializer.WritableSerialization` FAIL java.lang.NullPointerException (the serializer is null)
- `io.serializations=someGarbageValue` FAIL java.lang.NullPointerException (the serializer is null)

*The test requires AvroReflectSerialization to run; a sketch of why the NPE appears follows.
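A minimal sketch of the failure mode (my own illustration; `MyAvroPojo` is a hypothetical stand-in for the test's reflect-serialized type):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.serializer.SerializationFactory;
import org.apache.hadoop.io.serializer.Serializer;

public class MissingAvroSerializationSketch {
  // Hypothetical stand-in for the record type the test serializes via
  // Avro reflection.
  static class MyAvroPojo {
    int x;
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // AvroReflectSerialization is deliberately left out of the list.
    conf.set("io.serializations",
        "org.apache.hadoop.io.serializer.WritableSerialization");
    SerializationFactory factory = new SerializationFactory(conf);
    // No registered serialization accepts a plain POJO, so the factory
    // returns null, and any later call on it raises the NPE seen above.
    Serializer<MyAvroPojo> serializer = factory.getSerializer(MyAvroPojo.class);
    System.out.println(serializer); // null
  }
}
```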

Test 5. org.apache.hadoop.crypto.TestCryptoStreamsWithJceAesCtrCryptoCodec#testHasCapability

configParams = ["hadoop.security.crypto.jce.provider", "hadoop.security.java.secure.random.algorithm", "hadoop.security.crypto.cipher.suite"]

Parameter 1: `hadoop.security.crypto.jce.provider`

- `hadoop.security.crypto.jce.provider=` (default) PASS
- `hadoop.security.crypto.jce.provider=SunJCE` PASS
- `hadoop.security.crypto.jce.provider=SunRsaSign` FAIL java.security.NoSuchAlgorithmException: No such algorithm: AES/CTR/NoPadding
- `hadoop.security.crypto.jce.provider=randomProvider` FAIL java.security.NoSuchProviderException: No such provider: randomProvider

*"SunJCE" was the only provider I could find that supports the AES/CTR/NoPadding encryption algorithm, and only that algorithm can be used, as it is hard-coded in the convert method of class CipherSuite.java.

Parameter 2: `hadoop.security.java.secure.random.algorithm`

- `hadoop.security.java.secure.random.algorithm=SHA1PRNG` (default) PASS
- `hadoop.security.java.secure.random.algorithm=MD5PRNG` PASS
- `hadoop.security.java.secure.random.algorithm=random` PASS
- `hadoop.security.java.secure.random.algorithm=garbageValue` PASS

*The Ctests for "random" and "garbageValue" pass because, in the setConf method of class JceAesCtrCryptoCodec.java, a caught GeneralSecurityException is logged as a warning without being propagated, and a call is made to a default secure random number generator instead. Hence even a random value passes the Ctest, since the code has contingency logic (paraphrased below). A new test, created further below, correctly validates the effectiveness of this param.
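A condensed paraphrase of that contingency logic (my own rewording of the setConf behavior, not the verbatim source; the log call is replaced with a stderr print to keep the sketch self-contained):

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import org.apache.hadoop.conf.Configuration;

public class SecureRandomFallbackSketch {
  static SecureRandom createRandom(Configuration conf) {
    String algorithm = conf.get(
        "hadoop.security.java.secure.random.algorithm", "SHA1PRNG");
    try {
      return SecureRandom.getInstance(algorithm);
    } catch (GeneralSecurityException e) {
      // A bad algorithm name is only logged, never propagated, so the
      // Ctest passes even for garbage values.
      System.err.println("Falling back to default PRNG: " + e.getMessage());
      return new SecureRandom();
    }
  }
}
```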

Parameter 3: `hadoop.security.crypto.cipher.suite`

- `hadoop.security.crypto.cipher.suite=AES/CTR/NoPadding` (default) PASS
- `hadoop.security.crypto.cipher.suite=AES/CTR/Padding` FAIL Invalid cipher suite name: AES/CTR/Padding
- `hadoop.security.crypto.cipher.suite=randomSuite` FAIL Invalid cipher suite name: randomSuite
- `hadoop.security.crypto.cipher.suite=AES/OFB32/PKCS5Padding` FAIL java.lang.IllegalArgumentException: Invalid cipher suite name: AES/OFB32/PKCS5Padding
- `hadoop.security.crypto.cipher.suite=Unknown` FAIL java.lang.NullPointerException: Cannot invoke "Object.getClass()" because "org.apache.hadoop.crypto.TestCryptoStreamsWithJceAesCtrCryptoCodec.codec" is null

*Only "AES/CTR/NoPadding" and "Unknown" are recognised as valid cipher suite names, as they are hard-coded in class CipherSuite.java (condensed paraphrase below).
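A condensed paraphrase of the look-up that produces those messages (my own rewording of CipherSuite.java, trimmed to the relevant fields; not the verbatim source):

```java
public enum CipherSuite {
  // Exactly the two hard-coded names reported above.
  UNKNOWN("Unknown"),
  AES_CTR_NOPADDING("AES/CTR/NoPadding");

  private final String name;

  CipherSuite(String name) {
    this.name = name;
  }

  public String getName() {
    return name;
  }

  public static CipherSuite convert(String name) {
    for (CipherSuite suite : values()) {
      if (suite.getName().equals(name)) {
        return suite;
      }
    }
    // Any other value is rejected outright, matching the FAIL rows above.
    throw new IllegalArgumentException("Invalid cipher suite name: " + name);
  }
}
```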

For all three of the above parameters, some hard-coding exists in the actual code. Hence both the 1st and the 2nd approach fail: we can neither test directly (the Ctest either passes for bad values or fails for good values), nor would rewriting the test solve the issue, since the hard-coded parts lie in the actual code that our test invokes.



I have implemented the test case below from scratch to correctly validate the correctness and effectiveness of this test's second param: "hadoop.security.java.secure.random.algorithm"

Newly created test: org.apache.hadoop.crypto.TestCryptoStreamsWithJceAesCtrCryptoCodec#testSecureRandomAlgorithmConfiguration

configParams = ["hadoop.security.java.secure.random.algorithm"]

Parameter 1: `hadoop.security.java.secure.random.algorithm`

- `hadoop.security.java.secure.random.algorithm=SHA1PRNG` (default) PASS
- `hadoop.security.java.secure.random.algorithm=` PASS
- `hadoop.security.java.secure.random.algorithm=NativePRNG` PASS
- `hadoop.security.java.secure.random.algorithm=garbageValue` FAIL java.security.NoSuchAlgorithmException: garbageValue SecureRandom not available
- `hadoop.security.java.secure.random.algorithm=random123` FAIL java.security.NoSuchAlgorithmException: random123 SecureRandom not available

(A sketch of the shape of this test follows.)
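A minimal sketch of the shape such a test can take (illustrative, not the committed implementation; the handling of the empty value is an assumption on my part):

```java
import java.security.SecureRandom;
import org.apache.hadoop.conf.Configuration;
import org.junit.Test;

public class SecureRandomAlgorithmConfigSketch {
  @Test
  public void testSecureRandomAlgorithmConfiguration() throws Exception {
    Configuration conf = new Configuration();
    String algorithm = conf.get(
        "hadoop.security.java.secure.random.algorithm", "SHA1PRNG");
    if (algorithm.isEmpty()) {
      algorithm = "SHA1PRNG"; // assumed fallback for the empty value
    }
    // Instantiating the PRNG directly propagates NoSuchAlgorithmException
    // for bogus names instead of silently substituting a default.
    SecureRandom.getInstance(algorithm).nextInt();
  }
}
```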

From the above results, we can conclude that the test is able to verify the correctness and effectiveness of the config param, but it would be testing some updated logic (because we would have to change the hard-coded portion).

Considering the cases in which the written Ctest fails (e.g. hadoop.security.java.secure.random.algorithm=garbageValue): from the code's point of view, these should have passed, because the existing logic permits such configuration values (by not propagating the exception and instead substituting the default random value generator). Since this test would be exercising new logic that is not present in the actual code, it doesn't feel right from the viewpoint of regression testing.

Please correct me if I missed something. I'd love to discuss how we can improve the Ctesting of these kinds of parameters, where some hard-coding exists in the actual code itself.


I have also included the final results, after making the code changes, in the csv file.

shuaiwang516 commented 2 years ago

Thank you for this very detailed PR. This is great work!!

Test 1. org.apache.hadoop.security.TestGroupsCaching#testGroupsCaching

For configuration parameter hadoop.security.groups.cache.background.reload.threads, it is great that you recognized the configuration dependency: this param only gets considered when hadoop.security.groups.cache.background.reload is set to true.

All your other parameter analyses are good.

One thing we can also think about for this test: can we test the hadoop.security.groups.negative-cache.secs parameter? In the first line of the test body, hadoop.security.groups.negative-cache.secs is set to 0, so any other value (set by a developer or user for testing purposes) will be replaced by 0. How could we modify that set call (and also the test oracle/assertions) to enable this test to check the negative-cache.secs param?

Test 3. org.apache.hadoop.security.ssl.TestSSLFactory#testServerDifferentPasswordAndKeyPassword

> all provided paths passed the Ctest as this configuration is read from ssl-server.xml instead of core-ctest.xml

This is a great catch!! Your solution is also reasonable! Great work here! This makes me think of a potential research question: how do users know whether they put their configuration value in the correct file? Or, let's say, WHY do we need so many configuration files instead of a single one? It is already very hard for users to configure these systems given so many configuration parameters, and it would be even harder if different configurations have to go into different files.

This PR overall looks good to me and I will merge it now. One thing I wanna ask: do you know how to check which configuration parameter is get/set by which test? If not, please check this in the ctest paper, the ctest repo, and the actual implementation for Hadoop.

monilnarang1 commented 2 years ago

Thank you @shuaiwang516 for taking the time to do such an in-depth review, and thank you for all the compliments; I feel uplifted. I'm sorry I couldn't reply earlier.

I will start investigating and try to find a solution for testing the hadoop.security.groups.negative-cache.secs parameter.

From what I have seen, there is usually a common config file that holds everything; if separate configuration values are needed, say for different environments (local, staging, prod, prod-us, etc.) or different use cases (testing, or Ctest in our case), then we have separate files whose precedence is controlled by some environment variable. Those individual files contain only the configuration values that differ from the common config; their values take priority, and all other values are picked up from the common file. Now that you have mentioned it, I am curious why they decided to keep the ssl-server-specific configurations in a separate file. By all means, they could have lived with the other configurations; if segregation was the goal, it could have been achieved with some good comments instead. I will try to find out the reasons for this.

Yes, I am aware that by scraping the logs we print when a configuration is fetched or updated, we can find out which configuration parameter is GET/SET by a particular test. I even used these log lines as breakpoints while running the tests in debug mode, to find precisely when and where any parameter is being fetched or updated. Thank you for sharing those resources; I will go through them for a deeper understanding.