xlab-uiuc / IDoCT

Illinois Dataset of Configuration Tests

Running the CTests to investigate the number of unexpected scenarios BAD|PASS and GOOD|FAIL #10

Closed Shubhi-Jain98 closed 1 year ago

Shubhi-Jain98 commented 2 years ago

While testing the code, there are certain situations where we encounter unexpected results. False negatives are the major concern, since they can let misconfigurations slip into production; false positives, on the other hand, cost engineers time hunting for bugs that may not exist.
The purpose of this investigation is to minimize unexpected outcomes. The task is to find an approach that may help developers identify the real false negatives (BAD|PASS) and false positives (GOOD|FAIL).

I did the investigation on 43 tests of the Hadoop-common project, with an average of 4.21 configuration parameters per test.

False positives


The following tests yielded false positives (GOOD|FAIL):

  1. net.topology.script.file.name
Tests:
    1. org.apache.hadoop.net.TestSwitchMapping#testCachingRelaysStringOperationsToNullScript
    2. org.apache.hadoop.net.TestScriptBasedMappingWithDependency#testNoFilenameMeansSingleSwitch
  2. hadoop.security.auth_to_local
Tests:
    1. org.apache.hadoop.security.TestRaceWhenRelogin#test

False negatives:

Below are the unique configuration parameters that yielded false negatives (BAD|PASS). Multiple scenarios can cause a test to pass on BAD values.

Code-side Handling

A test can pass on BAD values for several reasons, as described below:

  1. file.stream-buffer-size In the code, developers take the maximum of the configured value and another value, so passing a negative value or 0 does no harm.

  2. dfs.ha.fencing.ssh.private-key-files The given test requires the path of a file containing the ssh port to connect to. The test may not pass if the connection cannot be established, e.g. because the server is down or the connection is made from outside the company environment (without a VPN). Therefore, in this case, the test passes with a warning.
  3. fs.client.htrace.sampler.classes
  4. hadoop.htrace.span.receiver.classes
  5. hadoop.security.authentication Before the assertEquals call, the test dynamically sets the value to the required one, so the value set by the configuration has no effect here.

  6. hadoop.security.crypto.jce.provider CTests for the inputs "SunRsaSign" and "randomProvider" pass because, in the setConf function of JceAesCtrCryptoCodec.java, the GeneralSecurityException only results in a logged warning and the creation of a default secure random number generator. Hence the code passes.
  7. hadoop.security.groups.cache.background.reload.threads The parameter represents the number of threads for cache refresh requests and is only relevant if hadoop.security.groups.cache.background.reload is true. Therefore, the CTest passes for the bad values 0 and -1 when hadoop.security.groups.cache.background.reload is false.
  8. hadoop.security.groups.cache.warn.after.ms The parameter value is only used in the comparison shown below, so the test passes for any value:

```java
if (deltaMs > warningDeltaMs) {
  ...
}
```
  9. hadoop.security.java.secure.random.algorithm CTests for the inputs "random" and "garbageValue" pass for the same reason: in the setConf function of JceAesCtrCryptoCodec.java, the GeneralSecurityException only results in a logged warning and the creation of a default secure random number generator. Hence the code passes.
  10. io.compression.codec.bzip2.library
  11. net.topology.script.number.args The value is unused in the code, so any value yields the same result.
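The fallback behavior described in items 6 and 9 can be sketched as below. This is a hypothetical simplification (the class and method names are illustrative), not the actual JceAesCtrCryptoCodec code:

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical sketch: a bad algorithm/provider raises a
// GeneralSecurityException, which is caught; the code logs a warning
// and falls back to a default SecureRandom, so the CTest never fails.
public class SecureRandomFallbackSketch {
    static SecureRandom createRandom(String algorithm, String provider) {
        try {
            return SecureRandom.getInstance(algorithm, provider);
        } catch (GeneralSecurityException e) {
            System.out.println("WARN: falling back to default SecureRandom: "
                + e.getMessage());
            return new SecureRandom(); // working default masks the BAD value
        }
    }

    public static void main(String[] args) {
        // BAD values still yield a usable SecureRandom:
        System.out.println(createRandom("garbageValue", "randomProvider") != null);
    }
}
```

Because the catch clause always produces a working generator, no configured value can make this code path fail.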
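Item 7 is an instance of a gated parameter: the value is only read when another switch is on. A minimal sketch of that pattern follows; the names and the exact rejection behavior are assumptions, not the actual Groups-cache code:

```java
// Hypothetical sketch: the thread-count parameter is only consulted when
// the background-reload switch is enabled, so BAD values such as 0 or -1
// pass as long as the switch is off.
public class GatedParamSketch {
    static int effectiveReloadThreads(boolean backgroundReload, int configured) {
        if (!backgroundReload) {
            return 0; // parameter never read; BAD values are harmless
        }
        if (configured <= 0) {
            throw new IllegalArgumentException(
                "reload thread count must be positive: " + configured);
        }
        return configured;
    }

    public static void main(String[] args) {
        System.out.println(effectiveReloadThreads(false, -1)); // gate off: no error
        try {
            effectiveReloadThreads(true, -1); // gate on: BAD value is rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

This suggests such CTests should pin the gating parameter to true when exercising the gated one.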

Not General Fixes:

The parameters below do not have a strict check. For example, the documentation for the buffer size says that "The size of this buffer should probably be a multiple of hardware page size", so we can simply log a warning and let the test case pass for any positive value. I found multiple places in the code where checks would need to be added to handle the cases below.

  1. io.file.buffer.size
  2. file.stream-buffer-size
  3. io.bytes.per.checksum
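A lenient check like the one described above might look as follows. This is a sketch under the assumption of a 4096-byte page size; the constant and method names are illustrative, not proposed Hadoop code:

```java
// Hypothetical sketch of a lenient buffer-size check: a positive value
// that is not a multiple of the page size only triggers a warning, since
// the documentation says it "should probably be" a multiple.
public class LenientBufferCheckSketch {
    static final int PAGE_SIZE = 4096; // assumed hardware page size

    static boolean checkBufferSize(int size) {
        if (size <= 0) {
            return false; // clearly invalid: fail the check
        }
        if (size % PAGE_SIZE != 0) {
            System.out.println("WARN: buffer size " + size
                + " is not a multiple of the page size " + PAGE_SIZE);
        }
        return true; // any positive value passes
    }

    public static void main(String[] args) {
        System.out.println(checkBufferSize(5000)); // warns, then passes
        System.out.println(checkBufferSize(-1));   // fails
    }
}
```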

General Fixes:

As per the Hadoop documentation, the following two configuration parameters admit general fixes:

  1. file.bytes-per-checksum: The checksum size must not be larger than file.stream-buffer-size.
  2. hadoop.kerberos.min.seconds.before.relogin: This parameter denotes a time and hence shouldn't be negative.

I have fixed and pushed the changes to the forked Hadoop repo: the code now throws an exception if the checksum value is greater than file.stream-buffer-size, and similarly when the time is negative.
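The two fixes can be sketched as strict validations that throw on invalid input. The method and variable names here are illustrative, not the actual patch:

```java
// Hypothetical sketch of the strict checks: throw when the checksum size
// exceeds the stream buffer size, or when the relogin time is negative.
public class StrictCheckSketch {
    static void validate(int bytesPerChecksum, int streamBufferSize,
                         long minSecondsBeforeRelogin) {
        if (bytesPerChecksum > streamBufferSize) {
            throw new IllegalArgumentException(
                "file.bytes-per-checksum must not exceed file.stream-buffer-size");
        }
        if (minSecondsBeforeRelogin < 0) {
            throw new IllegalArgumentException(
                "hadoop.kerberos.min.seconds.before.relogin must not be negative");
        }
    }

    public static void main(String[] args) {
        validate(512, 4096, 60); // GOOD values: no exception
        try {
            validate(8192, 4096, 60); // checksum larger than the buffer
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With checks like these, a BAD value fails fast at configuration time instead of silently passing the test.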

xylian86 commented 1 year ago

@Shubhi-Jain98 The SHA in your PR is wrong. Please check. I will close this now; feel free to open another one after you fix the issue.