Nopol in RepairThemAll experiment fail to repair several bugs which can be repaired by nopol before

DehengYang commented 5 years ago

I compared the results of Nopol in two experiments: 1) https://github.com/Spirals-Team/defects4j-repair/tree/master/results/2017-march 2) https://github.com/program-repair/RepairThemAll_experiment/tree/master/results/Defects4J

The first experiment used the Nopol-2017 version built under jdk 1.7, while the second experiment used the Nopol-2018 version built under jdk 1.8. Oddly, the later Nopol version cannot repair several bugs that the earlier version could repair, e.g., Time 16, 18, 19 and Mockito 29, 38.

So I would like to ask why this difference in performance occurs, and which Nopol version is better/more powerful?

Thanks, Dale

tdurieux commented 5 years ago

Hi @DehengYang, sorry for the late answer.

> So I would like to ask that why this different performance occurs?

There is no easy answer to this question; several factors contribute to the difference.

- Nopol (like all APR tools) is extremely dependent on the environment. Any change can alter the generated patch, or even determine whether a patch is generated at all: for example jdk 1.7 versus jdk 1.8, Oracle JDK versus OpenJDK, the language of the machine, the operating system, ...
- We also used a different seed in the 2017 experiment than in the 2019 one, which can affect which patches are generated.
- We changed the way the classpath for a bug is computed. In 2019, we mostly use the classpath provided by defects4j (which we clean because it is sometimes incorrect). In 2017, we used all the jar files provided by defects4j for a specific project (for example commons-math). This also has an impact.
- The implementation of Nopol changed over those 2 years.
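One way to make these factors visible when comparing two experiments is to record them next to the results. The following is only a minimal sketch, not something taken from the RepairThemAll scripts: the function names, the chosen keys, and the idea of storing the seed in the same record are all assumptions made for illustration.

```python
import os
import platform


def environment_fingerprint(seed=None):
    """Collect a few environment facts that can influence a repair run.

    Hypothetical helper: the set of keys is an illustrative choice, not
    an exhaustive list of what affects Nopol.
    """
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "lang": os.environ.get("LANG", ""),
        "tz": os.environ.get("TZ", ""),
        "java_home": os.environ.get("JAVA_HOME", ""),
        "seed": seed,
    }


def diff_fingerprints(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

Saving such a record with each run and calling `diff_fingerprints` on two of them would at least show which of the listed factors changed between the experiments, even if it cannot say which one caused a given patch to disappear.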

> which nopol version is better/more powerful?

I honestly don't know. The 2017 version will work better on some bugs, and the 2019 version will work better on others. It is really difficult to know.

Sorry for my vague answer, I would love to have a clear and precise reason but I think there is none.

DehengYang commented 5 years ago

Dear @tdurieux ,

Thank you so much for your detailed explanation!

The possible reasons you mention really help me gain a deeper understanding of Nopol (e.g., the seed and the classpaths). Regarding one of the factors, the language of the machine, I once ran into a problem caused by a non-English system language: the Defects4J benchmark must be configured and run in an English environment, otherwise some bugs show extra failing tests.

Thank you again for your answer. Would you mind answering one further question? Have you dealt with the problem that some Defects4J bugs may yield unexpected failing tests when built under JDK 1.8? The same question is raised in https://github.com/program-repair/RepairThemAll/issues/19

Thank you again for your great help.

Best, Dale

tdurieux commented 5 years ago

> For one of the mentioned factor named the language of the machine, I was once faced with the problem caused by the non-English system language.

We are well aware of this, and the timezone of the machine also has an impact on the Time project. This was correctly configured for this experiment.
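Pinning the language and timezone before launching the benchmark can be sketched as below. This is an illustration under stated assumptions rather than the configuration actually used in the experiment: the helper names are hypothetical, and the concrete values (`en_US.UTF-8`, `America/Los_Angeles`) are assumed and should be checked against the Defects4J documentation.

```python
import os
import subprocess

# Assumed values for illustration; verify them against the benchmark's docs.
PINNED = {
    "LANG": "en_US.UTF-8",
    "LC_ALL": "en_US.UTF-8",
    "TZ": "America/Los_Angeles",
}


def pinned_env(base=None, overrides=PINNED):
    """Return a copy of the environment with locale and timezone overridden."""
    env = dict(os.environ if base is None else base)
    env.update(overrides)
    return env


def run_pinned(cmd, **kwargs):
    """Run a command (given as a list of arguments) in the pinned environment."""
    return subprocess.run(cmd, env=pinned_env(), **kwargs)
```

Building a fresh dictionary instead of mutating `os.environ` keeps the override local to the child process, so other tools started from the same script are unaffected.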

> have you dealt with the problem that some Defects4J bugs may yield unexpected failed tests when built under JDK 1.8 version?

Unfortunately not. Nopol needs to run on jdk >= 1.8, and since Nopol is mostly a dynamic analyzer + synthesizer, the buggy application needs to be executed in the same JVM instance as Nopol; thus the bug needs to be executed on jdk >= 1.8. Technically, it should be possible to remove this 'limitation' from Nopol, but that would require completely rewriting the tool. Unfortunately, I don't have the time or the students to do it.

I expect that the impact of this can be:

- no test fails any more with jdk 1.8: Nopol will crash and no patch will be generated
- new tests now fail with jdk 1.8: Nopol will most likely not be able to generate a patch, because it would have to fix "bugs" that have different root causes with a single change.

And the trickier cases:

- new tests now fail and the original buggy test case does not fail with jdk 1.8: in this case, it is possible that Nopol will generate a patch for a 'different' bug.
- the tests are flaky with jdk 1.8: flaky tests are the worst for APR, since they allow a completely random patch to be generated at a completely random location.
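The four cases can be told apart mechanically if, for each bug, the failing tests observed under jdk 1.8 are compared with the failing tests the benchmark expects. The sketch below is an assumption-laden illustration: `classify_failures` and its labels are hypothetical names, the inputs are plain sets of test names, and running the suite several times to detect flakiness is one possible protocol, not the one used in the experiment.

```python
def classify_failures(expected, runs):
    """Classify a bug from its expected and observed failing tests.

    expected: iterable of test names the benchmark says should fail.
    runs: list of iterables, the failing tests observed in each repeated run.
    """
    if not runs:
        raise ValueError("at least one run is required")
    expected = set(expected)
    runs = [set(r) for r in runs]

    # Different outcomes across identical runs: the tests are flaky.
    if any(r != runs[0] for r in runs[1:]):
        return "flaky"

    observed = runs[0]
    if not observed:
        # No failing test at all, so there is nothing to drive the repair.
        return "no_failing_test"

    extra = observed - expected
    if extra and not (observed & expected):
        # Only new failures: a patch would target a 'different' bug.
        return "different_failures"
    if extra:
        # Original failures plus new ones with possibly other root causes.
        return "extra_failures"
    # Observed failures are a non-empty subset of the expected ones.
    return "as_expected"
```

Counting the labels over all bugs would give a rough estimate of how often each case occurs, which is the proportion the comment above says is currently unknown.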

Unfortunately, I don't have an estimate of the proportion of each case. The last two cases seem less likely, but I don't have scientific evidence to show that.

There is currently no paper that tries to understand what is happening in defects4j with jdk 1.8. Such work would be really interesting for understanding the difference in behavior between the two JDKs. I hope that someone will study this.

DehengYang commented 5 years ago

Thank you for your great help! Exploring the different behaviour between the two jdk versions remains an open research question. It is very meaningful and pertinent to fundamental mechanisms in the research field.

Maybe in the future I will try to figure it out by further study. Thank you again for your time and consideration.

program-repair / RepairThemAll_experiment

Nopol in RepairThemAll experiment fail to repair several bugs which can be repaired by nopol before #4