Closed: minkull closed this issue 4 years ago.
The authors apply for the Reusable, Available, Replicated and Reproduced Badges.
Available: The code was carefully organized and made available in a repository. A documentation regarding the download and navigation through main folders was also provided.
Decision: Accept
Reusable: Examples along with documentation were made available so that the reuse of the approach in different source code files was facilitated.
Decision: Accept
Replicated: The authors provided information on how to build the environment and perform the experiments in the paper. However, the provided command lines for replicating the experiments (e.g., `nohup ./fpgen.sh sum 1norm &` [I tried with and without the "&"]) did not execute properly.
Decision: Reject
Reproduced: This badge requires replication of the experiments without using the authors' code. This is not possible at this moment.
Decision: Reject
@timm @minkull @random-friendly-dude @crubiog
@unknown-user1234, thank you for the feedback.
Could you please provide more information on what error the reviewer received for "Replicated"? The feedback simply says the command did not run. However, we provided a Docker to run the commands in, and several people outside our project verified that the commands ran successfully and the results were replicated. We do not see how running the command in the provided Docker could fail.
Thank you,
Hui
@HGuo15
In this paper, the authors transform the problem of generating high error-inducing inputs into a code coverage maximization problem that can be solved by symbolic execution.
FPGen enables symbolic execution to explore rounding and cancellation possibilities in different code areas by injecting inaccuracy checks after floating-point arithmetic operations.
The artifact is publicly available and reusable. The authors evaluate FPGen on 3 summation algorithms, 9 matrix computation routines from the Meschach library, and 15 statistics routines from the GNU Scientific Library (GSL), comparing it against (1) a random input generator, (2) S3FP, the state-of-the-art floating-point error-inducing input generator, and (3) KLEE-Float, the floating-point symbolic execution engine used in FPGen.
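To illustrate the kind of error such inputs are meant to expose, here is a toy Python example (not from the artifact; the inputs are made up) where catastrophic cancellation in naive left-to-right summation yields a large relative error:

```python
# Toy example (NOT from the FPGen artifact): catastrophic cancellation in
# naive left-to-right summation over inputs chosen to trigger rounding.
data = [1e16, 1.0, -1e16]

naive = 0.0
for x in data:
    naive += x          # 1e16 + 1.0 rounds back to 1e16 in double precision

exact = 1.0             # the mathematically exact sum of the inputs
rel_error = abs(naive - exact) / abs(exact)
print(naive, rel_error)  # -> 0.0 1.0
```

Summing the same values in a different order (e.g., `1e16 + (-1e16) + 1.0`) gives the exact result, which is why input order and values matter so much in these benchmarks.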
I followed the instructions described in the GitHub repository: https://github.com/ucd-plse/FPGen.
I used Ubuntu 18.04.3 LTS on an Intel® Core™ i7-8565U CPU @ 1.80GHz with 12GB of memory to run the FPGen container in Docker.
I ran 11 out of the 27 benchmarks, with the time bound set to 2 hours each. The total running time was about 66 hours.
benchmark | Rel.Error (Random) | Rel.Error (S3FP) | Rel.Error (KLEE-Float) | Rel.Error (FPGen) |
---|---|---|---|---|
pairwise-summation | 0.0000e+00 | 0.0000e+00 | 0.0000e+00 | 1.3174e-16 |
2norm | 3.1249e-16 | 3.1170e-16 | 0.0000e+00 | 2.2117e-16 |
dot | 1.7010e-12 | 4.4579e-09 | 0.0000e+00 | 1.9190e-04 |
lu | 0.0000e+00 | 0.0000e+00 | 0.0000e+00 | 2.7327e+00 |
wmean | 1.72281e-11 | 1.75737e-07 | 0.0000e+00 | 1.0000e+00 |
wvariance-w | 8.85416e-11 | 2.09184e-05 | 0.0000e+00 | 2.2858e-12 |
wsd-w | 4.42709e-11 | 1.04591e-05 | 0.0000e+00 | 1.1429e-12 |
wtss | 5.54205e-16 | 5.31847e-16 | 0.0000e+00 | 4.4513e-16 |
wabsdev | 3.44206e-11 | 2.20766e-05 | 0.0000e+00 | 1.0000e+00 |
wkurtosis | 4.51066e-11 | 1.40364e-07 | 0.0000e+00 | 1.7733e-12 |
wskew-m | 3.78488e-10 | 0.0316462 | 0.0000e+00 | 2.5675e+01 |
The results of all tested benchmarks exactly match the results in the paper.
The reviewer believes the submitted artifact can be given the `Reusable` and `Available` badges. Nevertheless, the other two badges are not feasible at this stage. In case you were not aware, the `Replicated` and `Reproduced` badges require that your results have been obtained by other research articles in the community. Please refer to these instructions and let us know if you disagree (https://conf.researchr.org/track/icse-2020/icse-2020-Artifact-Evaluation#Call-for-Submissions).
> Could you please provide more information on what error the reviewer received for "Replicated"? The feedback simply says the command did not run...
Hi Hui Guo,
Below is the output of the command:
I indexed the lines in the terminal to more easily explain what I did. I executed line 1, then waited for around 2.5 hours. Since it didn't finish, I pressed "Enter" again. Then lines 2 to 5 were displayed.
It is worth mentioning that my machine has a 4-core CPU and 16 GB of memory. Could that be the cause?
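For what it's worth, a generic way to check on a job started with `nohup ... &` without pressing Enter (a sketch only; `sleep 1` stands in for the long-running `fpgen.sh` job):

```shell
nohup sleep 1 &             # stand-in for the long-running ./fpgen.sh job
pid=$!

if kill -0 "$pid" 2>/dev/null; then
    echo "still running"    # the background job has not finished yet
fi
wait "$pid"                 # block until the background job completes
echo "done"
```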
Best regards.
@unknown-user1234
The job is executed in the background, and I believe you have successfully run our tool FPGen. You can then do `ls` in the current directory, and you should be able to see a file named `result-fpgen.txt`. This file has the results of the two tests you ran, i.e., `sum` and `1norm`.
After that, you can manually inspect the results and compare them to the paper, or you can use our script to automatically check the results of `sum` and `1norm`:
```shell
../../scripts/cmp_to_ref.sh -m result-fpgen.txt reference/result-fpgen.txt sum 1norm
```
This script will print `pass` when the results match the paper, or `fail` when they don't. Let me know if you have any questions.
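For readers without the artifact at hand, a checker of this kind can be sketched with `diff` (a minimal illustration only; the actual `cmp_to_ref.sh` may work differently, and the file contents below are placeholders, not values from the artifact):

```shell
# Minimal sketch of a pass/fail result checker; NOT the actual cmp_to_ref.sh.
# The result lines below are placeholders, not values from the artifact.
printf 'sum PLACEHOLDER\n1norm PLACEHOLDER\n' > result-fpgen.txt
printf 'sum PLACEHOLDER\n1norm PLACEHOLDER\n' > reference-result.txt

if diff -q result-fpgen.txt reference-result.txt >/dev/null 2>&1; then
    echo pass               # results match the reference file
else
    echo fail               # results differ from the reference file
fi
```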
@HGuo15
Even with your instructions, I didn't manage to execute the code (maybe something is missing in my environment). Anyway, since @random-friendly-dude executed it successfully, I am satisfied. However, I maintain my decision because replicating all the experiments (a necessary condition for the Replicated badge) is unfeasible, as it demands a large amount of time, and reproducing them is impossible at this moment.
Best regards!
I agree with the above reviewers that this artifact merits "Available" and "Reusable" but not "Replicated" or "Reproduced".
If the authors wish to dispute that decision, I refer them to the criteria for Replicated and Reproduced at https://conf.researchr.org/track/icse-2020/icse-2020-Artifact-Evaluation#Call-for-Submissions.
Meanwhile, I will NOT close this issue, just in case there is any further discussion.
@random-friendly-dude @unknown-user1234 @timm @HGuo15
We agree with the recommendation to receive the `Reusable` and `Available` badges. @random-friendly-dude, thank you for linking to the definitions of the `Replicated` and `Reproduced` badges, which helped to clarify the situation.
@unknown-user1234 said:
> However, I keep my decision because replicating all the experiments (necessary condition to achieve the replicated badge) is unfeasible, it demands a large amount of time, and reproducing them is impossible at this moment.
We just want to make clear that we are not receiving the `Replicated` or `Reproduced` badges because our results have not been obtained in a subsequent study by a person or team other than the authors (though the subset of experiments run by @random-friendly-dude was reproducible). The reasons given by @unknown-user1234 above, however, are not the reasons why the badges are not recommended.
- https://github.com/researchart/rose6icse/tree/master/submissions/available/icse20-main-171
- https://github.com/researchart/rose6icse/tree/master/submissions/reusable/icse20-main-171
- https://github.com/researchart/rose6icse/tree/master/submissions/replicated/icse20-main-171
- https://github.com/researchart/rose6icse/tree/master/submissions/reproduced/icse20-main-171
corresponding author for artifact evaluation: Hui Guo
Authors
Note to reviewers: these authors want multiple badges