researchart / fse16

info about artifacts from fse16

Su_Dyclink #18

Open mikefhsu opened 8 years ago

ctreude commented 7 years ago

@mikefhsu

I downloaded the virtual machine from https://github.com/Programming-Systems-Lab/dyclink, but Windows 8 tells me that the zip file is invalid. I tried twice, with the same result. Any idea what the problem could be?

mikefhsu commented 7 years ago

@ctreude Hello, I just downloaded the zip file again and was able to unzip it. I used a MacBook Pro. We also tried the zip file on Linux, and it works well there too. Is there any chance you could try it on a Mac or Linux machine? Thanks.

ctreude commented 7 years ago

@mikefhsu Sorry, not at the moment. Hopefully the other reviewers will be able to confirm that it works on Mac / Linux.

mikefhsu commented 7 years ago

@ctreude I can share with you the zip file that I just downloaded from our GitHub (the Google Drive link). I have successfully unzipped it. Do you have somewhere I can upload it? Thanks.

ctreude commented 7 years ago

@mikefhsu You could try something like Dropbox. Or maybe you could use your university's web server? Thanks!

mikefhsu commented 7 years ago

@ctreude I just shared another link to my Google Drive with you. Can you try and see whether you can unzip this file?

Thank you.

hongyujohn commented 7 years ago

The zip is invalid on my Win10 machine as well... we will try to find a Linux machine.

ctreude commented 7 years ago

I've tried WinZip, WinRAR and 7-Zip, all without success. @mikefhsu, is there anything you can change about the compression format? Thanks!

mikefhsu commented 7 years ago

@ctreude @hongyujohn @cmcmil It seems that the zip file cannot be extracted on Windows machines, so I have uploaded the VM directly (without any compression). It is about 20 GB. I've sent out the invitation on Google Drive. A directory called DYCLINK contains the VM, which is a Linux image. You can use VirtualBox to run the VM. Please let me know whether you can access it. Thanks for your feedback.

mcmillco commented 7 years ago

@mikefhsu: I run Linux, so I didn't experience any problems.

Very Brief Summary

This artifact is the executable tool DyCLINK for detecting similar code.

Insightful

The insightfulness of the artifact is established in the associated paper: the artifact implements a new dynamic technique for detecting code that behaves in a similar way. It works by observing the inputs and outputs of programs.

Useful

This artifact is definitely useful. Since the work is newly published, no other tool does the same thing. The artifact makes the tool available under a permissive MIT license, which I view as a strong positive.

Usable

The artifact is available on GitHub along with a virtual machine image and an extensive README. I appreciate that the dependencies are listed clearly in Step 0, and that a setup script is provided to create the necessary directory structure (tools frequently forget to mention that structure).

This artifact does well on all three of the review criteria. It will make a good addition to the artifacts track.

hongyujohn commented 7 years ago

We are able to run the tool now and can see the results of SQL queries. However, how do we evaluate the correctness of the results? I only see some class files; how do I know whether they are similar or not? (Is there any ground truth?)

mikefhsu commented 7 years ago

@hongyujohn In A6, we discuss how to use the script "dyclink_query.sh" to compute the number of code relatives. This script reports the number of code relatives for a specific comparison, e.g., 2011-2012. The ground truth is in Table 3 of our FSE paper. Let me know if you need further information. Thanks.

mikefhsu commented 7 years ago

@hongyujohn If you want to know which programs have similar behavior, you can refer to our Software Community Clustering experiment in Table 4, even though this is outside the scope of our artifact evaluation. Basically, we assume that programs from the same year are more likely to have similar runtime behavior. Thanks.

hongyujohn commented 7 years ago

Currently, I don't have the paper to check the results against; the paper linked on your website is missing... (with the error message: "Oops! That page can't be found").

So I am afraid I cannot evaluate the results.

mikefhsu commented 7 years ago

@hongyujohn I cannot publish our FSE paper yet, because we are still preparing the camera-ready version. I assume you can obtain the paper from EasyChair? Or I can send you our submission directly. Thanks.

hongyujohn commented 7 years ago

I don't have the paper so I cannot evaluate the results now.

mikefhsu commented 7 years ago

@hongyujohn I just pulled our paper from EasyChair and sent it to your Gmail. Thank you.

ctreude commented 7 years ago

@mikefhsu Thanks, the 20GB file worked as advertised.

Review:

The authors detect similarly behaving software by observing how programs compute their results at the instruction level by means of execution traces. Information is encoded in dynamic instruction graphs.

The artifact is useful and usable and works as advertised. All necessary information is present, and the instructions are easy to follow and make sense.

The separate steps -- graph construction, similarity computation, and results analysis -- follow directly from the approach. The authors could improve the convenience of their artifact by scripting parts of the result analysis as well -- removing the step of having to open the MySQL Workbench and digging into the database schema (but this is a minor detail of course).

Another minor detail is that I didn't understand the need for prompting the user to decide whether to store the results in the database. What alternatives are there? Also, "true" wasn't the most intuitive answer here -- why not include "y" or "yes" as well?

hongyujohn commented 7 years ago

Brief Summary We are able to run this tool as described, after some setup effort (as shown in this thread). Its main feature is detecting similar code pairs based on certain graph similarity metrics. However, we are unable to check the correctness of the results or determine recall/precision, as the returned results are binary.

Insightful DyCLINK may be able to detect some similar code based on execution traces.

Useful If the accuracy of this tool is good, it may be useful for maintenance and refactoring work. But again, we are unable to check the correctness of the results, as the returned results are binary.

Usable This tool can be downloaded from GitHub, and a detailed tutorial is provided. Following the steps in the tutorial, the tool runs. A prepared virtual machine image is also available online.

mcmillco commented 7 years ago

I concur with a "Platinum" rating for this artifact.

mikefhsu commented 7 years ago

@ctreude @hongyujohn @cmcmil Thanks for your valuable feedback and comments on our artifact. They will help us enhance our tool.

We have some feedback for @hongyujohn as follows:

As @hongyujohn mentioned, because the output of DyCLINK seems to be binary, he is not able to judge its correctness. However, DyCLINK actually stores the similarity of each pair of matched (sub)graphs in the database. As our artifact paper mentions, the script "dyclink_query.sh" helps the user collect the number of code relatives, which we report in Table 3 of our research paper. This is the scope of our artifact paper and what our VM can reproduce.

Because DyCLINK detects code that has similar runtime behavior at the instruction level but may be syntactically dissimilar, it is extremely hard for a human to verify the correctness of the detected code relatives. Thus, we propose a kNN-based software clustering technique (Table 4) to help users understand which programs or modules behave similarly at runtime.

The reason we chose not to include this kNN-based experiment in the artifact evaluation is the VM's performance. To reproduce Table 4, the user needs to replay each comparison in Table 3, which may take several days on the VM. Because DyCLINK conducts inexact sub-graph matching on graphs with hundreds or thousands of nodes, it is computationally expensive (even with the newly developed algorithm in our research paper). Thus, we ran our experiments on real machines with powerful CPUs and ample memory; the full experiments in our paper can then be completed in hours, as mentioned in our research paper (Table 1).

We have provided our system settings and source code on GitHub. If the reviewers are interested in replaying our full experiments, including Table 3 and Table 4 in our research paper, we would be happy to answer any technical questions about replaying them. We hope our explanation answers the reviewers' questions.

We would appreciate the reviewers' comments on the output format of DyCLINK: what would be a better output format for DyCLINK to help users understand programs with similar runtime behavior?

Again, we appreciate your effort and time in evaluating our system. Your comments will help us enhance it.

timm commented 7 years ago

Note these labels are still "under discussion" and are still subject to change prior to the final notifications Friday.

mikefhsu commented 7 years ago

Thanks again for the valuable comments from @ctreude @hongyujohn @cmcmil .

Even though our cluster analysis is already in our GitHub repo, we have updated our README to help future users reproduce Table 4 of our research paper (which was outside the scope of our artifact evaluation this time).

We appreciate your feedback!