zuzkajelcicova / CS454-Automated_patching_using_GP

4 stars 0 forks source link

Weighted path #2

Open zele-git opened 5 years ago

zele-git commented 5 years ago

In our reference paper (Automatically finding patches using genetic programming), weighted path is defined as follow,

"A weighted path through that program. The weighted path is a list of pairs, each pair containing a statement in the program and a weight based on that statement’s occurrences in various test cases."

This ideas used in the mutation and crossover operation. Are we going to consider this issue in our implementation?

zuzkajelcicova commented 5 years ago

@zele-git we are adopting the approach mentioned in their other paper (Fixing 55 out of 105 bugs...) since we use a sequence of patches and not the whole ASTs as it used to be originally.

In the original GenProg paper, each individual in the population was an AST combined with a weighted execution path you are mentioning. However, we are using the improved version where each individual in the population is just a sequence of edit operations (sequence of our Patch objects). GA only uses fitness to evaluate its offspring and modify offsprings’ sequence. The other paper (Fixing 55 out of 105 bugs...) uses the phrase weight, but that is for measuring the fitness -> the fitness value is measured as a weighted average of the positive (i.e., initially passing, encoding required functionality) and negative (i.e., initially failing, encoding a defect) test cases.

So regarding fitness, crossover, mutation etc., please, study other paper (Fixing 55 out of 105 bugs...) in detail as well (these two papers are our main reference and inspiration) since we are using their improvements too. @zele-git @Asdf11x @bunverdenz

Furthermore, regarding the crossover and (tournament) selection of individuals, they are stating important information (just a part from the paper): "The search begins by constructing and evaluating a population of random patches. Line 1 of Figure 1 initializes the population by independently mutating copies of the empty patch. Lines 2–6 correspond to one iteration or generation of the algorithm. On Line 3, tournament selection [17] selects from the incoming population, with replacement, parent individuals based on fitness. By analogy with genetic “crossover” events, parents are taken pairwise at random to exchange pieces of their representation; two parents produce two offspring (Section III-G). Each parent and each offspring is mutated once (Section III-F) and the result forms the incoming population for the next iteration."

Improved crossover we are going to use (allowing different lengths of genes we talked about at the meeting):

"Our new patch subset crossover operator is a variation of the well-known uniform crossover operator [18] tailored for the program repair domain. It takes as input two parents p and q represented as ordered lists of edits (Section III-B). The first (resp. second) offspring is created by appending p to q (resp. q to p) and then removing each element with independent probability one-half. This operator has the advantage of allowing parents that both include edits to similar ranges of the program (e.g., parent p inserts B after A and parent q inserts C after A) to pass any of those edits along to their offspring. Previous uses of a one-point crossover operator on the fault localization space did not allow for such recombination (e.g., each offspring could only receive one edit to statement A)."

So once again, even though we are following the original approach of GenProg, we are introducing a few of their improvements in the later version. Therefore, refer to the second paper as well, it contains important information about many things such as tournament selection, pseudo-code for the GA process etc.