ocelab / ocelot

OCELOT is a testing tool employing genetic algorithms to automatically generate tests for C functions
https://ocelot.science/
3 stars 1 forks source link

Handle recursive data structures #7

Open intersimone999 opened 7 years ago

intersimone999 commented 7 years ago

OCELOT does not work with recursive data structures. For example:

struct test {
  int data;
  int data2;
  struct test *to_test;
};

The main problem here is defining a chromosome representation GAs and other algorithms and to define a procedure to dynamically translate the chromosome in variables that should be passed to the function.

The solution should work also in a scenario similar to this:

typedef struct _s1 {
  int a;
  struct _s1* b;
} s1;

typedef struct _s2 {
  int* a;
  struct _s2* b;
  s1 c;
} s2;

void test(int, double, char*, s1*, s2);

At the moment, we linearize the structures, and this prevents us to handle this kind of case. Our chromosome representation is the following:

val_1
val_2
...
val_n
array_1[k]
array_2[k]
...
array_m[k]
ptr_1
ptr_2
...
ptr_m

Where n is the number of byval parameters, m is the number of reference parameters and k is the maximum size of the array, defined as a test-case generation parameter.

We should devise a new generic, nonlinear chromosome representation.

Any ideas?

dardin88 commented 7 years ago

We can think about it, but I still cannot understand in which cases our current solution does not work.

intersimone999 commented 7 years ago

@dardin88 It doesn't work with the example in the original post. At the moment, it doesn't work for sure if there is a recursive data structure (e.g., linked list). Actually, we don't handle pointers to structures properly, I would say, we just handle pointers to native types. If we have a pointer to a structure, we treat it as a normal structure and we linearize it to represent it in the chromosome, and all the fields of the structure are (i) represented as numbers in the chromosome, if they are native types/pointers or (ii) further linearized if they are structures (note that pointers to structures belong to the first category).

For example, consider a struct K with two integer fields and a function f(K*). The chromosome is simply composed by two numbers, as if it was f(K), because we linearize the structure. Then, when f is called to evaluate the fitness function, we pass the variable of type K by reference (*x).

If there is a function f(s1) (s1 is the struct defined in the original post), it considers it as f(int, s1*), therefore it represents it as three numbers (doubles): one for the first "int", one for the "int" in s1 and another one for the pointer to s1 in s1. I don't remember what happens, it may result in a segmentation fault. Anyway, if a branch can be covered only if the recursive data structure has depth > 2, probably it won't be covered.

EDIT: note that, in the original post, "array_x" are always arrays of numbers.

intersimone999 commented 7 years ago

Properly handle recursive data structures

giograno commented 6 years ago

I though about a tree encoding of a solution, similar to the concept used in genetic programming.

Let's consider for instance the following linked list:

typedef struct node {
    int val;
    struct node * next;
} node_t;

Think about a possible function under test that has this signature

void my_func_to_test( int a, node_t * b) 

Chromosome Representation

We might represent the chromosome for this function in a way similar to the following: repr We have a root note (the one in blu) and the first level children are the direct input parameters for the function under test. In this case, we have a primitive int a and the linked list b. We might randomize the length of the recursive structure using some parameters or something like that.

Crossover

When it comes to crossover, for instance, we should cut the chromosome in a very specific point, i.e., above a top level structure in input (the circles in red). So, for instance, this might be an example of crossover: crossover P.s: in the figure a value for the structure b1 is missing.

The crossover for the linked list is somehow similar to the one we already have in OCELOT for pointer, i.e., a separate crossover for the struct itself. Obviously, all the other input to the function might have their independent crossover like the actual implementation. For instance, assuming that instead of the int a we have int[] a. We could think about the following crossover operations:

  1. Crossover between the 2 int[]a arrays for the two solution, like a normal crossover;
  2. Crossover internal to the recursive structure, like the example before;
  3. Crossover at a top level, i.e., simple exchange of the two node_t * b in this case.

This is just a rough idea that came into my mind, noticing that the problem is kind of similar to the genetic programming evolution. If you think that we can somehow explore it, let's discuss about that (or otherwise, destroy such hypothesis 😝)