Detect and reject equivalent mutants

davidkelk commented 12 years ago

Currently, if two different members of the population arrive at the same program structure after mutation, each is evaluated. Time would be saved if after we do the first evaluation, we record a hash of the mutated program and the result of the evaluation. For every new mutation, we can check against the hash list. If we find it, there is no need to test it again. Reject the mutation and try again.

davidkelk commented 12 years ago

Perhaps it would be better to detect the repeat (and thus save on ConTest evaluations) but keep it as it will probably mutate into something different from the previous one. (Maintain diversity.)

kevinjalbert commented 12 years ago

I think that sounds right. It might be the case that the solution is actually one mutation from the already occurred state of the program and the first time it went down another path.

I think we can just skip the ConTest evaluations like you mentioned.

davidkelk commented 12 years ago

Checked this in on master (8e61f91bd6eea8e714daacf7ac574e5c6cf94b99).

Lets say gen 1 mem 3 is the same as gen 1 mem 1. Is there anything that needs to be copied from 1-1 to 1-3? I'm thinking about operator weightings and so-on. Currently this patch doesn't do anything like this. It simply skips the ConTest runs.

Also, it is only for the functional phase at the moment. Is there a reason why it can't be added to the non-functional phase as well?

davidkelk commented 12 years ago

I went with the maintaining diversity approach.

kevinjalbert commented 12 years ago

I think everything from 1-1 should go to 1-3 (excluding the past states if any). We are just acknowledging that this individual state has already occurred thus we don't need to re-evaluate it. We just take the already evaluated state from 1-1 and move it to 1-3 so we can evolve from that. This can influence the operator weighting as it will have duplicates influencing it.

davidkelk commented 12 years ago

A short analysis of the effects of the ARC optimizations:

Super-TLDR Summary:

Generation per generation "optimized"-ARC (OP-ARC) is faster than "old"-ARC
Deadlocks are now occuring in OP-ARC, hurting performance
OP-ARC is taking on average one generation longer to find solutions, so ON AVERAGE IS SLOWER than old-ARC

TLDR Summary:

Analyzed 5 runs each of "old" ARC lottery runs (Table 4 of paper) and "new" optimized ARC runs
Generation per generation optimized ARC (OP-ARC) is about 15% faster
OP-ARC is detecting deadlocks now, which is hurting performance by 14%: On avg. 11 deadlocks per run, adding on avg. 16.5 mins per run
- old-ARC reported no deadlocks, so I'm not sure if I have fixed or introduced a bug
After correcting for deadlocks:
- When old-ARC and OP-ARC find the solution at the same member and generation, OP-ARC was about 20% faster
- On average OP-ARC is taking one generation longer to find a fix.
- Even though generation per generation OP-ARC is faster, as it takes an extra generation, ON AVERAGE OP-ARC is SLOWER than old-ARC

Longer Description of Analysis:

From the logs, determine the average time to find a fix for Lottery:

Old ARC: 1h 55m OP-ARC: 2h 23m (24% slower)
Determine the average generation the fix was found for Lottery:

Old ARC: 3, 4, 2, 1, 2 (Avg = 2.4) OP-ARC: 2, 6, 3, 2, 3 (Avg = 3.2, 33% slower)
Count the number of deadlocks occurring during each run:

Old ARC: 0, 0, 0, 0, 0 Avg. = 0 OP-ARC: 0, 52, 5, 0, 0 Avg. = 11.4

Each deadlock took 94 seconds longer than a data race to evaluate.

Factor out the average 11.4 deadlocks from the time calculation in 1:

Old ARC: 1h 55m OP-ARC: 2h 6m (10% slower)

Deadlocks hurt performance by 14% (24% - 10%).

Using the revised numbers from 4 and the average number of generations to find a fix from 2, determine generation-per-generation (with no deadlocks) how Old ARC and OP-ARC compare:

Old ARC: 1h 55m / 2.4 = 48m OP-ARC: 2h 6m / 3.2 = 39m (23% faster)
Compare an old-ARC run (run 1) and a OP-ARC run (run 3) where each run found the solution on generation 3, member 17:

6a. Time to evaluate generation 1: (No deadlocks)

Old ARC: 50m OP-ARC: 42m (16% faster)

6b. Time to evaluate generations 1 & 2: (No deadlocks)

Old ARC: 1h 29m OP-ARC: 1h 13m (18% faster)

6c. Time to find solution at generation 3, member 17 for both:
(5 deadlocks occurred in gen 3 for OP-ARC.)

Old ARC: 2h 19m OP-ARC: 2h 3m (12% faster) OP-ARC-no-deadlocks: 1h 55m (21% faster)

With deadlocks and generations factored out, generation per generation OP-ARC does run faster than Old ARC.

seer-lab / ARC

Detect and reject equivalent mutants #43