heidmic / suprb

GNU General Public License v3.0

Improve NS speed #127

Open heidmic opened 2 years ago


By caching Hamming distances with archive members

Currently, in every generation of an RD phase, we compare each rule in the population with each rule in the archive by computing the Hamming distance between their match sets. The archive grows by a fixed number of rules per generation; after an RD phase it is reset and, in the next phase, initialised with the pool of rules, so it grows linearly over the cycles. Since at least some rules remain unchanged between two generations (or at least their match sets do not differ), we would only need to compute their distances to the newly archived rules instead of recomputing all distances. For rules that were part of the population in the previous generation, those distances may even already be available and could simply be reused.
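A minimal sketch of such a cache (class and method names are hypothetical, not taken from the suprb codebase). Distances are keyed by the byte representation of the two match sets, so a rule whose match set is unchanged between generations only pays for distances to newly archived rules:

```python
import numpy as np


def hamming(a, b):
    """Fraction of positions where two boolean match sets differ."""
    return np.count_nonzero(a != b) / a.size


class DistanceCache:
    """Caches Hamming distances between population rules and archive members.

    Keys are the match sets' raw bytes, so unchanged rules reuse
    previously computed distances across generations.
    """

    def __init__(self):
        self._cache = {}

    def distance(self, rule_match, archive_match):
        key = (rule_match.tobytes(), archive_match.tobytes())
        if key not in self._cache:
            self._cache[key] = hamming(rule_match, archive_match)
        return self._cache[key]

    def distances(self, rule_match, archive):
        """Distances from one rule's match set to every archived match set."""
        return [self.distance(rule_match, m) for m in archive]
```

The cache would need to be cleared (or rebuilt from the pool) when the archive is reset at the end of an RD phase.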

By speeding up the distance calculation

It might be that the scipy version of the Hamming distance is very inefficient for match sets of our size: for short boolean vectors, scipy's per-call overhead (input validation and conversion) could easily dominate the actual comparison.
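One quick way to test this hypothesis (a sketch; the array sizes are assumptions, not the actual match-set dimensions in suprb): compare scipy's per-pair `hamming` against a direct NumPy expression, and against a single vectorized comparison of the whole population with the whole archive, which avoids the per-call overhead entirely:

```python
import numpy as np
from scipy.spatial.distance import hamming

rng = np.random.default_rng(0)
a = rng.random(1000) < 0.5
b = rng.random(1000) < 0.5

# scipy's hamming returns the fraction of differing positions.
d_scipy = hamming(a, b)

# Equivalent direct NumPy computation for boolean match sets,
# without scipy's per-call validation overhead.
d_numpy = np.count_nonzero(a != b) / a.size

# For all population-vs-archive pairs at once, broadcasting gives
# every distance in a single vectorized call
# (scipy.spatial.distance.cdist(pop, arch, metric="hamming")
# is an alternative with the same result).
pop = rng.random((50, 1000)) < 0.5    # hypothetical population match sets
arch = rng.random((200, 1000)) < 0.5  # hypothetical archive match sets
pairwise = (pop[:, None, :] != arch[None, :, :]).mean(axis=2)  # shape (50, 200)
```

If the vectorized variant is substantially faster at realistic sizes, the per-pair loop over scipy's `hamming` is the bottleneck; note the broadcasted version trades memory (a 50x200x1000 intermediate here) for speed.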