Preprocess daily batch to find identical sequences?

It may be worth preprocessing each daily batch of sequences to find those that are identical. It might look like:

1. Bin all the sequences in the batch by their key (the masked sequence) in a defaultdict.
2. Run matching only on the distinct keys.
3. Postprocess the path groupings to add back all the identical sequences in each group, with the same values as the exemplar.
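A minimal Python sketch of these three steps follows. It assumes the batch arrives as a dict mapping sample ID to masked sequence string. `match_fn` is a hypothetical stand-in for the real matching step, not an sc2ts function: it takes a list of sequences and returns one result (e.g. a path) per sequence, in the same order.

```python
from collections import defaultdict
from typing import Callable, Dict, List, TypeVar

R = TypeVar("R")  # type of a matching result (e.g. a path); an assumption


def match_batch_deduplicated(
    masked_seqs: Dict[str, str],
    match_fn: Callable[[List[str]], List[R]],
) -> Dict[str, R]:
    """Match each distinct masked sequence once, then fan results out.

    masked_seqs: sample ID -> masked sequence (used as the binning key).
    match_fn: hypothetical matcher; returns one result per input sequence,
        in input order.
    """
    # Step 1: bin sample IDs by their masked sequence.
    bins: Dict[str, List[str]] = defaultdict(list)
    for sample_id, seq in masked_seqs.items():
        bins[seq].append(sample_id)

    # Step 2: run matching once per distinct key (the exemplars).
    keys = list(bins)
    results = match_fn(keys)

    # Step 3: give every sample in a bin the same result as its exemplar.
    out: Dict[str, R] = {}
    for key, result in zip(keys, results):
        for sample_id in bins[key]:
            out[sample_id] = result
    return out
```

Every identical sample receives the same result object, so downstream code that mutates per-sample results would need to copy them first. Nothing is kept between calls: the bins are rebuilt from scratch for each batch.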

Older code did something like this, but kept a running record of all identical sequences across calls. Doing it per batch would be simpler and would not require keeping state between calls.

It's unclear whether the number of identical sequences in a daily batch would actually warrant this; I suspect the performance gain would be small.
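One way to settle this would be to measure, on real daily batches, how many matching calls binning would avoid. The sketch below uses a hypothetical helper, `dedup_summary`, and assumes the masked sequences are available as strings. It counts each distinct key as one matching call, so `saved_fraction` is the fraction of calls avoided, not a measured speedup; actual run-time savings would also depend on how matching cost varies between sequences.

```python
from collections import Counter
from typing import Dict, Iterable


def dedup_summary(masked_seqs: Iterable[str]) -> Dict[str, float]:
    """Summarise how much a batch would shrink if identical keys were binned."""
    counts = Counter(masked_seqs)  # masked sequence -> number of samples
    n_total = sum(counts.values())
    n_distinct = len(counts)
    return {
        "total": n_total,
        "distinct": n_distinct,
        # Fraction of matching calls avoided by matching each key only once.
        "saved_fraction": 1 - n_distinct / n_total if n_total else 0.0,
        # Size of the biggest group of identical sequences.
        "largest_group": max(counts.values(), default=0),
    }
```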

jeromekelleher / sc2ts, issue #158