gerbil output file comparison

uni-halle / gerbil

A fast and memory-efficient k-mer counter with GPU-support

MIT License


gerbil output file comparison #15

Open: chklopp opened this issue 4 years ago

chklopp commented 4 years ago

Is there an efficient way to compare two large gerbil output files in order to retrieve the k-mers that occur in only one of the two inputs?

merbert commented 4 years ago

Unfortunately, that is not easily possible. One option would be to process the two input datasets one after the other using the same hashing strategy, and then to keep the partitions in the output files fixed and sorted. With both outputs partitioned and sorted the same way, the comparison could be done quite efficiently. However, implementing this would be quite complex, and unfortunately I don't have time for that at the moment.
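A minimal in-memory sketch of the partition-then-sort idea from the reply above, assuming the k-mers of each gerbil output have already been decoded into plain strings (parsing gerbil's output format is not shown, and counts are ignored). The helpers `partition_of`, `partition_and_sort`, `diff_sorted` and `compare` are hypothetical, and CRC32 only stands in for whatever hash strategy gerbil actually uses. The key point is that a shared hash sends a k-mer to the same partition index in both datasets, so partitions can be compared pairwise with a linear merge scan.

```python
import zlib
from typing import Iterable, List, Tuple


def partition_of(kmer: str, num_partitions: int) -> int:
    # Hypothetical stand-in for gerbil's hash strategy: any deterministic
    # hash works, as long as both datasets use the same one.
    return zlib.crc32(kmer.encode("ascii")) % num_partitions


def partition_and_sort(kmers: Iterable[str], num_partitions: int) -> List[List[str]]:
    parts: List[List[str]] = [[] for _ in range(num_partitions)]
    for kmer in kmers:
        parts[partition_of(kmer, num_partitions)].append(kmer)
    # Deduplicate and sort each partition so it can be merged linearly.
    return [sorted(set(p)) for p in parts]


def diff_sorted(a: List[str], b: List[str]) -> Tuple[List[str], List[str]]:
    """Merge-scan two sorted, duplicate-free lists; return (only_in_a, only_in_b)."""
    only_a: List[str] = []
    only_b: List[str] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Present in both: skip it on both sides.
            i += 1
            j += 1
        elif a[i] < b[j]:
            only_a.append(a[i])
            i += 1
        else:
            only_b.append(b[j])
            j += 1
    # Whatever remains on either side has no counterpart.
    only_a.extend(a[i:])
    only_b.extend(b[j:])
    return only_a, only_b


def compare(kmers_a: Iterable[str], kmers_b: Iterable[str],
            num_partitions: int = 4) -> Tuple[List[str], List[str]]:
    pa = partition_and_sort(kmers_a, num_partitions)
    pb = partition_and_sort(kmers_b, num_partitions)
    only_a: List[str] = []
    only_b: List[str] = []
    # Same hash => a shared k-mer sits in the same partition index on both
    # sides, so each pair of partitions can be compared independently.
    for part_a, part_b in zip(pa, pb):
        oa, ob = diff_sorted(part_a, part_b)
        only_a.extend(oa)
        only_b.extend(ob)
    return sorted(only_a), sorted(only_b)
```

For example, `compare(["ACGT", "CGTA", "GTAC", "TTTT"], ["CGTA", "GTAC", "AAAA"])` returns `(["ACGT", "TTTT"], ["AAAA"])`. For files too large for memory, each partition would instead be written to disk, sorted, and streamed through the same merge scan one pair at a time, which is essentially what the standard `comm` tool does for two sorted text files.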