chklopp opened this issue 4 years ago
Is there an efficient way to compare two large gerbil output files in order to retrieve the k-mers that occur in only one of the two input files?

Unfortunately, that is not easily possible. One option would be to process the two input datasets one after the other with the same hash strategy, so that each k-mer lands in the same partition in both runs, and then to fix the partitions and sort them in the output files. The comparison could then be done quite efficiently, partition by partition. However, implementing this would be quite complex, and unfortunately I don't have time for that at the moment.
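A minimal sketch of the partition-then-sort idea from the answer, assuming both k-mer sets have already been decoded into plain k-mer strings (e.g. after converting gerbil's output to text) and contain no duplicates. The hash function (`blake2b` from `hashlib`), the partition count, and all function names here are hypothetical stand-ins, not gerbil's own hash strategy or API. Partitions are kept in memory for illustration; a real implementation for large files would write them to disk.

```python
import hashlib
from typing import Iterable, List, Tuple


def partition_of(kmer: str, num_parts: int) -> int:
    # Hypothetical deterministic hash: both inputs must use the same function
    # and the same num_parts so a k-mer gets the same partition index in each.
    # (Python's built-in hash() is salted per process, so it is avoided here.)
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "little") % num_parts


def partition(kmers: Iterable[str], num_parts: int) -> List[List[str]]:
    parts: List[List[str]] = [[] for _ in range(num_parts)]
    for kmer in kmers:
        parts[partition_of(kmer, num_parts)].append(kmer)
    # Sort each partition so it can be compared with one linear merge pass.
    for p in parts:
        p.sort()
    return parts


def merge_diff(a: List[str], b: List[str]) -> Tuple[List[str], List[str]]:
    """Return (only_in_a, only_in_b) for two sorted, duplicate-free lists."""
    only_a: List[str] = []
    only_b: List[str] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
        elif a[i] < b[j]:
            only_a.append(a[i])
            i += 1
        else:
            only_b.append(b[j])
            j += 1
    # Whatever remains in either list has no counterpart in the other.
    only_a.extend(a[i:])
    only_b.extend(b[j:])
    return only_a, only_b


def compare(kmers_a: Iterable[str], kmers_b: Iterable[str],
            num_parts: int = 4) -> Tuple[List[str], List[str]]:
    parts_a = partition(kmers_a, num_parts)
    parts_b = partition(kmers_b, num_parts)
    only_a: List[str] = []
    only_b: List[str] = []
    # Matching partitions only need to be compared with each other.
    for pa, pb in zip(parts_a, parts_b):
        oa, ob = merge_diff(pa, pb)
        only_a.extend(oa)
        only_b.extend(ob)
    return sorted(only_a), sorted(only_b)


a = ["ACGT", "CCGA", "GGTA", "TTAC"]
b = ["ACGT", "GGTA", "AAAA"]
print(compare(a, b))  # (['CCGA', 'TTAC'], ['AAAA'])
```

Because both inputs share the same hash, each partition pair can be loaded, sorted, and merged independently, so memory use is bounded by the largest partition rather than by the whole file.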