Open notestaff opened 6 years ago
@marekkokot For filtering paired reads you could use strand information: if the kmers in the database come from one strand only (e.g. non-canonical kmers from a genome), you could check that read1 has kmers from one strand while read2 has kmers from the other strand.
@marekkokot For the most precise filtering, you'd have kmc_tools filter take as input two single-strand kmer databases: one made by kmc -b from a set of genome sequences, and one from reverse complements of these sequences. You'd then keep a read if it meets the filtering criteria for either database. For paired reads, you'd keep a read pair if read1 meets criteria for the forward-strand database and read2 for the reverse complement-strand database, or vice versa.
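To make the proposal concrete, here is a minimal Python sketch of the pairing rule described above. It is not how kmc_tools works internally and it does not call KMC; plain Python sets stand in for the two single-strand databases, and the names `revcomp`, `kmer_set`, `count_hits`, `keep_pair` and the `min_hits` threshold are all hypothetical, chosen only for illustration.

```python
# Illustrative only: Python sets stand in for two single-strand k-mer databases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_set(seq: str, k: int) -> set:
    """Collect the k-mers of seq as written, without canonicalisation."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def count_hits(read: str, db: set, k: int) -> int:
    """Count the k-mer positions in read whose k-mer is present in db."""
    return sum(1 for i in range(len(read) - k + 1) if read[i:i + k] in db)


def keep_pair(read1: str, read2: str, fwd_db: set, rev_db: set,
              k: int, min_hits: int = 1) -> bool:
    """Keep the pair only if the two mates hit opposite-strand databases."""
    r1_fwd = count_hits(read1, fwd_db, k) >= min_hits
    r1_rev = count_hits(read1, rev_db, k) >= min_hits
    r2_fwd = count_hits(read2, fwd_db, k) >= min_hits
    r2_rev = count_hits(read2, rev_db, k) >= min_hits
    # read1 forward and read2 reverse, or vice versa
    return (r1_fwd and r2_rev) or (r1_rev and r2_fwd)
```

With `fwd_db = kmer_set(genome, k)` and `rev_db = kmer_set(revcomp(genome), k)`, a pair whose mates both match the same strand is rejected, while a pair whose mates match opposite strands is kept in either order.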
When filtering reads, if the reads are paired, it would help to be able to say either "keep both reads if one passes the filter" or "drop both reads if one fails the filter", while preserving the read pairing.
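As a rough illustration of the two requested behaviours, the sketch below applies a per-read filter to pairs and always emits or drops both mates together, so the pairing is preserved. The function name `filter_pairs`, the `mode` values `"any"` and `"all"`, and the `passes` callback are hypothetical and are not existing kmc_tools options.

```python
from typing import Callable, Iterable, Iterator, Tuple

ReadPair = Tuple[str, str]


def filter_pairs(pairs: Iterable[ReadPair],
                 passes: Callable[[str], bool],
                 mode: str = "any") -> Iterator[ReadPair]:
    """Yield whole pairs so that read1/read2 stay in sync.

    mode="any": keep both reads if at least one passes the filter.
    mode="all": drop both reads if at least one fails the filter.
    """
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    combine = any if mode == "any" else all
    for read1, read2 in pairs:
        if combine((passes(read1), passes(read2))):
            yield read1, read2
```

Because each decision is made per pair, the two output files would contain the same number of reads in the same order, which is what downstream paired-end tools generally expect.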
I would find this most useful, too!