refresh-bio / KMC

Fast and frugal disk based k-mer counter

filtering paired reads #68

Open notestaff opened 6 years ago

notestaff commented 6 years ago

When filtering reads, if the reads are paired, it would help to be able to say either "keep both reads if one passes the filter" or "drop both reads if one fails the filter", while preserving the read pairing.
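
The two requested policies can be sketched outside KMC as follows. This is only an illustration of the intended semantics, not an existing `kmc_tools` feature: `filter_pairs`, the `mode` values `"any"` and `"all"`, and the `passes` predicate are hypothetical names, and `passes` stands in for whatever per-read filter criterion is in use.

```python
from typing import Callable, Iterable, Iterator, Tuple

ReadPair = Tuple[str, str]  # (read1 sequence, read2 sequence)


def filter_pairs(
    pairs: Iterable[ReadPair],
    passes: Callable[[str], bool],
    mode: str = "any",
) -> Iterator[ReadPair]:
    """Filter read pairs as a unit so that pairing is preserved.

    mode="any": keep both reads if at least one passes the filter.
    mode="all": drop both reads if at least one fails the filter.
    """
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    combine = any if mode == "any" else all
    for read1, read2 in pairs:
        # Evaluate both mates, then decide once for the whole pair.
        if combine((passes(read1), passes(read2))):
            yield read1, read2
```

Because each pair is either emitted whole or dropped whole, the two output files stay in sync and no orphaned mates are produced.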

notestaff commented 6 years ago

@marekkokot For filtering paired reads, you could use strand information: if the k-mers come from one strand only (e.g. k-mers from a genome), you could check that read1 has k-mers from one strand while read2 has k-mers from the other strand.

notestaff commented 6 years ago

@marekkokot For the most precise filtering, you'd have kmc_tools filter take two single-strand k-mer databases as input: one made by kmc -b from a set of genome sequences, and one made from the reverse complements of these sequences. You'd then keep a single read if it meets the filtering criteria for either database. For paired reads, you'd keep a read pair if read1 meets the criteria for the forward-strand database and read2 meets them for the reverse-complement database, or vice versa.
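
A minimal sketch of that decision rule, assuming in-memory Python sets as stand-ins for the two single-strand KMC databases and a simple "at least `min_hits` k-mers found" criterion. None of these function names exist in KMC; they are hypothetical and only illustrate the logic.

```python
from typing import Iterable, Iterator, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every k-mer of seq, without canonicalization."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_db(seqs: Iterable[str], k: int) -> Set[str]:
    """Stand-in for a single-strand k-mer database: a plain set of k-mers."""
    return {km for s in seqs for km in kmers(s, k)}


def read_passes(read: str, db: Set[str], k: int, min_hits: int) -> bool:
    """Assumed criterion: the read has at least min_hits k-mers in db."""
    return sum(1 for km in kmers(read, k) if km in db) >= min_hits


def keep_read(read, fwd_db, rc_db, k, min_hits) -> bool:
    """Single read: keep it if it passes against either database."""
    return (read_passes(read, fwd_db, k, min_hits)
            or read_passes(read, rc_db, k, min_hits))


def keep_pair(read1, read2, fwd_db, rc_db, k, min_hits) -> bool:
    """Pair: the mates must match opposite strands, in either orientation."""
    return (
        (read_passes(read1, fwd_db, k, min_hits)
         and read_passes(read2, rc_db, k, min_hits))
        or (read_passes(read1, rc_db, k, min_hits)
            and read_passes(read2, fwd_db, k, min_hits))
    )
```

For example, with `fwd_db = build_db(genomes, k)` and `rc_db = build_db(map(revcomp, genomes), k)`, a pair whose mates both match the same strand is rejected by `keep_pair`, even though each mate alone would be accepted by `keep_read`.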

hannesbecher commented 3 years ago

> When filtering reads, if the reads are paired, it would help to be able to say either "keep both reads if one passes the filter" or "drop both reads if one fails the filter", while preserving the read pairing.

I would find this most useful, too!