jdidion / atropos

An NGS read trimming tool that is specific, sensitive, and speedy. (production)

atropos detect output #85

Closed nservant closed 4 years ago

nservant commented 5 years ago

Hi, I'm a bit confused by the atropos detect results, and I would like some advice in the context of automatic read trimming.

Here is an example of the output:

Detected 3 adapters/contaminants:
1. Longest kmer: ATCTCGTATGCCGTCTTCTGCTTG
   Name(s): IlluminaSmallRNA3pAdapter1
   Known sequence(s): ATCTCGTATGCCGTCTTCTGCTTG
   Known sequence K-mers that match detected contaminant: 76.92%
   Number of k-mer matches: 24
2. Longest kmer: CAAGCAGAAGACGGCATACGA
   Name(s): IlluminaSmallRNARTPrimer,
            IlluminaNlaIIIexpressionAdapter2,
            IlluminaNlaIIIGexPCRPrimer1,
            IlluminaDpnIIexpressionPCRPrimer1,
            IlluminaDpnIIexpressionAdapter2,
            IlluminaNlaIIIexpressionPCRPrimer1,
            IlluminaNlaIIIGexAdapter2.01,
            IlluminaSmallRNAPCRPrimer1,
            IlluminaDpnIIGexPCRPrimer1,
            IlluminaDpnIIGexAdapter2
   Known sequence(s): CAAGCAGAAGACGGCATACGA
   Known sequence K-mers that match detected contaminant: 70.00%
   Number of k-mer matches: 13
3. Longest kmer: TCGTATGCCGTCTTCTGCTTG
   Name(s): IlluminaNlaIIIGexAdapter2.02,
            IlluminaDpnIIGexAdapter2.01
   Known sequence(s): TCGTATGCCGTCTTCTGCTTG
   Known sequence K-mers that match detected contaminant: 70.00%
   Number of k-mer matches: 13

What confuses me is that it detected 3 adapters that look completely different. What would you advise? Should I trim all 3 adapters, or just choose one, and if so, how do I choose?

Thank you for your help,
Nicolas

jdidion commented 4 years ago

Sorry I missed this. Right now, the detect command is considered experimental. It detects frequent sequences, but they are not necessarily adapters. In this case, it looks like you only detected one adapter sequence: the second one is the reverse complement of the first (except that the first has 3 extra bp at its 5' end), and the third is a sub-sequence of the first. So if you just trim the first adapter, you should catch them all.
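The relationship between the three reported sequences can be checked by hand. Below is a minimal Python sketch that does so, using only the sequences from the detect output above. The helper `reverse_complement` is written here for illustration and is not part of the Atropos API.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = str.maketrans("ACGT", "TGCA")
    return seq.translate(complement)[::-1]


# "Longest kmer" values copied from the detect output in this issue.
first = "ATCTCGTATGCCGTCTTCTGCTTG"
second = "CAAGCAGAAGACGGCATACGA"
third = "TCGTATGCCGTCTTCTGCTTG"

second_rc = reverse_complement(second)

# The second sequence, reverse-complemented, is identical to the third.
print(second_rc == third)          # True

# The third sequence is the first one without its leading 3 bp ("ATC").
print(first.endswith(third))       # True
print(len(first) - len(third))     # 3
```

All three hits therefore describe the same underlying sequence, seen in either orientation and at two slightly different lengths.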

Once I release v2.0 (which should be before the end of the year), I will work on an improved adapter-detection algorithm that should make the results less confusing.