Not able to remove adapter from sequence

GoogleCodeExporter commented 9 years ago

I have a FASTQ file; a sample record is given below (test1.fastq):

@SRR768350.12 FCD0F4MABXX:8:1101:2337:2120 length=49
TGGAGTGTGACAATGGTGTTTGTATCTCGTATGCCGTCTTCTGCTTGAA
+SRR768350.12 FCD0F4MABXX:8:1101:2337:2120 length=49
CAAGCAGAAGACGGCATACGAATGGTTTAGCGCCAGGTTCCACACTTDF

The original adapter is 
CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT   

And I am using the reverse complement of it to remove it from the sequence.
AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG  

I am using the following command:
cutadapt -a AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG  test1.fastq

The 3' part of the adapter, "TCGTATGCCGTCTTCTGCTTG", is present in the 3' part 
of the read, but cutadapt currently doesn't handle this situation. Can you 
please tell me how I can deal with such situations?
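
A quick way to double-check the sequences above is a small Python sketch. It only uses the adapter and read quoted in this report; the helper name `reverse_complement` is made up for illustration, and the check uses exact string matching rather than cutadapt's error-tolerant alignment.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

original_adapter = "CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT"
used_adapter = reverse_complement(original_adapter)
read = "TGGAGTGTGACAATGGTGTTTGTATCTCGTATGCCGTCTTCTGCTTGAA"
shared_part = "TCGTATGCCGTCTTCTGCTTG"  # 3' part of the adapter quoted above

print(used_adapter)                        # sequence passed to cutadapt -a
print(used_adapter in read)                # is the full adapter in the read?
print(used_adapter.endswith(shared_part))  # is the shared part the adapter's 3' end?
print(shared_part in read)                 # does the shared part occur in the read?
```

So the read carries only the tail of this adapter, not its beginning, which is why an exact search for the full sequence finds nothing.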

Original issue reported on code.google.com by tandon.a...@gmail.com on 14 May 2013 at 8:59

GoogleCodeExporter commented 9 years ago

Are you sure that this is the correct adapter sequence? I have a list of 
Illumina adapters, which I found somewhere in the SEQanswers forum. In it, the 
adapter you give (CAA...TCT) is called “Illumina Single End Adapter 2”. But 
there is also another adapter called “Illumina Small RNA 3p Adapter 1” with 
the sequence ATCTCGTATGCCGTCTTCTGCTTG. That one fits much better:

@SRR768350.12 FCD0F4MABXX:8:1101:2337:2120 length=49
TGGAGTGTGACAATGGTGTTTGTATCTCGTATGCCGTCTTCTGCTTGAA
                       ATCTCGTATGCCGTCTTCTGCTTG

At least in humans, TGGAGTGTGACAATGGTGTTTG is an actual miRNA (hsa-miR-122-5p), 
so the second adapter seems more likely to be the correct one.
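
The alignment above can be reproduced with a minimal Python sketch. It assumes an exact match is good enough for this one read (cutadapt itself tolerates mismatches); the variable names are only illustrative.

```python
read = "TGGAGTGTGACAATGGTGTTTGTATCTCGTATGCCGTCTTCTGCTTGAA"
candidate = "ATCTCGTATGCCGTCTTCTGCTTG"  # "Illumina Small RNA 3p Adapter 1" as quoted above

start = read.find(candidate)  # exact match only; -1 if the adapter is absent
insert = read[:start] if start >= 0 else read  # what remains after trimming

print(start)
print(insert)
# Same layout as the alignment above: adapter shifted under the read
print(read)
print(" " * max(start, 0) + candidate)
```

The remaining insert is the 22-nt miRNA sequence plus one extra T.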

Perhaps the actual adapter sequence you need to use is 
TATCTCGTATGCCGTCTTCTGCTTG, that is, the same but with a T added to the front. I 
don’t know the details of the protocol. You could try the shorter version 
first and then inspect the ends of all reads that were trimmed. If all have a 
trailing T, then that is probably also part of the adapter sequence.
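
Inspecting the read ends could look roughly like the following Python sketch. It tallies the base immediately before each exact adapter match; the function name `base_before_adapter` and the two extra reads are made up for illustration, and only the first read comes from this issue. In practice you would feed it the sequences from the FASTQ file.

```python
from collections import Counter

ADAPTER = "ATCTCGTATGCCGTCTTCTGCTTG"  # shorter candidate from this thread

def base_before_adapter(reads, adapter=ADAPTER):
    """Tally the base immediately upstream of each exact adapter match."""
    counts = Counter()
    for read in reads:
        pos = read.find(adapter)
        if pos > 0:  # skip reads without a match or with the adapter at position 0
            counts[read[pos - 1]] += 1
    return counts

reads = [
    "TGGAGTGTGACAATGGTGTTTGTATCTCGTATGCCGTCTTCTGCTTGAA",  # read from this issue
    "ACGTACGTACGTACGTACGTTATCTCGTATGCCGTCTTCTGCTTGAAAA",  # made-up read
    "GGGGCCCCAAAATTTTGGGGCCCCAAAATTTTGGGGCCCCAAAATTTTG",  # made-up read, no adapter
]
tally = base_before_adapter(reads)
print(tally)
```

If nearly every match is preceded by T, that suggests the T belongs to the adapter; a mix of bases suggests it is part of the insert.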

Original comment by marcel.m...@tu-dortmund.de on 15 May 2013 at 8:36

GoogleCodeExporter commented 9 years ago

Thanks,
That worked. I ran FastQC and found that "Illumina Single End Adapter 2", 
along with lots of other primers, was present in the sample. That's why I used 
that adapter sequence.
Also, you were right about the sequence being a miRNA. It's a rat miRNA sample.
Thank you very much.

Original comment by tandon.a...@gmail.com on 15 May 2013 at 1:05

GoogleCodeExporter commented 9 years ago

Great, I’m happy I could help. Regarding my guess about it being a miRNA 
dataset: I cheated by looking up the SRA accession before I answered :).

Original comment by marcel.m...@tu-dortmund.de on 15 May 2013 at 2:26

Changed state: WontFix
Added labels: Type-
Removed labels: Type-Defect

jgaetel / cutadapt

Not able to remove adapter from sequence #63