rajewsky-lab / mirdeep2

Discovering known and novel miRNAs from small RNA sequencing data
GNU General Public License v3.0

miRNAseq read length in mirdeep2 #62

Closed shannjiang closed 4 years ago

shannjiang commented 4 years ago

Dear Sir or Madam,

I am trying to use mirdeep2 to call miRNAs in my miRNA-seq data. As far as I know, miRNAs are 20-25 nt long. I noticed that the reads in the tutorial toy .fasta file are about the same length as miRNAs, but in my data the reads are always 50 bp long. Is it reasonable for miRNA-seq reads to be almost twice as long as a typical miRNA? If not, what makes the reads so long? Can mirdeep2 call miRNAs from miRNA-seq data with reads this long?

thanks,

Shan

mschilli87 commented 4 years ago

@shannjiang: For technical reasons, the cDNA reverse-transcribed from the actual miRNA molecules is extended with 3' and 5' adapters. Typically, the 5' part is not sequenced, but since the shortest read length Illumina provides kits for is 50, the reads often consist of the miRNA sequence followed by the 5' end of the 3' adapter sequence. You can either trim the reads before passing them to miRDeep2 (e.g. using flexbar, trimmomatic, or a similar tool) or let miRDeep2 handle them (see the -k option in the README). If you don't know the adapter sequences used in the protocol that generated the data and can't find out, you can use tools like fastp or FastQC to identify them.
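
To make the read layout described above concrete, here is a minimal Python sketch of 3' adapter clipping. It is not how miRDeep2, flexbar, or trimmomatic implement trimming: it only does exact string matching, whereas real trimmers also tolerate mismatches and use base qualities. The function name `trim_3p_adapter`, the `min_overlap` parameter, and both sequences are assumptions chosen for illustration; substitute the adapter from your own library prep.

```python
# Illustrative sequences only (assumptions, not taken from the thread).
MIRNA = "TGAGGTAGTAGGTTGTATAGTT"    # a 22-nt insert, written as DNA
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"   # a hypothetical 21-nt 3' adapter


def trim_3p_adapter(read: str, adapter: str, min_overlap: int = 6) -> str:
    """Clip a 3' adapter from a read using exact matching only."""
    # Case 1: the whole adapter is present somewhere in the read.
    # Everything from its first occurrence onwards is discarded.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]

    # Case 2: the read ends partway through the adapter, so only a
    # prefix of the adapter is present at the 3' end of the read.
    # Try the longest candidate prefix first.
    for k in range(min(len(adapter), len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]

    # No adapter found: return the read unchanged.
    return read


# A 50-nt read: insert + full adapter + 7 nt of read-through.
full_read = MIRNA + ADAPTER + "AAAAAAA"
# A shorter read that stops 10 nt into the adapter.
partial_read = MIRNA + ADAPTER[:10]

print(len(full_read), trim_3p_adapter(full_read, ADAPTER))
print(len(partial_read), trim_3p_adapter(partial_read, ADAPTER))
```

Both example reads are clipped back to the 22-nt insert. The `min_overlap` threshold is a design choice: a very short adapter prefix at the end of a read can match by chance, so requiring a few bases of overlap avoids clipping genuine miRNA sequence.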

shannjiang commented 4 years ago

Thank you, Marcel Schilling! It's really informative!

Shan
