medvir / VirMet

Set of tools for viral metagenomics.

Quality filtering of NanoPore reads #25

Open mihuber opened 7 years ago

mihuber commented 7 years ago

Disable quality-based length trimming for NanoPore reads: NanoPore quality scores do not follow the Phred scale.
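To make the request concrete, here is a minimal sketch of what "quality-based length trimming" usually means (cutting low-quality bases from the 3' end) and how a platform switch could bypass it. The function name `trim_3prime`, the `platform` parameter, the Phred+33 offset and the Q20 threshold are all hypothetical assumptions for illustration. They are not VirMet's actual code or defaults.

```python
# Hypothetical sketch: trim_3prime, PHRED_OFFSET and `platform` are illustrative
# names, not part of VirMet.
PHRED_OFFSET = 33  # assumes Sanger / Phred+33 encoded quality strings


def trim_3prime(seq: str, qual: str, min_q: int = 20,
                platform: str = "illumina") -> tuple[str, str]:
    """Trim bases below min_q from the 3' end; skip trimming for nanopore reads."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if platform == "nanopore":
        # Behaviour proposed in this issue: no quality-based length trimming.
        return seq, qual
    end = len(seq)
    # Walk back from the 3' end while the decoded Q score is below the threshold.
    while end > 0 and ord(qual[end - 1]) - PHRED_OFFSET < min_q:
        end -= 1
    return seq[:end], qual[:end]
```

For example, `trim_3prime("ACGT", "II#!")` cuts the two trailing bases (Q 2 and Q 0) and returns `("AC", "II")`, while the same call with `platform="nanopore"` returns the read unchanged.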

@2b89169b-74c9-4653-8877-c3f21ec7674d runid=2564fad3a9906330fcc0c6c90ac0c8128b89d9ff read=213 ch=124 start_time=2017-05-24T11:01:53Z
TTGTTCGTTCAGTTGGGTGTTTTATGGTTTCGTTTTTCGTGCGCCGCTTCAACAGATGAAGATGTGACATCCATTATTAATAAAGTAGACAGGCATGCTGTGGTCAAAATGGCAGTTTGTGGTAATATAGCACCATATAGCAAGAATTCTAACAAAGTTTTAACTGTAATTTAACTAATCATTATCAAATCTCTGTAGCTGCTGAAGAGAAAAGAAAAGTGGTACACTTCTGCAAATGAGACAAATCCTGTTCATCGCCATCGACAGCATGGTTGAAAAAACTCTTTGATGGTTGTTGCATTTGGAGTGCTCTGTAGTTGACATTAAACCAGCCGTCAAAAGAATTGGCCTATTAAGCCAATGGCTTGATGTTGACTAAGAGTGTAGGAGCAGC
+
$$$')7',+*$&%(+2/3/2=,)''.+)*.,,0:830,+612033).95:8*1239BDCA2461=;<A?=B:268184?7>?@928+(63+.126>BBC@62.-.3C=:4*5134;F5/3,+'5*-.926:3:B;C<9..238./53;;11)249*.1DEA1?2/),(--74-++'128;1E0-.27E3.++)'(,17876355<.*,'-880'/46.,6.7570/0B1D?8202--,*-0&01*)?1..EFFBE@3254+/+,+34666=8:0/;EFFGEBD5>=/++1+35@,=A35.68++2)(+'),)213.369;>4*/+<?>=856?7>B?9C7.(.*5/245?//7:=8544.,.23/2*+(.*@.+*2(0/&*+/*))+0-,-*)+

ozagordi commented 7 years ago

What format do they follow, then? What would be a good way to filter NanoPore reads, if they should be filtered at all?

mihuber commented 7 years ago

NanoPore quality scores: The Phred quality score defines the quality of each base in the sequence, with values from 0 to 93. The score is calculated as Q = -10 × log10(Pe), where Pe is the estimated error probability for each base. For example, an error rate of 1 in 100 gives a Q score of 20. The Q scores are then encoded in the Sanger format using ASCII characters 33 to 126, so the quality is shown as a single character per base. https://community.nanoporetech.com/technical_documents/data-analysis/v/datd_5000_v1_reve_22aug2016/basecalled-fast5-files
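The encoding described above can be sketched in a few lines: subtract 33 from each character's ASCII code to get Q, and invert Q = -10 × log10(Pe) to get the error probability. The helper names are hypothetical. The read-level mean averages error probabilities before converting back to Q; this is an assumed (though common) choice, since averaging Q scores directly overstates the quality of reads with a few very bad bases.

```python
import math

PHRED_OFFSET = 33  # Sanger / Phred+33: ASCII 33 ('!') means Q = 0


def decode_quality(qual: str) -> list[int]:
    """Convert a Phred+33 quality string into integer Q scores."""
    return [ord(c) - PHRED_OFFSET for c in qual]


def error_probability(q: float) -> float:
    """Invert Q = -10 * log10(Pe), i.e. Pe = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def phred_from_error(pe: float) -> float:
    """Q = -10 * log10(Pe)."""
    return -10 * math.log10(pe)


def mean_read_quality(qual: str) -> float:
    """Hypothetical read-level score: average error probabilities, then convert to Q."""
    probs = [error_probability(q) for q in decode_quality(qual)]
    return phred_from_error(sum(probs) / len(probs))
```

Here `decode_quality("5")` gives `[20]`, matching the "1 in 100 errors is Q20" example. For the two-base string `"5+"` (Q 20 and Q 10), averaging the Q scores would give 15, whereas `mean_read_quality` gives about 12.6 because the Q10 base dominates the expected error count.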

ozagordi commented 5 years ago

I would unassign myself from this. Would anyone volunteer to take responsibility for this issue?