input sequences to POA in lower case to score as nucleotides

jts / nanocorrect

Experimental pipeline for correcting nanopore reads

MIT License


input sequences to POA in lower case to score as nucleotides #1

Closed: jts closed this issue 9 years ago

jts commented 9 years ago

As per @sjackman on Twitter:

https://twitter.com/sjackman/status/568835294193000448

jts commented 9 years ago

BLOSUM was trained on conserved regions of genomes, not sequencing data, so neither the nucleotide nor amino acid scoring scheme is a good fit for nanopore data. I tested this and found that the (default) amino acid scheme performs better (in terms of correction accuracy) than the nucleotide scheme.

I'm closing this for now, but training a new scoring scheme specific to basecalled nanopore data is something we should do in the future.
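Why this matters: when DNA is scored with an amino acid matrix, the letters A, C, G and T are read as alanine, cysteine, glycine and threonine. Identical bases then earn unequal rewards, and mismatches get arbitrary penalties. The sketch below makes that concrete. It is an illustration only and does not reproduce POA or nanocorrect. It assumes BLOSUM62 values, and the matrix actually used by POA/nanocorrect may be a different BLOSUM variant. The +1/-1 "nucleotide" scheme is a hypothetical stand-in, not POA's nucleotide scoring. The function names are made up for this example.

```python
# BLOSUM62 entries for the amino acids whose one-letter codes are A, C, G, T
# (alanine, cysteine, glycine, threonine). Assumed for illustration only.
BLOSUM62_ACGT = {
    ("A", "A"): 4, ("A", "C"): 0, ("A", "G"): 0, ("A", "T"): 0,
    ("C", "C"): 9, ("C", "G"): -3, ("C", "T"): -1,
    ("G", "G"): 6, ("G", "T"): -2,
    ("T", "T"): 5,
}


def protein_score(a, b):
    """Score two bases as if they were amino acids (symmetric lookup)."""
    key = (a, b) if (a, b) in BLOSUM62_ACGT else (b, a)
    return BLOSUM62_ACGT[key]


def nucleotide_score(a, b, match=1, mismatch=-1):
    """Hypothetical uniform nucleotide scheme: all matches and mismatches weigh the same."""
    return match if a == b else mismatch


def ungapped_score(x, y, score):
    """Sum per-column scores for two equal-length sequences (no gaps)."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(score(a, b) for a, b in zip(x.upper(), y.upper()))


if __name__ == "__main__":
    ref = "ACGTACGT"
    read = "ACGTACCT"  # one G->C substitution
    print("as amino acids:", ungapped_score(ref, read, protein_score))
    print("as nucleotides:", ungapped_score(ref, read, nucleotide_score))
    # Identical bases are rewarded unequally under the amino acid reading.
    print({b: protein_score(b, b) for b in "ACGT"})
```

Under the amino acid reading, a C/C column is worth more than twice an A/A column (9 vs. 4). Neither scheme models nanopore basecalling errors, which is the point made above about training a dedicated scoring scheme.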

sjackman commented 9 years ago

Hah! That makes me sad and scared. It drives me absolutely bonkers when I find and fix an egregious bug and get worse performance. Wouldn't be the first time.