skovaka / UNCALLED

Raw nanopore signal mapper that enables real-time targeted sequencing
MIT License

Building reference gene panel, advice #25

Open ChloeDG opened 3 years ago

ChloeDG commented 3 years ago

Hi there,

I tried to use UNCALLED to improve the efficiency of Cas9-targeted sequencing on low inputs, but the reference I generated seems to cause reads to be rejected after ~1 kb instead of being sequenced to the end.

uncalled realtime mouse_cas9panel.fa --port 8000 -t 8 --enrich -c 3 > uncalled_realtime.paf

[Image attachment: uncalled _fail3]

I have not had the same issue when enriching for small genomes, for example a virus in a mixture of host DNA, or with a reference panel based on hg38. Have you seen this before, or do you know of problems with generating references from mm10?

I am designing a panel of genes for a large signature in mouse (>150 genes) and I want to make sure that this doesn't happen. Do you have any advice or tips for making a reference FASTA for a large panel of genes?

Ubuntu 20, UNCALLED version 2.1, MinKNOW GUI 4.1.22

ChloeDG commented 3 years ago

Oops, I just discovered your suggestions for masking: https://github.com/skovaka/UNCALLED/tree/master/masking. I will try that.

skovaka commented 3 years ago

Sorry you're having trouble! Reference masking is a good idea. Just to make sure, does your reference extend all the way to the ends of your targets? I'd also recommend running UNCALLED on those "regular cas9" reads to see if they can map to your reference in standalone mapping mode (see "Fast5 Mapping" in the readme). I haven't worked with the mouse genome before, but if you're still having trouble after masking, you could send me your reference and I could take a look.
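For anyone following along, here is a rough Python sketch of how the PAF output from a mapping run could be tallied to see how many reads map and to which panel sequences. It is not part of UNCALLED. The function name `summarize_paf` and the sample records are made up for illustration, and it assumes the standard PAF column order (target name in column 6) with unmapped reads carrying `*` in that column.

```python
from collections import Counter


def summarize_paf(lines):
    """Count mapped/unmapped reads and mapped reads per target from PAF lines.

    Assumes tab-separated PAF records where column 6 is the target name
    and unmapped reads are reported with '*' in that column.
    """
    mapped = 0
    unmapped = 0
    per_target = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip records too short to hold a target name
        target = fields[5]
        if target == "*":
            unmapped += 1
        else:
            mapped += 1
            per_target[target] += 1
    return mapped, unmapped, per_target


if __name__ == "__main__":
    # Hypothetical records, only to show the expected shape of the input.
    sample = [
        "read1\t4000\t10\t900\t+\tgeneA\t50000\t100\t1000\t300\t900\t60",
        "read2\t3000\t*\t*\t*\t*\t*\t*\t*\t*\t*\t255",
        "read3\t5200\t5\t800\t-\tgeneB\t42000\t200\t1000\t250\t800\t60",
    ]
    mapped, unmapped, per_target = summarize_paf(sample)
    print(f"mapped={mapped} unmapped={unmapped}")
    for name, count in per_target.most_common():
        print(name, count)
```

In practice the lines would come from the PAF file written by the mapping run, e.g. by passing an open file handle to `summarize_paf`.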

ChloeDG commented 3 years ago

I am not sure if masking helped. The regular cas9 reads definitely map to the reference. But we don't see an improvement in yield compared to whole genome sequencing. What is the best way to send you the reference so you can check it out? Thank you!

skovaka commented 3 years ago

You can email your reference to skovaka1 jhu.edu. Other factors that could reduce your enrichment include read lengths and your computer's CPU, so it would help if you could let me know the lengths of your on-target reads and your CPU model. On Ubuntu, running the command grep "model name" /proc/cpuinfo should return the model name.
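As a small illustration of gathering those two pieces of information, the Python sketch below pulls the CPU model out of `/proc/cpuinfo`-style text and computes a basic read-length summary. The helper names `cpu_model_names` and `length_summary` are hypothetical, the example lengths are invented, and the choice of count, median and N50 as summary statistics is an assumption rather than something requested in the thread.

```python
import statistics


def cpu_model_names(cpuinfo_text):
    """Return the distinct 'model name' values found in /proc/cpuinfo text."""
    names = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            names.add(value.strip())
    return sorted(names)


def length_summary(lengths):
    """Return count, median and N50 for a list of read lengths."""
    if not lengths:
        return {"count": 0, "median": None, "n50": None}
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        if running * 2 >= total:
            # First length at which the cumulative sum reaches half of all bases.
            n50 = length
            break
    return {
        "count": len(ordered),
        "median": statistics.median(ordered),
        "n50": n50,
    }


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            print(cpu_model_names(fh.read()))
    except OSError:
        print("/proc/cpuinfo not available on this system")
    # Invented lengths, only to show the call.
    print(length_summary([1000, 2000, 3000, 4000]))
```

The read lengths could be taken from column 2 of the PAF file for reads that mapped to the panel.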