Open martynakgajos opened 4 years ago
Hi,
I plan to work on direct RNA sequencing, probably within the next couple of months. You're right that I'll need a new k-mer model and corresponding index probability thresholds. The probability issue is actually separate from event detection, though. Event detection attempts to segment the raw signal into stretches that represent individual k-mers. Direct RNA is sequenced at a slower speed, so it will require different event detection parameters, which I haven't explored yet. I will leave this issue open until we have support for direct RNA.
Thanks, Sam
@skovaka, I understand the k-mer model issue. However, shouldn't the event detection that you are re-using from Scrappie work equally well for RNA, since it works in Scrappie? What do you mean by "direct RNA is sequenced at a slower speed which will require different parameters"? Is it not the same with Scrappie? Thanks in advance for the help.
The event detection algorithm has parameters which affect the expected number of samples per base, which is different for RNA because it moves at a slower speed. The same code runs on RNA, it just makes many more "stay" errors unless you change parameters. I haven't tried running Scrappie event basecalling on RNA, but I don't see any references to RNA in the event basecaller documentation or code, so I'm not sure if/how it works. I've been working on adapting UNCALLED to direct RNA and have found event detection parameters that work fairly well, but the rest of the algorithm isn't quite ready for release yet.
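To make the parameter dependence concrete, here is a minimal sketch of window-based event detection in Python. It is not the Scrappie or UNCALLED implementation: the function names, the single-window t-statistic, the threshold values, and the dwell times (roughly 9 samples per base for DNA and 43 for RNA, which are only ballpark figures) are all assumptions chosen for illustration.

```python
import random

def mean_var(xs):
    """Mean and population variance of a non-empty list."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)

def tstat(signal, i, w):
    """t-like statistic comparing the w samples before index i with the w after."""
    m_left, v_left = mean_var(signal[i - w:i])
    m_right, v_right = mean_var(signal[i:i + w])
    pooled = max((v_left + v_right) / w, 1e-9)  # floor avoids division by zero
    return abs(m_right - m_left) / pooled ** 0.5

def detect_events(signal, window, threshold):
    """Return indices of local t-statistic peaks above threshold.

    Boundaries closer than `window` samples to the previous one are dropped.
    `window` and `threshold` are the hypothetical tuning parameters here.
    """
    n = len(signal)
    scores = [0.0] * n
    for i in range(window, n - window + 1):
        scores[i] = tstat(signal, i, window)
    boundaries = []
    for i in range(1, n - 1):
        is_peak = scores[i] >= scores[i - 1] and scores[i] > scores[i + 1]
        if scores[i] > threshold and is_peak:
            if not boundaries or i - boundaries[-1] >= window:
                boundaries.append(i)
    return boundaries

def make_signal(levels, dwell, noise_sd, rng):
    """Synthetic signal: each current level is held for `dwell` samples."""
    out = []
    for level in levels:
        for _ in range(dwell):
            noise = rng.gauss(0.0, noise_sd) if noise_sd else 0.0
            out.append(float(level) + noise)
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    levels = [80, 100, 90, 110] * 5  # 20 levels, so 19 true boundaries
    # (dwell, window) pairs: DNA-like, RNA-like with a DNA-sized window,
    # RNA-like with a longer window. All values are illustrative only.
    for dwell, window in [(9, 3), (43, 3), (43, 12)]:
        sig = make_signal(levels, dwell, noise_sd=3.0, rng=rng)
        found = detect_events(sig, window=window, threshold=4.0)
        print(f"dwell={dwell} window={window}: {len(found)} boundaries found")
```

With a window tuned for short dwell times, a slow signal gives noise more chances to cross the threshold inside a single level, so one k-mer tends to be split into several events (the "stay" errors mentioned above). The exact counts depend on the noise level and threshold, so treat the printed numbers as a qualitative illustration only.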
@skovaka Thank you for the detailed explanation. I realized this would be the issue after trying to do event detection on RNA myself yesterday. Do you think you could share the event detection parameters for RNA that you have found? It would help me with my research. I will be happy to cite your work and acknowledge your help in getting the right parameters if my work succeeds and gets published. If you think this is possible, we could correspond over email: hariss@umich.edu. Thank you.
I really like your preprint and I would love to use it for direct RNA sequencing. Are you working on adapting UNCALLED for direct RNA sequencing? Given that an accurate RNA k-mer model is available, how difficult is it to optimize the event detection parameters? I understand that by event detection parameters you mean the index probability thresholds, am I right?