novoalab / EpiNano

Detection of RNA modifications from Oxford Nanopore direct RNA sequencing reads (Liu*, Begik* et al., Nature Comm 2019)
GNU General Public License v2.0

Guppy version compatibility #79

Closed mmiladi closed 3 years ago

mmiladi commented 3 years ago

Hello,

How sensitive is the EpiNano m6A trained model to the Guppy version and subversion? I have data base-called with Guppy 4.0.9 and 3.2.10. Are they expected to work with the latest version of EpiNano? Would you recommend re-base-calling them with a specific version of Guppy to get reliable performance?

Thanks

Huanle commented 3 years ago

Hi @mmiladi, thanks for your question. To date, I have only tested it on Albacore 2.1.7 and Guppy 3.1.5, and I did not see a significant change (not probability-score based). The sharp contrast between modified (mod) and unmodified (unm) samples in terms of the different variants remains. It would be nice to test it on the latest Guppy base-callers. If you are interested, you can download the curlcakes data from SRA and re-train some models with data from the new base-callers.

enovoa commented 3 years ago

Hi @mmiladi - just to add to @Huanle's answer: we have not systematically tested a trained EpiNano model across Guppy versions, but it is likely to be somewhat affected if the RNA model changes across versions (some Guppy upgrades keep the same RNA model, but others change it).

If you want to compare datasets, I would recommend that they are all base-called with the same base-caller version (e.g. I wouldn't compare data base-called with 3.2.10 against data base-called with 4.0.9, which use different RNA models, although the differences are subtle). Also, please note that EpiNano-SVM was trained on Guppy 3.1.5 base-called data, whereas EpiNano-DiffErr should be independent of the Guppy base-caller version. So you could, for example, analyze your data base-called with Guppy 4.0.9 using EpiNano-DiffErr.
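The recommendation above can be written down as a small pre-flight check. This is only an illustrative sketch and not part of EpiNano: the function names, the dataset labels, and the returned messages are hypothetical, and it conservatively treats any two different Guppy version strings as incompatible, since which versions share an RNA model is not listed here.

```python
from collections import defaultdict

# Guppy version that EpiNano-SVM was trained on, as stated in this thread.
SVM_TRAINING_BASECALLER = "3.1.5"


def group_by_basecaller(datasets):
    """Map each Guppy version string to the sorted dataset names base-called with it.

    `datasets` is a hypothetical {dataset_name: guppy_version} dictionary.
    """
    groups = defaultdict(list)
    for name, version in datasets.items():
        groups[version.strip()].append(name)
    return {version: sorted(names) for version, names in groups.items()}


def comparison_advice(datasets):
    """Return a short, human-readable suggestion for a set of datasets."""
    groups = group_by_basecaller(datasets)
    if not groups:
        raise ValueError("no datasets given")
    if len(groups) > 1:
        # Assumption: different version strings may mean different RNA models.
        versions = ", ".join(sorted(groups))
        return "re-basecall: datasets use different Guppy versions " + versions
    (version,) = groups
    if version == SVM_TRAINING_BASECALLER:
        return "comparable: EpiNano-SVM or EpiNano-DiffErr"
    # DiffErr is described above as independent of the base-caller version.
    return "comparable: prefer EpiNano-DiffErr"
```

For instance, calling `comparison_advice({"sampleA": "4.0.9", "sampleB": "3.2.10"})` would flag the pair for re-base-calling before any comparison.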

Hope this helped clarify your doubts!

mmiladi commented 3 years ago

Hi @Huanle and @enovoa, many thanks for your prompt and detailed replies. Following your suggestion, I will re-base-call the smaller subset, which was called with version 4.x, using version 3.x before comparing the analyses. Best, Milad