tleonardi / nanocompore

RNA modifications detection from Nanopore dRNA-Seq data
https://nanocompore.rna.rocks
GNU General Public License v3.0

Artificial data generator #30

Closed tleonardi closed 5 years ago

tleonardi commented 5 years ago

We need to test nanocompore with artificial data.

a-slide commented 5 years ago

We will first have to generate a model for each kmer, for both dwell time and intensity. The consortium IVT dataset might be a good solution for that: https://github.com/nanopore-wgs-consortium/NA12878/blob/master/nanopore-human-transcriptome/fastq_fast5_bulk.md
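To make the idea of a per-kmer model concrete, here is a minimal sketch that summarises intensity and dwell time for each kmer. The input format (an iterable of `(kmer, intensity, dwell_time)` tuples) and the function name `build_kmer_model` are assumptions made for illustration, not nanocompore's actual data structures or API.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_kmer_model(events):
    """Summarise intensity and dwell time per kmer.

    `events` is a hypothetical iterable of (kmer, intensity, dwell_time)
    tuples, e.g. one tuple per resquiggled event.
    """
    by_kmer = defaultdict(lambda: ([], []))
    for kmer, intensity, dwell in events:
        by_kmer[kmer][0].append(intensity)
        by_kmer[kmer][1].append(dwell)

    model = {}
    for kmer, (intensities, dwells) in by_kmer.items():
        # A standard deviation needs at least two observations
        if len(intensities) < 2:
            continue
        model[kmer] = {
            "n": len(intensities),
            "intensity_mean": mean(intensities),
            "intensity_sd": stdev(intensities),
            "dwell_mean": mean(dwells),
            "dwell_sd": stdev(dwells),
        }
    return model
```

Mean and standard deviation are only a starting point here; whether they describe each variable well depends on the shape of its distribution.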

a-slide commented 5 years ago

We have several options: 1) Reconstitute a full signal

tleonardi commented 5 years ago

I had in mind something like option 2, but I see the appeal of option 1! :) The only thing that comes to mind is that we probably can't fit a Gaussian to the dwell time because it's a discrete distribution... I bet it's a Poisson or similar, but it should be easy enough to check. Let's discuss this in person on Friday!
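One quick way to check the Poisson guess is the index of dispersion (variance divided by mean), which is close to 1 for Poisson-distributed counts. The sketch below assumes dwell times are available as integer counts for a single kmer (for example, number of signal samples per event, which is an assumption); `dispersion_index` is a hypothetical helper, and this is a rough first look rather than a formal goodness-of-fit test.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return sample variance / mean for a list of non-negative counts.

    Values near 1 are compatible with a Poisson distribution, values
    well above 1 suggest overdispersion (e.g. negative binomial).
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero, dispersion index is undefined")
    return variance(counts) / m
```

If the index is far from 1 for most kmers, a Poisson model is probably too restrictive for dwell time.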

a-slide commented 5 years ago

I am building a kmer model based on the IVT dataset released by the nanopore RNA consortium: https://github.com/nanopore-wgs-consortium/NA12878/blob/master/nanopore-human-transcriptome/fastq_fast5_bulk.md It is supposed to be completely unmodified, as it was generated by in vitro transcription of a human cDNA library. The dataset is large enough to cover all the possible pentamers at least 7000 times.
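For reference, there are 4^5 = 1024 possible pentamers. A coverage claim like this can be checked with a simple count over read sequences, as in the sketch below. The function names and the plain list-of-strings input are assumptions for illustration; the alphabet defaults to `ACGT` and would need to be `ACGU` if the sequences are written as RNA.

```python
from collections import Counter
from itertools import product


def kmer_coverage(sequences, k=5, alphabet="ACGT"):
    """Count occurrences of every possible kmer across the sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    # Report all possible kmers, including those never observed;
    # kmers with other characters (e.g. N) are ignored
    all_kmers = ("".join(p) for p in product(alphabet, repeat=k))
    return {kmer: counts.get(kmer, 0) for kmer in all_kmers}


def undercovered(coverage, min_count=7000):
    """Return the sorted kmers observed fewer than `min_count` times."""
    return sorted(kmer for kmer, n in coverage.items() if n < min_count)
```

An empty result from `undercovered(kmer_coverage(reads))` would mean every pentamer meets the threshold.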

a-slide commented 5 years ago

This is now done, and I sent a pull request (#47) to integrate it into nanocompore. It might still require some tweaking.