Closed tleonardi closed 5 years ago
We will first have to generate a model for each k-mer, covering both dwell time and intensity. The consortium IVT dataset might be a good solution for that: https://github.com/nanopore-wgs-consortium/NA12878/blob/master/nanopore-human-transcriptome/fastq_fast5_bulk.md
We have several options: 1) Reconstitute a full signal
I had in mind something like option 2, but I see the appeal of option 1! :) The only thing that comes to my mind is that we probably can't fit a Gaussian to the dwell time because it's a discrete distribution... I bet it's a Poisson or similar, but it should be easy enough to check. Let's discuss this in person on Friday!
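For the "easy enough to check" part, here is a rough sketch of one way to compare the two candidates on a sample of dwell times for a single k-mer. It assumes dwell times are given as integer sample counts. The function names are made up for illustration and are not part of nanocompore. Comparing a discrete log-likelihood against a continuous one is only a crude guide, so the variance-to-mean ratio (about 1 for a Poisson) is probably the more telling number.

```python
import math
from statistics import mean, pvariance


def poisson_loglik(counts):
    """Log-likelihood of integer counts under a Poisson with lambda = sample mean."""
    lam = mean(counts)
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)


def gaussian_loglik(values):
    """Log-likelihood under a Gaussian fitted by maximum likelihood (mean, population variance)."""
    mu = mean(values)
    var = pvariance(values, mu)
    n = len(values)
    # At the MLE the squared-error term collapses to n / 2.
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def dispersion_index(counts):
    """Variance-to-mean ratio: close to 1 for Poisson-like data, >1 if overdispersed."""
    return pvariance(counts) / mean(counts)


# Hypothetical dwell times (in samples) for one k-mer, for illustration only.
dwell = [3, 4, 5, 4, 3, 6, 5, 4, 9, 2]
print("Poisson logL:  ", poisson_loglik(dwell))
print("Gaussian logL: ", gaussian_loglik(dwell))
print("Var/mean ratio:", dispersion_index(dwell))
```

If the ratio comes out well above 1 on real data, something overdispersed (negative binomial, for example) might be worth trying instead of a plain Poisson.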
I am building a k-mer model based on the IVT dataset released by the nanopore RNA consortium. https://github.com/nanopore-wgs-consortium/NA12878/blob/master/nanopore-human-transcriptome/fastq_fast5_bulk.md It is supposed to be completely unmodified, as it was generated by in vitro transcription of a human cDNA library. The dataset is large enough to cover all the possible pentamers at least 7000 times.
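To make the idea concrete, here is a minimal sketch of what a per-k-mer model could look like. It is not the code from the pull request. It assumes the events have already been extracted as `(kmer, intensity, dwell)` tuples, uses a DNA-style `ACGT` alphabet, and the dictionary keys are invented for this example. With k = 5 there are 4^5 = 1024 pentamers to cover.

```python
from collections import defaultdict
from itertools import product
from statistics import mean, stdev


def build_kmer_model(events, min_count=2):
    """Summarise intensity and dwell time per k-mer.

    events: iterable of (kmer, intensity, dwell) tuples (assumed input format).
    min_count: k-mers seen fewer times than this are left out of the model.
    """
    min_count = max(min_count, 2)  # stdev needs at least two observations
    by_kmer = defaultdict(list)
    for kmer, intensity, dwell in events:
        by_kmer[kmer].append((intensity, dwell))

    model = {}
    for kmer, obs in by_kmer.items():
        if len(obs) < min_count:
            continue
        intensities = [i for i, _ in obs]
        dwells = [d for _, d in obs]
        model[kmer] = {
            "n": len(obs),
            "intensity_mean": mean(intensities),
            "intensity_sd": stdev(intensities),
            "dwell_mean": mean(dwells),
            "dwell_sd": stdev(dwells),
        }
    return model


def missing_kmers(model, k=5, alphabet="ACGT"):
    """List the k-mers of length k that have no entry in the model."""
    all_kmers = ("".join(p) for p in product(alphabet, repeat=k))
    return [kmer for kmer in all_kmers if kmer not in model]


# Toy events, for illustration only.
events = [("AAAAA", 100.0, 10), ("AAAAA", 102.0, 12), ("CCCCC", 90.0, 5)]
model = build_kmer_model(events)
print(model["AAAAA"])
print(len(missing_kmers(model)), "pentamers not covered")
```

Raising `min_count` toward the coverage actually available in the dataset would be a simple way to confirm that every pentamer is represented well enough.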
This is now done, and I sent a pull request (#47) to integrate it into nanocompore. It might still require some tweaking.
We need to test nanocompore with artificial data.
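One possible way to produce such artificial data is sketched below. It walks a reference sequence k-mer by k-mer and draws an intensity and a dwell time for each position from a per-k-mer model. This is only an illustration under several assumptions: the model is a plain dict with `intensity_mean`, `intensity_sd` and `dwell_mean` keys (hypothetical names), intensity is Gaussian, dwell time is Poisson (not yet verified), and a "modified" k-mer is mimicked by shifting its mean intensity.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_read(sequence, model, k=5, shift=None, seed=None):
    """Simulate (kmer, intensity, dwell) events along a sequence.

    shift: optional {kmer: intensity offset} used to mimic a modification.
    """
    rng = random.Random(seed)
    shift = shift or {}
    events = []
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        params = model[kmer]
        mu = params["intensity_mean"] + shift.get(kmer, 0.0)
        intensity = rng.gauss(mu, params["intensity_sd"])
        # A k-mer must occupy the pore for at least one sample.
        dwell = max(1, sample_poisson(params["dwell_mean"], rng))
        events.append((kmer, intensity, dwell))
    return events


# Toy model with made-up parameters, for illustration only.
toy_model = {
    "AAAAA": {"intensity_mean": 100.0, "intensity_sd": 2.0, "dwell_mean": 10},
    "AAAAC": {"intensity_mean": 95.0, "intensity_sd": 2.5, "dwell_mean": 8},
}
control = simulate_read("AAAAAC", toy_model, seed=1)
modified = simulate_read("AAAAAC", toy_model, shift={"AAAAC": 5.0}, seed=1)
print(control)
print(modified)
```

Simulating one set of reads without a shift and one with a known shift at chosen positions would give a ground truth against which the calls can be scored.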