weir12 closed this issue 5 years ago
Are you training on RNA modifications? Also, how did you generate the RNA, and what ratio of modified nucleotides did you use?
Hypoxanthine, a kind of modification of adenosine in RNA.
Is every adenosine modified, or is it a mixture in each strand?
I have had some problems remapping reads because of the low basecall quality when using 100% replacement.
Only a small number of adenosines were modified in each read, so I'm worried about the size of the training set. What's more, the sequences (basecalls.fa) basecalled by the trained modified models cannot be mapped to the reference sequences.
Yeah, it seems RNA is a lot more challenging because of the ground truth issue.
To generate the ground truth fasta, how did you know which adenosine to switch to your modification? Did tombo tell you where bases differ from canonical? I am still struggling to work out tombo.
Did the hypoxanthine cause a lot of deviation in the signal dwell? I think this may cause a lot of problems during training, because many chunks get rejected and are lost to the analysis. The model I made was no good and just output lots of repetitive sequences. I am hoping that relaxing the stringency on the dwell signal will let me train on more chunks.
Yes, Tombo can detect modifications in three ways, and the per_read_statistics file it generates records potential modifications per base per read. But I'm just using the Tombo API to extract the signal I want rather than using the detection methods provided by Tombo.
During my training process, most of the blocks passed the test (46 passed, 0 emptysequence).
Apart from the ground truth problem, there is the pre-trained basecaller model, which is used to determine the alignment between the raw signal and the reference sequence. In my view it also has a big impact on the results. I'm training my own canonical base model to replace the one taiyaki provides by default (r941_rna_minion.checkpoint).
If the model is for RNA then you will have to reverse the calls from basecall.py to match the output of guppy (note: just reverse, not reverse complement). This is because of differences in chemistry between the RNA and DNA kits.
The next release of taiyaki will include a flag to enable this option; guppy handles it through its configuration files.
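Until such a flag is available, the reversal can be done as a post-processing step on the FASTA output. The following is only a minimal sketch, not part of taiyaki: the function names are made up for illustration, and it assumes a plain FASTA input where a sequence may be wrapped over several lines. The reverse complement helper is included purely to show the operation that should not be applied here.

```python
def reverse_fasta_records(lines):
    """Yield FASTA lines with every sequence reversed (not complemented).

    Hypothetical helper for illustration; headers are passed through unchanged.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Flush the previous record before starting a new one.
            if header is not None:
                yield header
                yield "".join(chunks)[::-1]
            header, chunks = line, []
        else:
            chunks.append(line)
    # Flush the last record.
    if header is not None:
        yield header
        yield "".join(chunks)[::-1]


def reverse_complement(seq):
    """For contrast only: this is NOT the operation needed for RNA calls."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Toy input; real input would be the lines of basecalls.fa.
records = [">read1", "ACGGT", "TA", ">read2", "GATTACA"]
reversed_records = list(reverse_fasta_records(records))
```

With the toy input, read1 (ACGGTTA) becomes ATTGGCA, whereas its reverse complement would be TAACCGT, which is why mixing up the two operations still leaves the reads unmappable.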
@myrtlecat Thanks! I will pay attention to this difference later.
Hi,
I find that basecalls.fa, generated by basecall.py with the model_final checkpoint, cannot be mapped to the reference sequence files at all (0% mapping ratio). However, the fastq file basecalled by guppy_basecaller with model.json (derived from the model_final checkpoint) aligns almost completely to the reference sequence (99.89% mapping ratio). This is just like the problem I found before: https://github.com/nanoporetech/taiyaki/issues/38#issue-477469064
In basecalls.hdf5, which contains the information about the presence of modifications, I observed that the distribution of modified bases was extremely uneven. Some reads contain dozens of consecutive modified bases with high probability.
Update: The sequence from basecall.py needs to be reversed if the sequencing method is direct RNA. The second problem was a parallel computation error in my custom script. Now settled.
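For anyone wanting to check for the same kind of uneven distribution, one rough way is to measure the longest run of consecutive high-probability modified calls in each read. The sketch below is an illustration only: it makes no assumption about the layout of basecalls.hdf5 and takes a plain list of per-base modification probabilities (between 0 and 1), and both the function name and the 0.9 threshold are arbitrary choices, not values from taiyaki.

```python
def longest_modified_run(mod_probs, threshold=0.9):
    """Return the length of the longest run of consecutive bases whose
    modification probability is at least `threshold`.

    Hypothetical helper; `mod_probs` is assumed to be one read's per-base
    probabilities, already extracted from wherever they are stored.
    """
    longest = current = 0
    for p in mod_probs:
        if p >= threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


# Toy read: one run of two and one run of three high-probability calls.
example_probs = [0.1, 0.95, 0.99, 0.2, 0.91, 0.92, 0.93, 0.5]
example_run = longest_modified_run(example_probs)
```

Reads whose longest run is far above what the expected modification rate would allow are good candidates for a closer look, whether the cause is the model or, as here, the script producing the data.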