Closed wlhCNU closed 5 years ago
The units of the progress bar in the detect_modifications output are genomic regions. Have a look at the multiprocessing help output from tombo detect_modifications alternative_model -h for more information.
This recommendation was made primarily from DNA data using the alternative_model testing method. Modified RNA data is a bit trickier to validate, as ground truths are much harder to find. In general more coverage is always better, but an acceptable level is application specific, so we don't have a global recommendation for all applications and/or Tombo detection methods at this time.
The 0.33 fraction with a coverage of 1 is due to the --coverage-dampen-counts default. This adds 2 unmodified pseudo-reads to each reference position, so a single modified read gives 1 / (1 + 2) = 0.33. This keeps lower coverage regions from dominating the most significant sites from a run (as the fraction is an unstable measure at low coverage).
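To make the dampening arithmetic concrete, here is a minimal sketch in Python. It is not Tombo's implementation: the function name and arguments are hypothetical, and it only models the behaviour described above (a fixed number of unmodified pseudo-reads added to the denominator).

```python
def dampened_fraction(mod_reads, unmod_reads, unmod_pseudo_reads=2):
    """Fraction of modified reads after adding unmodified pseudo-reads.

    Hypothetical helper for illustration only; the default of 2 mirrors
    the pseudo-read count described in the reply above.
    """
    total = mod_reads + unmod_reads + unmod_pseudo_reads
    if total == 0:
        raise ValueError("no reads and no pseudo-reads at this position")
    return mod_reads / total


# One read, called modified: 1 / (1 + 2)
print(round(dampened_fraction(1, 0), 2))  # 0.33

# Compare a low-coverage and a high-coverage site, both fully modified.
for coverage in (1, 50):
    print(coverage, dampened_fraction(coverage, 0))
```

At low coverage the pseudo-reads pull the estimate strongly toward 0, while at higher coverage their influence becomes small, which is why single-read sites no longer rank among the most significant ones.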
As with the previous question, this cutoff is application specific.
Hi Marcus, I have a few questions about identifying modified bases from direct RNA nanopore data for the model species Arabidopsis thaliana, and would appreciate your help.
###################resquiggle log#############################################
[17:17:41] Loading minimap2 reference.
[17:17:46] Getting file list.
[17:26:35] Re-squiggling reads (raw signal to genomic sequence alignment).
100%|##########| 1369723/1369723 [38:57:51<00:00, 9.76it/s]
[08:24:26] Final unsuccessful reads summary (65.2% reads unsuccessfully processed; 893412 total reads):
    31.2% ( 426794 reads) : Base calls not found in FAST5 (see tombo preprocess)
    28.5% ( 389754 reads) : Alignment not produced
    5.4% ( 73451 reads) : Poor raw to expected signal matching (revert with tombo filter clear_filters)
    0.2% ( 3406 reads) : Read event to sequence alignment extends beyond bandwidth
    0.0% ( 4 reads) : Reference mapping contains non-canonical bases (transcriptome reference cannot contain U bases)
    0.0% ( 2 reads) : Too much raw signal for mapped sequence
    0.0% ( 1 reads) : Read failed sequence-based signal re-scaling parameter estimation.
[08:24:27] Saving Tombo reads index to file.
#################tomboDetect_modifications####################################
[09:32:28] Parsing Tombo index file(s).
[09:32:48] Performing alternative model testing.
[09:32:48] Performing specific alternate base(s) testing.
[09:32:48] Calculating read coverage regions.
[09:32:48] Calculating read coverage.
[09:32:58] Performing modified base detection across genomic regions.
100%|##########| 26716/26716 [55:53<00:00, 7.97it/s]
###########################################################################
I see on the Tombo GitHub page that the depth of coverage can affect the results of modified base detection to some extent, and that this effect should be minimal above a certain level of coverage (probably >10-15X, but this has not been verified). Does this generally apply to both DNA and RNA nanopore data? Because the copy numbers of different transcripts vary in vivo, how can I determine what coverage is suitable for RNA modified base analysis?
Could you tell me the approximate accuracy of Tombo's different methods for identifying non-standard bases in both DNA and RNA direct sequencing data, especially Specific Alternate Base 5mC Detection in direct RNA sequencing data?
Thanks, lihui