Here's a link to the SAM tags documentation
The definition of MM
Numeric values (emphasis on "to skip"):
... comma separated list of how many seq bases of the stated base type to skip, stored as a delta to the last and starting with 0 as the first (or next) base, ...
MM:Z:A+a?,10 skips 10 "fundamental" A bases. The value 19 appearing twice in your record is just a coincidence.
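To make the "skip" semantics concrete, here is a minimal sketch that turns the skip counts of one MM group into 0-based positions in the read. It assumes the sequence is already in the orientation the MM tag refers to (true for an unmapped read, flag 4) and that the base is a plain nucleotide letter rather than the "N" wildcard; the function name `decode_mm_skips` is hypothetical and not part of any library.

```python
def decode_mm_skips(seq, base, skips):
    """Turn MM skip counts into 0-based positions in seq.

    Assumption: seq is in the orientation the MM tag refers to (e.g. an
    unmapped read) and `base` is a plain nucleotide letter, not "N".
    """
    # Every occurrence of the "fundamental" base, e.g. every A in the read.
    base_positions = [i for i, b in enumerate(seq.upper()) if b == base]
    positions = []
    idx = -1  # index (into base_positions) of the last base we landed on
    for skip in skips:
        idx += skip + 1  # skip `skip` bases of this type, land on the next one
        if idx >= len(base_positions):
            raise ValueError("MM skips run past the last %s in seq" % base)
        positions.append(base_positions[idx])
    return positions


# "A+a?,10" on a read of 11 As followed by a C lands on the 11th A (index 10).
print(decode_mm_skips("A" * 11 + "C", "A", [10]))  # [10]
# Skips are deltas: 0 means "this A", 1 means "skip one A, take the next".
print(decode_mm_skips("AACAGA", "A", [0, 1]))      # [0, 3]
```

Because each value is a delta from the previous call, the same skip count (such as 19) can repeat without meaning the same position.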
The definition of '?':
When this flag is ‘?’ there is no information about the modification status of the skipped bases provided.
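As a rough illustration of why the flag matters, the sketch below labels every fundamental base as either called, unknown, or assumed unmodified. It relies on our reading of the SAM tags spec: with '?' nothing is said about skipped bases, while with '.' (which we understand to be the default when neither flag is present) skipped bases are assumed to have a low probability of modification. Under that reading, an A skipped by an `A+a?` group should not be counted as unmodified. The function `label_fundamental_bases` is hypothetical.

```python
def label_fundamental_bases(seq, base, called_positions, flag):
    """Label each `base` in seq as called, unknown, or assumed unmodified.

    `called_positions` are 0-based positions that carry an ML score;
    `flag` is the MM implicit/explicit marker: "?", "." or "" (none).
    """
    if flag not in ("?", ".", ""):
        raise ValueError("flag must be '?', '.' or ''")
    called = set(called_positions)
    labels = {}
    for i, b in enumerate(seq.upper()):
        if b != base:
            continue
        if i in called:
            labels[i] = "called"      # has its own ML score
        elif flag == "?":
            labels[i] = "unknown"     # '?': no information about skipped bases
        else:
            labels[i] = "unmodified"  # '.' or no flag: assumed low probability
    return labels


print(label_fundamental_bases("AACAGA", "A", [0, 3], "?"))
# {0: 'called', 1: 'unknown', 3: 'called', 5: 'unknown'}
```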
Kind regards, Rich
Hello @sarah-ku,
The MM and ML tags are difficult to interpret by eye. I recommend using modkit extract (docs) to transform the MM/ML tags into a table. @HalfPhoton is correct that the MM values are the number of "skips", not the actual positions. The ML scores are probabilities binned into 0-255, so 0 is the lowest probability of modification, and it's reasonable for multiple positions to have the same predicted probability.
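The SAM tags spec defines the ML value N as covering the probability range N/256 to (N+1)/256. The sketch below maps each value to the midpoint of its bin; choosing the midpoint is our own convention for illustration, and tools such as modkit may report probabilities differently. It also shows why repeated scores are expected: the three 1s in the ML tag from this thread all fall in the same bin.

```python
def ml_to_probability(ml_value):
    """Map an ML byte N to the midpoint of its bin [N/256, (N+1)/256)."""
    if not 0 <= ml_value <= 255:
        raise ValueError("ML values are unsigned 8-bit integers (0-255)")
    return (ml_value + 0.5) / 256


# ML values from the record in this thread (ML:B:C,0,1,1,1,199)
probs = [ml_to_probability(n) for n in [0, 1, 1, 1, 199]]
print(probs[-1])  # 0.779296875
```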
Hope you don't mind us asking here about the output from the m6A methylation model.
We converted our DRS (rna004 kit) data from fast5 to pod5 and tested out the command:
dorado basecaller sup,m6A_DRACH data.pod5 > output.bam
Here is an example of one of the entries from the resulting BAM file:
414b0961-3412-4660-a4f1-118de6d22719 4 * 0 0 * * 0 0 GGCGAGCAGGGAGGCAAAGCTCGCGCCAAGGCCAAGACCCGCTCTTCTCGGGCCGGGCTCAGTTTCCCGTGGGGGCCGAGTGCATCGCCTGCTCCGCAAAGGCAACTGCACGGCGGAGCGGGTGCTGGAGCTCCGGTGTCCCTGGCGGCGGTGCTGGAGTACCTGACCATCGAGATCCTGGAGCTGGCTGGCAACGCGGCCGCGACAACAAGAAGAATTCGTCATCATCCCCGCGCACCTCGAGCTGGCCATCCGCAACGATGAGGAGCTCAACAAGCTTCTGGGCAAGTCATACATGGTGGCGTCCTGCCCAACATCCAGGCCGTGCTACTGCCCAAGAAGACCGAGAGCCAAGGCGGGCAAGTAGAAGCCTGGATTAGTTTGCAGCAACTCAATCCCAAGGAACCAAAGGCTCAGAGCCTTGGGGTGGCCCCAGCCCCCACCCCCGCCCTACAACTTATCAGCCCATATCAACCCTGCCCCCTCCCCCTCGCCCCCTCGCCCTCTCAAAACACCCC ((((???==>;<9101869844343196=A?AB67778@+*)(*)(),/111966.(($$$%),.0)))(((+121113447889=;5;85555:=@A3=?9481&&%%%%%(-''())6=>>=>@C**)+.000//+,2;9CC98;;<?6).>JA:?87753:*))))*5////389;<567:66678;:@=@;94&%(34>99889888.%$$$$%&&&+11676(.75.'''''''''(738=73353=?CAAC99858:6<@:9?BED=7>6876769<<42'$####$%((,11123,+..07<<39<==76444//5775440231165<<:=8>6FGB884-*+**(%&(+/035666400011>>?@=>=<<;;::8//.../))-/,1((3379433-45*'&'(+66)('()2*$$#""###$$%&&%&&'*($%&&%%&%')--+++)())('**),+%%('$##$$%')&&&)'(&$###$&$##$#$%)%$###"##$$%')'%$ qs:i:9 du:f:5.71875 ns:i:22875 ts:i:0 mx:i:3 ch:i:88 st:Z:2024-02-27T06:33:59.408+00:00 rn:i:31356 fn:Z:output6.pod5 sm:f:79.4189 sd:f:19.3945 sv:Z:quantile dx:i:0 RG:Z:c884b8754e91b8b445f4f47f572cedd4a0678cca_rna004_130bps_sup@v3.0.1 MN:i:518 MM:Z:A+a?,10,13,43,19,19; ML:B:C,0,1,1,1,199
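To check how the MM and ML tags of a record like this line up, the sketch below pulls both tags out of one SAM text line and pairs each MM group's skip counts with its ML scores. It assumes one modification code per MM group (such as `A+a?` here; multi-code groups like `C+mh` are not handled) and falls back to splitting on whitespace because the pasted record lost its tabs. Real SAM output should be split on tabs, since Z-type tag values may contain spaces. The function `parse_mm_ml` is hypothetical.

```python
def parse_mm_ml(sam_line):
    """Return one dict per MM group, pairing skip counts with ML scores.

    Assumption: one modification code per MM group (e.g. "A+a?").
    """
    line = sam_line.rstrip("\n")
    # Real SAM is tab-separated; fall back to whitespace for pasted records.
    fields = line.split("\t") if "\t" in line else line.split()
    tags = {}
    for field in fields[11:]:  # columns 12+ are optional TAG:TYPE:VALUE fields
        tag, _type, value = field.split(":", 2)
        tags[tag] = value

    ml_values = [int(v) for v in tags["ML"].split(",")[1:]]  # drop "C" subtype
    groups = []
    offset = 0
    for group in [g for g in tags["MM"].split(";") if g]:
        header, *skip_text = group.split(",")
        flag = header[-1] if header[-1] in "?." else ""
        skips = [int(s) for s in skip_text]
        groups.append({
            "base": header[0],                         # e.g. "A"
            "strand": header[1],                       # "+" or "-"
            "mod": header[2:len(header) - len(flag)],  # e.g. "a" (m6A)
            "flag": flag,                              # "?", "." or ""
            "skips": skips,
            "ml": ml_values[offset:offset + len(skips)],
        })
        offset += len(skips)
    if offset != len(ml_values):
        raise ValueError("MM skip count and ML length disagree")
    return groups
```

Applied to the record above, this should give a single `A+a?` group with skips `[10, 13, 43, 19, 19]` paired with ML scores `[0, 1, 1, 1, 199]`, i.e. five called A positions with one score each. Turning those skips into read positions still requires walking the As in SEQ.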
My questions are:
We are using the required rna004 kit for this model and running on converted pod5 files on a GPU server.