nanoporetech / remora

Methylation/modified base calling separated from basecalling.
https://nanoporetech.com

Annotation of data #71

Closed kewei2019 closed 1 year ago

kewei2019 commented 1 year ago

Hi, I have successfully used Remora to analyze my trained bacterial genome for m6A at GATC sites. However, I am unsure what each column in the output means and how to determine whether a site is methylated.

And correct me if I'm wrong:

  - query_name: read name
  - ref_name: chromosome name
  - ref_pos: position on the chromosome

Thanks very much! Kewei

marcus1487 commented 1 year ago

Here is a brief description of each column in this output.

kewei2019 commented 1 year ago

Hi, more quick questions:

  1. The mod_probs column actually contains two floats per entry, such as "0.341796875,0.658203125" and "0.990234375,0.009765625". Which float represents the probability at a given position?

  2. For gt_mod_idx, you said "0 is canonical, 1 is ground truth modified position", but I found many positions marked "none". What does "none" mean?

marcus1487 commented 1 year ago

As stated in my previous answer ("mod_probs: Probability of each label output by the model at this position"), these values are the probabilities of each output label. The labels are noted in the log file. For example, with canonical cytosine (C), 5hmC (h) and 5mC (m), you might see the labels as Chm. This string gives the order of the probabilities produced by the model.
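To make the label ordering concrete, here is a minimal Python sketch that pairs a mod_probs string with a label string. The helper names `label_probs` and `top_label` are hypothetical and not part of Remora, and the label string "Aa" (canonical A, then m6A as a) is only an assumed example; the real string should be taken from your own log file.

```python
def label_probs(mod_probs: str, labels: str) -> dict[str, float]:
    """Pair each comma-separated probability with the label at the same index."""
    probs = [float(p) for p in mod_probs.split(",")]
    if len(probs) != len(labels):
        raise ValueError(
            f"expected {len(labels)} probabilities for labels {labels!r}, got {len(probs)}"
        )
    return dict(zip(labels, probs))


def top_label(mod_probs: str, labels: str) -> str:
    """Return the label with the highest probability."""
    by_label = label_probs(mod_probs, labels)
    return max(by_label, key=by_label.get)


# "Aa" is an assumed label order; check the log file for the actual one.
print(label_probs("0.341796875,0.658203125", "Aa"))
print(top_label("0.990234375,0.009765625", "Aa"))
```

Under that assumed order, the first float is the canonical probability and the second is the modified probability. If the log file lists the labels in a different order, the mapping changes accordingly.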

None indicates that there is a modified base call, but no ground truth label for that position.
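As a rough sketch of how rows without a ground truth label could be handled, the Python below skips them when comparing calls against gt_mod_idx. The function names, the (mod_probs, gt_mod_idx) row layout, and the example ground truth values are assumptions for illustration; the sketch also assumes gt_mod_idx indexes into the same label order as mod_probs.

```python
from typing import Optional


def parse_gt(value: str) -> Optional[int]:
    """Convert a gt_mod_idx field to an int, or None when no ground truth exists."""
    text = value.strip()
    if text == "" or text.lower() == "none":
        return None
    return int(text)


def predicted_index(mod_probs: str) -> int:
    """Index of the highest probability in a comma-separated mod_probs string."""
    probs = [float(p) for p in mod_probs.split(",")]
    return probs.index(max(probs))


def accuracy_on_labeled(rows: list[tuple[str, str]]) -> Optional[float]:
    """Fraction of correct calls among rows that carry a ground truth label."""
    labeled = [(probs, parse_gt(gt)) for probs, gt in rows]
    labeled = [(probs, gt) for probs, gt in labeled if gt is not None]
    if not labeled:
        return None  # nothing to evaluate
    correct = sum(predicted_index(probs) == gt for probs, gt in labeled)
    return correct / len(labeled)


# Hypothetical rows: (mod_probs, gt_mod_idx)
rows = [
    ("0.341796875,0.658203125", "1"),
    ("0.990234375,0.009765625", "0"),
    ("0.990234375,0.009765625", "None"),  # called, but no ground truth
]
print(accuracy_on_labeled(rows))
```

Rows marked None still carry a usable modified base call; they are only excluded here because there is nothing to compare the call against.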