nanoporetech / megalodon

Megalodon is a research command-line tool to extract high-accuracy modified base and sequence variant calls from raw nanopore reads by anchoring the information-rich basecalling neural network output to a reference genome/transcriptome.

Generate mod_mapping with both options #307

Closed Yijun-Tian closed 2 years ago

Yijun-Tian commented 2 years ago

Hi Marcus, in the megalodon mod_mapping output, is there an easy way to generate BAMs both with and without the bisulfite emulation style in one run? Or can we start from some intermediate data to generate one from the other? Thanks, Yijun

marcus1487 commented 2 years ago

There is no way to get both outputs from one run. In theory one could use the modbam output to create the bisulfite emulation output, but there is no implementation of this of which I am aware. The bisulfite emulation was intended as a stopgap until alternative specs/visualizations were made available. Now that the modbam spec is released and genome browsers generally support this format, the bisulfite emulation output should be considered deprecated.
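For readers who want to try the modbam-to-emulation route mentioned above, here is a minimal sketch of the per-read conversion. It is not Megalodon code and not an existing tool. The function names, the 0.5 threshold, and the choice to treat cytosines without a call as unmodified are all assumptions made for illustration. It handles forward-strand reads only; a reverse-strand read would additionally need reverse-complement handling and a G-to-A conversion. It expects the `MM` tag as a string and the `ML` tag as a list of integers, as defined in the SAM tags specification.

```python
def mm_probabilities(seq, mm_tag, ml, base="C", mod_code="m"):
    """Map read index -> modification probability for one base/mod pair.

    Hypothetical helper. Assumes a forward-strand read, so MM offsets can be
    applied to `seq` directly without reverse complementing.
    """
    probs = {}
    ml_offset = 0  # ML values are stored in the same order as the MM entries
    for entry in mm_tag.rstrip(";").split(";"):
        if not entry:
            continue
        fields = entry.split(",")
        header = fields[0].rstrip("?.")  # drop the optional mode flag
        deltas = [int(x) for x in fields[1:]]
        codes = header[2:]
        # A numeric (ChEBI) code is a single modification; letter codes may be
        # combined (for example "mh"), with ML values interleaved per position.
        n_codes = 1 if codes.isdigit() else len(codes)
        if header[0] == base and header[1] == "+" and codes == mod_code:
            base_positions = [i for i, b in enumerate(seq) if b == base]
            idx = -1
            for k, delta in enumerate(deltas):
                idx += delta + 1  # delta = number of `base` occurrences skipped
                # ML bins [N/256, (N+1)/256); use the bin midpoint
                probs[base_positions[idx]] = (ml[ml_offset + k] + 0.5) / 256
        ml_offset += len(deltas) * n_codes
    return probs


def emulate_bisulfite(seq, mm_tag, ml, threshold=0.5):
    """Return `seq` with every C not called 5mC converted to T.

    Assumption: a C without a call, or with probability below `threshold`,
    is treated as unmodified and converted, as bisulfite treatment would.
    """
    probs = mm_probabilities(seq, mm_tag, ml)
    return "".join(
        "T" if b == "C" and probs.get(i, 0.0) < threshold else b
        for i, b in enumerate(seq)
    )


converted = emulate_bisulfite("ACGTCCGA", "C+m,0,1;", [250, 10])
print(converted)
```

In this toy read, the first C has a high-confidence 5mC call and is kept, while the other two cytosines are converted. A full converter would also need to read and write BAM records and handle strand, which this sketch leaves out.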

Yijun-Tian commented 2 years ago

Thanks for the clarification, Marcus. The emulation BAM is a very insightful and useful option. In addition to visualization, the emulation BAM could also be easily fed into other bisulfite sequencing pipelines for further analysis. Would it be easy to add an extra megalodon script that generates the emulation BAM from the standard mod_mapping output?

marcus1487 commented 2 years ago

Unfortunately, megalodon is in the process of being deprecated and is not under active development. I appreciate that using existing bisulfite pipelines may be attractive, but I would certainly consider this outside the scope of this project. We are moving in the direction of storing modified bases in the modified base BAM tag format and hoping that future tools will be developed against this standard. This tag has the benefit of working for any modified base, while bisulfite emulation would only work for 5mC data and thus serves a limited (though certainly important) use case.