Closed chinmaysharmacs10 closed 4 months ago
Hi @chinmaysharmacs10,
We depend on modbam2bed to extract methylation calls from bam files. Since that project has been deprecated, we now recommend modkit, which is also developed by ONT and should be up to date with Dorado.
Hi @marcpaga,
Thank you for your reply.
Modkit will convert the bam files we get from Dorado to bed files. But I think bed files don't work in the live mode directly.
I did try using bed files with live mode. However, I suppose that live mode only looks for bam files in the input folder and returns the message "Looking for new bam files, so far found 0" if there is no bam file.
I used the bed example files in the demo folder, which is used for the predict mode example, to run live mode. I issued the following command:
sturgeon live \
  -i demo/bed \
  -o demo/bed/results_live/ \
  -s guppy \
  --model-files ./sturgeon/include/models/general.zip \
  --probes-file ./venv/lib/python3.9/site-packages/sturgeon/include/static/probes_chm13v2.bed \
  --plot-results
Let me know if you have any inputs on this, or on steps to use bed files in the live mode.
Thanks!
I understand your problem now. It's a bit complicated to keep the live feature in the future, since it would depend on modkit being called to process every new bam file.
My recommendation would be that you write yourself a script that checks a folder for bam files and then calls modkit extract (see readme for a bit more detail), then calls sturgeon inputtobed -s modkit, and finally calls sturgeon predict.
We are currently using this approach, and will likely leave the live feature as legacy only for megalodon and guppy.
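For anyone following along, the folder-watching script suggested in this reply could look roughly like the sketch below. It is only an illustration, not the script the Sturgeon authors use. The function names, the per-file directory layout and the polling interval are hypothetical. The exact command-line options are assumptions too: the positional input/output form of modkit extract varies between modkit versions, and -i, -o and --model-files for sturgeon inputtobed and sturgeon predict are assumed to behave as in the sturgeon live command earlier in this thread. Check the Sturgeon and modkit readmes before relying on any of them.

```python
import subprocess
import time
from pathlib import Path


def find_new_bams(watch_dir, seen):
    """Return BAM files in watch_dir whose names are not in `seen`, sorted by path."""
    return sorted(p for p in Path(watch_dir).glob("*.bam") if p.name not in seen)


def build_commands(bam, out_dir, model_zip):
    """Build the three commands for one BAM file.

    All options here are assumptions; verify them against the modkit and
    Sturgeon documentation for the versions you have installed.
    """
    out_dir = Path(out_dir)
    modkit_txt = out_dir / f"{Path(bam).stem}_modkit.txt"
    bed_dir = out_dir / "bed"
    results_dir = out_dir / "results"
    return [
        ["modkit", "extract", str(bam), str(modkit_txt)],
        ["sturgeon", "inputtobed", "-i", str(modkit_txt), "-o", str(bed_dir), "-s", "modkit"],
        ["sturgeon", "predict", "-i", str(bed_dir), "-o", str(results_dir),
         "--model-files", str(model_zip)],
    ]


def process_once(watch_dir, work_dir, model_zip, seen, run=subprocess.run):
    """Run the pipeline once for every BAM file not processed yet."""
    processed = []
    for bam in find_new_bams(watch_dir, seen):
        out_dir = Path(work_dir) / bam.stem  # one output folder per BAM file
        out_dir.mkdir(parents=True, exist_ok=True)
        for cmd in build_commands(bam, out_dir, model_zip):
            run(cmd, check=True)  # stop at the first failing step
        seen.add(bam.name)
        processed.append(bam)
    return processed


def watch(watch_dir, work_dir, model_zip, poll_seconds=30):
    """Poll watch_dir forever, processing new BAM files as they appear."""
    seen = set()
    while True:
        process_once(watch_dir, work_dir, model_zip, seen)
        time.sleep(poll_seconds)
```

Calling watch("bam_out", "sturgeon_work", "general.zip") with your own (here hypothetical) paths would start the loop. The sketch does not check whether a BAM file is still being written when it is picked up, so a real script would probably need a guard for that.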
Thank you for your suggestions @marcpaga :) This is exactly the approach I was thinking.
Hi @marcpaga,
Hope you are doing well.
I created the script as you suggested and also managed to get some pod5 files generated by the MinION device. These files are from brain tumor samples. However, I am getting an empty bed file after running the inputtobed command.
These are the steps in my pipeline:
I am unable to understand why I am getting an empty bed file. The pod5 files have been tested for methylation presence by others and have methylated CpG sites.
Maybe I am issuing incorrect commands. I would really appreciate your help in resolving this.
Thanks, Chinmay
Hi @chinmaysharmacs10,
From your dorado command my guess is that the data is not mapped. The reads have to be mapped so that we know which CpG sites are which. Check the alignment section https://github.com/nanoporetech/dorado?tab=readme-ov-file#alignment for commands on how to align your data.
Also very important: align the data to the T2T reference genome for best results: https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0.fa.gz
If this does not solve the issue, could you please paste perhaps the top 20 rows of the output of modkit, maybe that can help me see what would be the problem.
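As a rough illustration of the alignment step, the sketch below wraps the dorado aligner command from the Dorado readme's alignment section, which takes a reference index and the reads and writes the aligned BAM to standard output. The function names and file paths are hypothetical, and whether your Dorado version accepts a gzipped reference is something to verify.

```python
import subprocess
from pathlib import Path


def aligner_command(reference, reads):
    """Command list for `dorado aligner <index> <reads>` (as given in the Dorado readme)."""
    return ["dorado", "aligner", str(reference), str(reads)]


def align(reference, reads, out_bam, run=subprocess.run):
    """Align basecalled reads and write the aligned BAM to out_bam."""
    out_bam = Path(out_bam)
    out_bam.parent.mkdir(parents=True, exist_ok=True)
    # dorado aligner writes to stdout, so redirect it into the output file.
    with open(out_bam, "wb") as handle:
        run(aligner_command(reference, reads), stdout=handle, check=True)
    return out_bam
```

For example, align("chm13v2.0.fa", "calls.bam", "aligned/calls.bam") would produce an aligned BAM that can then go through modkit, assuming the unaligned BAM already carries the modified-base tags from basecalling.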
Thank you for pointing that out Marc. I missed aligning my bam files.
Yes, now with the data aligned to the T2T reference genome, I am able to get good CpG site coverage.
I appreciate your help and will reach out if I have more questions :) Your model has me really excited, and I wish to leverage it in this end-to-end pipeline.
Oxford Nanopore Technologies (ONT) have integrated their new basecaller Dorado into MinKNOW (the controller software on their devices, including MinION). I noticed that we must provide a source (-s) as input when running the live mode of Sturgeon. However, currently we only have Guppy/Megalodon as options.
To keep the model up-to-date with the recent developments with ONT, it would be great if Dorado is added as a source in the live mode.
Given that Dorado also outputs bam files and the underlying service architecture is very similar to Guppy, the code for adding Dorado as source should be fairly straightforward.