wdecoster / methplotlib

Plotting tools for nanopore methylation data
MIT License
90 stars 12 forks source link

all context methylation visualization #29

Open PaulaRomeroLozano opened 3 years ago

PaulaRomeroLozano commented 3 years ago

Hello,

I am using megalodon to identify methylations in the whole genome, not only in the CpG islands. I was wondering if it is possible to use Methplotlib to visualise also those regions outside the CpG context with the megalodon outputs. Do you think it is compatible?

Thank you, Paula

wdecoster commented 3 years ago

Hi Paula,

Yes, that shouldn't be a problem. methplotlib doesn't really care about the type of modification it shows. Which file format did you get from megalodon? It is possible that I need to add an additional file format as input type, but that shouldn't be too much trouble.

Cheers, Wouter

PaulaRomeroLozano commented 3 years ago

Hi Wouter,

Thank you very much for the quick response.

That would be incredibly useful!

I am still getting familiar with all the megalodon outputs and which ones are required and "compatible" for downstream analysis.

Please find below the file formats I got when running megalodon:

  1. mod_mapping.bam: I was thinking in sort it and convert it to a bedgraph using bedtools? image 2.modified_bases.5mc.bed: (0-based) image 3.per_read_modified_bases: as a .txt and .bd file (natural log) image

I was thinking in trying to make those files "nanopolish like" so I could use your scripts, but if you could add an extra input type that would be very helpful.

Thank you so much in advance, Best,

Paula

wdecoster commented 3 years ago

Please keep me posted how things go for any of these formats. It may not be necessary to make them too similar to nanopolish files.

1) methplotlib can take cram files with nucleotide modifications in the Mm tag, so your bam file should not need too many changes. Converting to cram should be enough. That said, I could also add an option to use bam directly.

2) That looks like a bedmethyl format. You can quite easily make a bedgraph out of that, which is currently one of the compatible input formats. I could in the future also add bedmethyl input directly.

3) Could you share a larger example of that format? I'll add it as a compatible input too. You can find my email address in my profile.

I'm afraid I won't have time to add the code for these formats this week.

PaulaRomeroLozano commented 3 years ago

Thank you very much for your help and feedback.

  1. I will go with this option first then.

  2. I will send you a small subset of the data by email.

Thank you again and I will keep you posted for sure :)

JWDebler commented 3 years ago

I found an R script in the METEORE repo that takes the megalodon .bed file and computes methylation frequencies like the nanopolish calculate_methylation_frequency.py script does. In its original form the output doesn't work with methplotlib yet, but it shouldn't take much to get the output close enough to nanopolish format. I don't know enough R though to do it.

marcus1487 commented 3 years ago

I can help on some of the megalodon options here. Megalodon has a lot of outputs in order to facilitate porting into tools such as methplotlib. Megalodon is designed with two main modified base output types 1) per-read and 2) aggregated.

Per-read outputs include the mod_mappings.bam above. Megalodon can also output cram directly (via --mappings-format cram) and sort the mappings (via --sort-mappings). The mod_mappings output takes the per-read modified base calls and anchors them to the reference sequence (so these reads will artificially contain exactly the reference sequence for each read). mod_basecalls are anchored only to the basecalls and not mapped, so not directly applicable to methplotlib (I think).

The per-read modified base text format is meant as a dump of the per-read scores database mostly for debugging purposes. I would not recommend this as a standard format that should used as input and is subject to change without notice.

Aggregated formats returns results aggregated across reads at each applicable reference position. For the aggregated formats, Megalodon provides 3 options 1) bedmethyl 2) modvcf and 3) wiggle (mostly like a bedgraph but more compact). The modVCF is not really a standard format, but there is no standard format (that I'm aware of) to store multiple modifications in a single file. VCF was the closest, but required adding some fields for strandedness.

@JWDebler that METEORE script does not compute the methylation frequencies, but instead just converts from the format from bedmethyl to a consistent text format for comparison to other algorithms within the METEORE framework. I believe this could essentially be accomplished with a simple awk script to move columns around.

I'm happy to help mediate the porting of megalodon outputs into methplotlib. Let me know if any adjustments or recommendations would be helpful.

PaulaRomeroLozano commented 3 years ago

Hello Marcus,

Thank you so much for your help.

It would be super useful to be able to port megalodon outputs into methplotlib! :) Since I got mod_mapping.bam as output, could it be possible to simply convert that bam to cram? I tried: samtools sort mod_mappings.bam -o mod_mappings.sorted.bam samtools view -C -T genome.fa mod_mappings.bam > mod_mappings.cram samtools index -c mod_mappings.sorted.cram

I did so, and then I tried to run methplotlib but I got an error:

methplotlib -m /mnt/fs01/RUNS/minion_run/DNA_meth/megalodon_results/subset_1_5G_norm/mod_mappings.cram -n calls -w chr17:17,722,291-17,725,186 -g /media/data/reference/Homo_sapiens.GRCh38.90.chrplus.gtf --simplify [W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/a8499ca51d6fb77332c2d242923994eb": Protocol not supported Traceback (most recent call last): File "/home/paula/.local/bin/methplotlib", line 8, in sys.exit(main()) File "/home/paula/.local/lib/python3.8/site-packages/methplotlib/methplotlib.py", line 26, in main meth_browser(meth_data=meth_data, File "/home/paula/.local/lib/python3.8/site-packages/methplotlib/methplotlib.py", line 52, in meth_browser meth_traces = plots.methylation(meth_data, dotsize=dotsize, binary=binary) File "/home/paula/.local/lib/python3.8/site-packages/methplotlib/plots.py", line 120, in methylation make_per_read_meth_traces_phred(table=meth.table, File "/home/paula/.local/lib/python3.8/site-packages/methplotlib/plots.py", line 148, in make_per_read_meth_traces_phred table = table.join(df_heights, on="read_name") File "/home/paula/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 7874, in join return self._join_compat( File "/home/paula/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 7890, in _join_compat return merge( File "/home/paula/.local/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 74, in merge op = _MergeOperation( File "/home/paula/.local/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 656, in init self._maybe_coerce_merge_keys() File "/home/paula/.local/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 1165, in _maybe_coerce_merge_keys raise ValueError(msg) ValueError: You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat

The methplotlib logs reported: 2021-04-08 14:51:00,334 methplotlib 0.17.0 started. 2021-04-08 14:51:00,334 Python version is: 3.8.5 (default, Jan 27 2021, 15:41:15) [GCC 9.3.0] 2021-04-08 14:51:00,334 Arguments are: Namespace(bed=None, binary=False, dotsize=4, example=False, fasta=None, gtf='/media/data/reference/Homo_sapiens.GRCh38.90.chrplus.gtf', methylation=['/mnt/fs01/RUNS/minion_run/DNA_meth/megalodon_results/subset_1_5G_norm/mod_mappings.cram'], names=['calls'], outfile=None, qcfile=None, simplify=True, smooth=5, split=False, static=None, store=False, window='chr17:17,722,291-17,725,186') 2021-04-08 14:51:00,334 Processing chr17_17722291_17725186 2021-04-08 14:51:00,335 File /mnt/fs01/RUNS/minion_run/DNA_meth/megalodon_results/subset_1_5G_norm/mod_mappings.cram is of type ont-cram 2021-04-08 14:51:00,557 Collected methylation data for 1 datasets 2021-04-08 14:51:00,658 Created QC plots

I am not sure why is not working and how I can fix it...

Another option would be to convert the bedmethyl file into bedgraph but I haven't been able to succeed on this either.

Any feedback or recommendation would be very much appreciated.

Kind regards, Paula

marcus1487 commented 3 years ago

I think the error is the same bug @wdecoster noted in #31. Hopefully a resolution can be found there.

The bedgraph input to methplotlib is slightly different than the mod_mappings. The mod_mappings contain per-read methylation calls, so can provide more information (phasing/linked methylation) while the bedgraph/bedmethyl is aggregated over reads (so links between calls between each read are lost). I'm not sure what outputs methplotlib has in terms of per-read modified base calls, so this may be a moot point here, but thought the distinction was important.

For converting from bedmethyl to bedgraph the following one-liner should work: awk 'BEGIN{OFS="\t"} {$4=$11; NF=4; print}' sample.bedmethyl > sample.bedgraph. You may need to add a header to load this file in some genome browsers.

Fatihlrcfs commented 2 years ago

hi all,

I am working on methylation outputs to visualize that. I have megalodon(files: modified_bases.5mc.bed, per_read_modified_base_calls.db and per_read_modified_base_calls.txt) and ı am wondering to use these file in Methplotlib. Could you say me the steps basically because ı am a freshman in this field? Many thanks. Kind Regards.

wdecoster commented 2 years ago

The megalodon input is in BAM/CRAM format, but you don't have those?

Fatihlrcfs commented 2 years ago

Hi @wdecoster
ı am using comman below to run megalodon and these (fmodified_bases.5mc.bed, per_read_modified_base_calls.db and per_read_modified_base_calls.txt) files were generated. thanks code: megalodon debarcodedmultifast5/barcode06/ --outputs mods --reference wholegenome.fasta --mod-motif m CG 0 --write-mods-text --processes 40 --guppy-server-path /cluster/lrcfs/2397405/bin/ont-guppy-cpu-6.0.1/bin/guppy_basecall_server --guppy-params "-d /cluster/lrcfs/2397405/bin/rerio/basecall_models/ --num_callers 40" --guppy-timeout 400 --guppy-config res_dna_r941_min_modbases_5mC_CpG_v001.cfg --overwrite --output-directory megalodonresultbarcode06

wdecoster commented 2 years ago

I think --outputs with mod_mappings or mod_basecalls is what you need, but I don't use megalodon myself

Fatihlrcfs commented 2 years ago

okey, @wdecoster ı will configure my code and run with a small data set to generate the required files. thanks

carolinehey commented 2 years ago

Hi @wdecoster

Earlier you wrote this:

  1. methplotlib can take cram files with nucleotide modifications in the Mm tag, so your bam file should not need too many changes. Converting to cram should be enough. That said, I could also add an option to use bam directly.
  2. That looks like a bedmethyl format. You can quite easily make a bedgraph out of that, which is currently one of the compatible input formats. I could in the future also add bedmethyl input directly.
  3. Could you share a larger example of that format? I'll add it as a compatible input too. You can find my email address in my profile.

As for 1. i have a cram file with modifications in the Mm tag and would really like to use methplotlib. However, i get an error regarding the -n argument.

code: methplotlib -m merged_bam_sup_modbases_sed.cram -w chr15:23000000-25500000 usage: methplotlib [-h] [-v] -m METHYLATION [METHYLATION ...] -n NAMES [NAMES ...] -w WINDOW [-g GTF] [-b BED] [-f FASTA] [--simplify] [--split] [--static STATIC] [--binary] [--smooth SMOOTH] [--dotsize DOTSIZE] [--minqual MINQUAL] [--example] [-o OUTFILE] [-q QCFILE] methplotlib: error: the following arguments are required: -n/--names

How to deal with this?

wdecoster commented 2 years ago

Have you tried setting the -n argument? methplotlib -m merged_bam_sup_modbases_sed.cram -w chr15:23000000-25500000 -n test

carolinehey commented 2 years ago

Sorry for my ignorance. Yes i have but get an error. Thought it might be something with wrong arguments but now i see that my error is similar to what other people have reported earlier.

methplotlib -m merged_bam_sup_modbases_sed.cram -n test -w chr15:23000000-25500000

[E::easy_errno] Libcurl reported error 77 (Problem with the SSL CA cert (path? access rights?)) [W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/f036bd11158407596ca6bf3581454706": Input/output error Traceback (most recent call last): File "/home/prom/anaconda3/envs/methplotlib/bin/methplotlib", line 8, in sys.exit(main()) File "/home/prom/anaconda3/envs/methplotlib/lib/python3.10/site-packages/methplotlib/methplotlib.py", line 26, in main meth_browser(meth_data=meth_data, File "/home/prom/anaconda3/envs/methplotlib/lib/python3.10/site-packages/methplotlib/methplotlib.py", line 60, in meth_browser fig = utils.create_subplots(num_methrows, File "/home/prom/anaconda3/envs/methplotlib/lib/python3.10/site-packages/methplotlib/utils.py", line 250, in create_subplots return plotly.subplots.make_subplots( File "/home/prom/anaconda3/envs/methplotlib/lib/python3.10/site-packages/_plotly_utils/importers.py", line 39, in getattr raise AttributeError( AttributeError: module 'plotly' has no attribute 'subplots'

2022-05-18 06:24:15,208 methplotlib 0.20.1 started. 2022-05-18 06:24:15,208 Python version is: 3.10.2 | packaged by conda-forge | (main, Jan 14 2022, 08:02:09) [GCC 9.4.0] 2022-05-18 06:24:15,208 Arguments are: Namespace(methylation=['merged_bam_sup_modbases_sed.cram'], names=['test'], window='chr15:23000000-25500000', gtf=None, bed=None, fasta=None, simplify=False, split=False, static=None, binary=False, smooth=5, dotsize=4, minqual=20, example=False, outfile=None, qcfile=None, store=False) 2022-05-18 06:24:15,208 Processing chr15_23000000_23833334 2022-05-18 06:24:15,350 File merged_bam_sup_modbases_sed.cram is of type cram 2022-05-18 06:24:22,103 Collected methylation data for 1 datasets 2022-05-18 06:24:22,628 Created QC plots 2022-05-18 06:24:36,421 Prepared methylation traces. 2022-05-18 06:24:36,421 Making browser in split mode, with 1 modification rows.

When i try to use the bam file it looks like this:

methplotlib -m merged_bam_sup_modbases_sed.bam -n test -w chr15:23000000-25500000 Traceback (most recent call last): File "/home/prom/anaconda3/envs/methplotlib/bin/methplotlib", line 8, in sys.exit(main()) File "/home/prom/anaconda3/envs/methplotlib/lib/python3.10/site-packages/methplotlib/methplotlib.py", line 26, in main meth_browser(meth_data=meth_data, File "/home/prom/anaconda3/envs/methplotlib/lib/python3.10/site-packages/methplotlib/methplotlib.py", line 60, in meth_browser fig = utils.create_subplots(num_methrows, File "/home/prom/anaconda3/envs/methplotlib/lib/python3.10/site-packages/methplotlib/utils.py", line 250, in create_subplots return plotly.subplots.make_subplots( File "/home/prom/anaconda3/envs/methplotlib/lib/python3.10/site-packages/_plotly_utils/importers.py", line 39, in getattr raise AttributeError( AttributeError: module 'plotly' has no attribute 'subplots'

Is my input file the problem or something else?

wdecoster commented 2 years ago

AttributeError: module 'plotly' has no attribute 'subplots' is not related to your input file. Could you check the version of your plotly module? How did you install methplotlib?

carolinehey commented 2 years ago

I used pip install methplotlib

My plotly module is 5.8.0

wdecoster commented 2 years ago

I just installed plotly 5.8.0, and it works here. Please try in an interactive python session if it works, and print the version as well, to make sure that not an older version is imported.

DiegoZavallo commented 2 years ago

Hi, I have get the same error when I run methplotlib, but it output an .html results that I don´t understand. I don´t have gff ad gtf file for this organism

image

methplotlib -m modified_bases.5mC.bedgraph -n calls -f Paspalum_notatum_genome.fasta -w CM039683.1:231000-234000


Reading modified_bases.5mC.bedgraph would be faster with bgzip and 'tabix -p bed'.
Traceback (most recent call last):
  File "/home/diego/Documents/INTA/Nanopore/env/bin/methplotlib", line 8, in <module>
    sys.exit(main())
  File "/home/diego/Documents/INTA/Nanopore/env/lib/python3.8/site-packages/methplotlib/methplotlib.py", line 26, in main
    meth_browser(meth_data=meth_data,
  File "/home/diego/Documents/INTA/Nanopore/env/lib/python3.8/site-packages/methplotlib/methplotlib.py", line 89, in meth_browser
    fig = utils.create_subplots(num_methrows, split=False, annotation=bool(bed or gtf))
  File "/home/diego/Documents/INTA/Nanopore/env/lib/python3.8/site-packages/methplotlib/utils.py", line 261, in create_subplots
    return plotly.subplots.make_subplots(
  File "/home/diego/Documents/INTA/Nanopore/env/lib/python3.8/site-packages/_plotly_utils/importers.py", line 39, in __getattr__
    raise AttributeError(
AttributeError: module 'plotly' has no attribute 'subplots'

My plotly version is also 5.8.0.

And I run it on an env/ in which I installed methplotlib

Do you have any clue?

YingziZhang-github commented 2 years ago

I think the error is the same bug @wdecoster noted in #31. Hopefully a resolution can be found there.

The bedgraph input to methplotlib is slightly different than the mod_mappings. The mod_mappings contain per-read methylation calls, so can provide more information (phasing/linked methylation) while the bedgraph/bedmethyl is aggregated over reads (so links between calls between each read are lost). I'm not sure what outputs methplotlib has in terms of per-read modified base calls, so this may be a moot point here, but thought the distinction was important.

For converting from bedmethyl to bedgraph the following one-liner should work: awk 'BEGIN{OFS="\t"} {$4=$11; NF=4; print}' sample.bedmethyl > sample.bedgraph. You may need to add a header to load this file in some genome browsers.

I think the error is the same bug @wdecoster noted in #31. Hopefully a resolution can be found there.

The bedgraph input to methplotlib is slightly different than the mod_mappings. The mod_mappings contain per-read methylation calls, so can provide more information (phasing/linked methylation) while the bedgraph/bedmethyl is aggregated over reads (so links between calls between each read are lost). I'm not sure what outputs methplotlib has in terms of per-read modified base calls, so this may be a moot point here, but thought the distinction was important.

For converting from bedmethyl to bedgraph the following one-liner should work: awk 'BEGIN{OFS="\t"} {$4=$11; NF=4; print}' sample.bedmethyl > sample.bedgraph. You may need to add a header to load this file in some genome browsers.

Hi, I want to convert my bedmethyl file to bedgraph file. I used the one-liner and got an empty output. Would you please help check my problem? Many thanks!

My bedmethyl format is like this:

chr15   17170182        17170183        5mC     1000    -       17170182        17170183        0,0,0   1       0.00    1       0       0
chr15   17170225        17170226        5mC     0       -       17170225        17170226        0,0,0   1       0.00    0       0       0
chr15   17170391        17170392        5mC     0       -       17170391        17170392        0,0,0   1       0.00    0       0       0
chr15   17170477        17170478        5mC     0       -       17170477        17170478        0,0,0   1       0.00    0       0       0
chr15   17170495        17170496        5mC     1000    -       17170495        17170496        0,0,0   1       0.00    1       0       0