Closed OceaneMion closed 2 months ago
Hello @OceaneMion,
What I would do is first generate the bedMethyl with modkit pileup ${modBAM} ${pileup_bed} --motif CG 0 --ref ${ref}
. This file has a lot of information in it and I like to use them as another "core dataset" for analysis, like the modBAM, and I'll derive datasets from this one. Next, convert the bedMethyl into a bedGraph for each modification code:
# in this case making the 'm' bedgraph, but you might want to make one for `h` or `a` as well
$ awk -v OFS="\t" '($4=="m"){$1=$1;print $1, $2, $3, $11}' pileup.bed > m_pileup.bedgraph
Then convert the bedgraph to bigWig with kenttools (linux folder here) using bedGraphToBigWig:
$ bedGraphToBigWig m_pileup.bedgraph ${reference_sizes} m_pileup.bw
The m_pileup.bw
bigWig file will allow you to visualize exactly what you're looking for in a genome browser such as IGV or JBrowse.
However, if you want to "fill the gaps" in a bedMethyl file and create a bedgraph with all the positions here is a shell script that will do that.
You will need:
${modbam}
: path to the modBAM of reads with an associated index${ref}
: reference FASTA file${ref_sizes}: a tab-separated table of the contig sizes (see
bedtools complement` for specifics).#! /bin/bash
set -eu
${modkit} pileup ${modbam} pileup.bed --motif CG 0 --ref ./CGI_ladder_3.6kb_ref.fa
bedtools complement -i ./pileup.bed -g ${ref_sizes} | awk -v OFS="\t" '{$1=$1; print $1, $2, $3, ".", 0, 0.0}' > pileup_complement.bed
awk -v OFS="\t" '{$1=$1; print $1, $2, $3, $4, $5, $11}' ./pileup.bed > tmp_pileup.bed
cat ./pileup_complement.bed ./tmp_pileup.bed | bedtools sort > pileup_complemented.bed
I'd highly recommend just going the genome browser route.
Hi thanks for your awnser, I wanted a bash script because I will need to correlate the methylation frequency with transposable elements positions. The code is just what I needed thanks ! But instead I would I've prefer to have all the position and not a span for example here is the output I have
contig_1 0 2 . 0 0 contig_1 2 3 h 213 2.82 contig_1 2 3 m 213 93.43 contig_1 3 4 h 198 23.74 contig_1 3 4 m 198 72.22 contig_1 4 5 h 233 3.43 contig_1 4 5 m 233 92.27 contig_1 5 6 h 200 24.50 contig_1 5 6 m 200 73.50 contig_1 6 47 . 0 0
you see that for the first line I have from 0 to 2 but I would have prefered to have 0 to 1 then 1 to 2, do you know how I can achieve that ? Again thanks for you help.
Nevermind I manage to found how to do it ! Again thanks a lot for your help !
Awesome. Feel free to re-open this issue if you have any more questions.
Hi again, I would like to know, in modkit pileup output file, as well as in bedgraph, not all genomic position are represented For example : contig_103 13 14 0.18691589 107 contig_103 19 20 0.08163265 98 contig_103 37 38 0.10377359 106 contig_103 49 50 0.21238938 113
I know that only CpG modification are represented in my case, but what if I want to plot the probability distribution along my contigs, will I need to specify the lenght and add the nucleotide that are not C myself ? How will you plot the probability for a specific contig here ? Do you have some example code ? Thanks in advance for your awnsers