Open IanCodes opened 6 months ago
Hello @IanCodes,
I would use columns 1, 2, 3, and 4. Described below:
column | name | description | type |
---|---|---|---|
1 | chrom | name of reference sequence from BAM header | str |
2 | start position | 0-based start position | int |
3 | end position | 0-based exclusive end position | int |
4 | fraction modified | Nmod / Nvalid_cov | float |
5 | Nvalid_cov | See definitions above. | int |
I added the N_valid_cov column so that you could filter out rows with low (or high) valid coverage, for example you could have 1 read that's modified and that will appear as 100% (which is true) but you may not want that single site in your analysis or visualization. In the viewers I use most often you can have additional columns for in bedGraph.
To make the data work with bedGraphToBigWig
use the following (for example):
awk -v min_cov=${min_cov} '$5>min_cov{print $1, $2, $3, $4}' ${modkit_bedgraph_file}
Hope this helps, happy to answer any additional questions.
@ArtRand thank you for your helpful answer! I usually use UCSC that has the four column specification for generating bigWig. I did see a useful thread about directly viewing bedmethyl in IGV (https://github.com/nanoporetech/modkit/issues/43), so maybe that is the best route?
The --bedgraph option outputs a file for + and - strands for each modification, is it possible (or even make sense) to have a single combined strand bedgraph file per modifcation?
Thank you.
Hello @IanCodes,
You can certainly combine them. The bedGraph format doesn't have a nice way to represent different modifications, however, so you might want to combine them separately.
cat h*.bedgraph | sort -k 1,2n |cut -f 1-4 > h_strand_combined.bedgraph
Ought to do it. The cut
is optional to remove the valid coverage column, you could also filter instead.
Thanks for all the work developing this. Would be great if this description of the column outputs were added to the documentation for bedgraph pileup here.
Hello @lkwhite,
To be honest, I'm planning on deprecating the --bedgraph
flag since you can generate the bedGraph from the bedMethyl, but having an un-documented output format is poor form regardless.
Hello,
I am new to modkit, so I may be asking silly questions. The bedgraph generated using
modkit pileup --bedgraph
generates bedgraphs with five columns. The format description at UCSC describes four columns.My understanding from another post is that currently column 4 is fraction modified, and column five is score. Is that correct?
The result is that the bedgraph files cannot be converted to bigWig using UCSC bedGraphToBigWig as there are too many columns. Would it make sense to only use columns 1, 2, 3, and 5?