Open bug1303 opened 5 years ago
Hi @bug1303
Yes, that is a typo, thanks, I've fixed it and it will be corrected in the next version.
This makes no difference from Defiant's perspective.
"C count" means "count unmethylated". Thanks for helping to clear up any possible confusion. I've altered this in the help menu.
"which is calculated by Defiant": that's too much detail, I shouldn't have put that in there. Defiant saves unsigned integers for counts, which I found was the best way to do the programming, instead of using double or float. The percent calculated will be the same.
How would you get to 0.265625? It's neither 17/76, nor 17/(76+17), depending on what you actually mean in no 4... (17/64=0.265625 , assuming the 64 that you mention in input type 5 example )
They were only examples; I didn't mean for them to be taken literally. However, thank you for spotting this, I've changed it.
I've attached 2 files showing what the different inputs look like.
You write in the README.md that you support the following formats:
However, these don’t entirely match what is described in the bismark_methylation_extractor help:
and
1) You call both "coverage2cytosine" format. The "coverage2cytosine" Bismark module can create a "genome-wide cytosine methylation output file" (which looks ALMOST like Input Type 5) from the coverage output (which looks ALMOST like your Input Type 6), but can also be created from bismark_methylation_extractor directly.
2) In the Input Type 5 example you show start and end position (and 8 columns in total), but below you describe only a start position and 7 columns in total. I assume it's just a typo in the example?
3) You write that the start/end positions are all in [0,4294967295]. Bismark uses 1-based coordinates by default, unless --zero-based is explicitly specified, and only then do they become half-open. So by default everything is 1-based and start position == end position. Your example says '762 763', so should --zero-based indeed be specified?
4) Bismark clearly states "count methylated" and "count non-methylated" rather than "methylated C count" and "C count". "C count" sounds like a total count (methylated + non-methylated). What is actually expected here?
5) Input Type 6 "Column4: methylation percentage, which is calculated by Defiant." - Why is this calculated by Defiant? And how? Shouldn't this be input to Defiant? It is part of the Bismark coverage output. However, your example shows "chr1 762 763 0.265625 17 76". How would you get to 0.265625? It's neither 17/76, nor 17/(76+17), depending on what you actually mean in no 4... (17/64=0.265625 , assuming the 64 that you mention in input type 5 example ) However, from a Bismark run, I got e.g. the following in the coverage output (test.deduplicated.bismark.cov.gz):
chr3 3008646 3008646 33.3333333333333 1 2
chr3 5620584 5620584 75 3 1
So, the methylation percentage is (100*col5/(col5+col6)) and not (col5/col6). (Also, the start and end positions are the same (as stated in 3), unless --zero-based is used, but then it would not be valid input to the coverage2cytosine script.)
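For reference, here is a minimal Python sketch of the check I mean. It only assumes that a coverage line has six whitespace-separated columns (chromosome, start, end, percentage, count methylated, count non-methylated); the function names are made up for illustration and are not part of Bismark or Defiant.

```python
import math


def parse_cov_line(line):
    """Split one coverage line into typed fields (six-column layout assumed)."""
    chrom, start, end, pct, meth, unmeth = line.split()
    return chrom, int(start), int(end), float(pct), int(meth), int(unmeth)


def expected_percent(meth, unmeth):
    """Methylation percentage as 100 * methylated / (methylated + non-methylated)."""
    return 100.0 * meth / (meth + unmeth)


def is_consistent(line):
    """True if start == end (1-based default) and column 4 matches the counts."""
    _, start, end, pct, meth, unmeth = parse_cov_line(line)
    same_position = start == end
    pct_matches = math.isclose(pct, expected_percent(meth, unmeth), rel_tol=1e-9)
    return same_position and pct_matches


# The two lines from my Bismark run, plus the README example line.
bismark_lines = [
    "chr3 3008646 3008646 33.3333333333333 1 2",
    "chr3 5620584 5620584 75 3 1",
]
readme_line = "chr1 762 763 0.265625 17 76"

for cov_line in bismark_lines + [readme_line]:
    print(cov_line, "->", is_consistent(cov_line))
```

With this check, both lines from my run are consistent, while the README example is not: its start and end differ, and 0.265625 does not match the percentage computed from 17 and 76.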
Please consider providing an example call for the bismark_methylation_extractor that will produce files of the type that Defiant will read and process as expected.
Looking forward to testing the program once this is clarified.