Closed hchetia closed 3 years ago
Hi,
Would you be willing to send your data to me?
Yes. Do you want the symmetric cpg meth file?
Yes, that would be great!
I will email it to you.
My email is terence@usc.edu.
Hi Terence, I have sent the email. Thanks.
On Fri, Jul 9, 2021 at 10:28 PM Terence Li @.***> wrote:
My email is @.***
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/smithlabcode/methpipe/issues/186#issuecomment-877541587, or unsubscribe https://github.com/notifications/unsubscribe-auth/APNCALYCPNUWW7IG666DAK3TW6V4HANCNFSM5ABKF7EA .
Hi, I have not received it yet. Could you try resending it?
Just did! 👍
@terencewtli Hi Terence, I was wondering if you received the file and have any insights on why I might be facing this error?
I'm sorry I forgot to respond! I have not received it yet... could you try resending it? Or maybe posting the file here?
Hi terence, I sent it twice. Could you please check junk or spam? I sent is using academic email ID though. I would rather send the file to you personally than post it here. If possible, could you share a gmail ID?
I see... I still do not see it in my inbox. Here's another email: terenceli68@gmail.com .
Sent.
Hi,
I checked out your file and it looks like none of the CpGs have any coverage (taken from our levels
program):
head test.levels
cytosines:
total_sites: 5713196
sites_covered: 0
total_c: 0
total_t: 0
max_depth: 0
mutations: 3
called_meth: 0
called_unmeth: 0
mean_agg: 0
Likewise, the verbose output from hmr
gives the following:
[reading methylation levels]
[total_cpgs=5713196]
[mean_coverage=0]
[separating by cpg desert]
[cpgs retained: 0]
[deserts removed: 18446744073709551615]
[genome fraction covered: -nan]
gsl: gamma.c:1249: ERROR: x too large to extract fraction part
Default GSL error handler invoked.
Aborted
I believe the error has to do with the vector of CpGs being empty since none of them have coverage. Did something go wrong when running methcounts
and symmetric-cpgs
?
Hi Terence, Thanks for spotting that out. Will check the outputs from the previous step and update.
Hi @hchetia , any updates on the output from the previous step?
Hi @terencewtli I think that the file I shared must have been a bad sample. Sharing another files levels output here- looks weird still (only 9 Cs?) cytosines: total_sites: 332900830 sites_covered: 174 total_c: 9 total_t: 165 max_depth: 1 mutations: 0 called_meth: 9 called_unmeth: 165 mean_agg: 9 coverage: 174 sites_covered_fraction: 5.22678e-07 mean_depth: 5.22678e-07 mean_depth_covered: 1 mean_meth: 0.051724 mean_meth_weighted: 0.051724 fractional_meth: 0.051724 cpg: total_sites: 13572216 sites_covered: 6 total_c: 6 total_t: 0 max_depth: 1 mutations: 0 called_meth: 6 called_unmeth: 0 mean_agg: 6 coverage: 6 sites_covered_fraction: 4.4208e-07 mean_depth: 4.4208e-07 mean_depth_covered: 1 mean_meth: 1.000000 mean_meth_weighted: 1.000000 fractional_meth: 1.000000 cpg_symmetric: total_sites: 6786108
I see, that is strange. I'm happy to help, you can open another issue and we can look at it there. Closing this for now as I believe that this error only occurs on spurious input.
Hi @terencewtli running hmr on this dataset keeps running without generating any output or even log of any sort. Apparently the program has been using my computational resources for ~5 hrs. You have seen the levels report of this particular sample above. This is a mouse BSseq sample aligned with mm10 BS genome. Out of the complete methylation call file (7.7 gigs), CHH context was 5.8Gigs while CPG was 322 Mb; symmetric CpGs (161 mb). I am really at my wits end here. I have also used bismark_methylation_extractor for the same and it ran fine. But I also wanted to try methpipe and compare both results. Please do not close this issue yet, as the hmr program didn't generate results from the second set of symmetric CpGs which were not spurious (!). Also, let me know if I can email you all the sequential outputs from methpipe for this sample? Thanks. Very best, Hasna
@hchetia did you say you aligned the mouse sample with a bisulfite-converted genome (3 letters) ? I think that is the reason that there are so few covered sites. In methcounts and bsrate, we want users to use the original genome fasta file.
Hi Terence, I was following your manual where it states that
So going back to using Bismark for alignment, it always converts the normal genome fasta file to a Bisulfite Genome folder with CT and GA conversion subfolders with bt2 indexes. This folder is then used as a reference to alignment.
Unless you are suggesting that there is a way to use Bismark directly with a genome fasta file? Perhaps you could guide me on how to do that?
I think bismark produces a Bisulfite_Genome
folder that contains the converted genome fasta files. Our pipeline only requires the original genome fasta. In terms of the part of the manual you pasted here, we require the format_reads
because there are differences in the SAM formatting from each of the mentioned mappers. You can try the following command to convert bismark SAM output to the proper format for methpipe
:
format_reads -o <sample_name>_f.sam -f bismark <sample>.bam
For the methcounts
and bsrate
programs, try using the original hg38.fa
instead of the converted genome fasta. Does this make sense?
Hi @terencewtli I did use the original mm10.fa and not any converted Bisulfite fasta file. However, I did convert use samtools to convert bismark.bam to sam and then use it with format reads using the same command that you suggested above. Could that be the culprit?
What are your thoughts on this (see below)? Should I kill the job?
Hi @terencewtli running hmr on this dataset keeps running without generating any output or even log of any sort. Apparently the program has been using my computational resources for ~5 hrs. You have seen the levels report of this particular sample above. This is a mouse BSseq sample aligned with mm10 BS genome. Out of the complete methylation call file (7.7 gigs), CHH context was 5.8Gigs while CPG was 322 Mb; symmetric CpGs (161 mb).
I see, I don't think that should be the issue. Once again, can you verify the coverage and methylation levels of your input using our levels
program?
For your second point, yes, you should kill the job. Our HMR program shouldn't take that long to run. Can you try running the program with the verbose flag -v
and seeing what happens?
I will update you today.
cytosines: total_sites: 332900830 sites_covered: 174 total_c: 9 total_t: 165 max_depth: 1 mutations: 0 called_meth: 9 called_unmeth: 165 mean_agg: 9 coverage: 174 sites_covered_fraction: 5.22678e-07 mean_depth: 5.22678e-07 mean_depth_covered: 1 mean_meth: 0.051724 mean_meth_weighted: 0.051724 fractional_meth: 0.051724 cpg: total_sites: 13572216 sites_covered: 6 total_c: 6 total_t: 0 max_depth: 1 mutations: 0 called_meth: 6 called_unmeth: 0 mean_agg: 6 coverage: 6 sites_covered_fraction: 4.4208e-07 mean_depth: 4.4208e-07 mean_depth_covered: 1 mean_meth: 1.000000 mean_meth_weighted: 1.000000 fractional_meth: 1.000000 cpg_symmetric: total_sites: 6786108 sites_covered: 6 total_c: 6 total_t: 0 max_depth: 1 mutations: 0 called_meth: 6 called_unmeth: 0 mean_agg: 6 coverage: 6 sites_covered_fraction: 8.84159e-07 mean_depth: 8.84159e-07 mean_depth_covered: 1 mean_meth: 1.000000 mean_meth_weighted: 1.000000 fractional_meth: 1.000000 chh: total_sites: 248896199 sites_covered: 133 total_c: 1 total_t: 132 max_depth: 1 mutations: 0 called_meth: 1 called_unmeth: 132 mean_agg: 1 coverage: 133 sites_covered_fraction: 5.34359e-07 mean_depth: 5.34359e-07 mean_depth_covered: 1 mean_meth: 0.007519 mean_meth_weighted: 0.007519 fractional_meth: 0.007519 ccg: total_sites: 3634011 sites_covered: 1 total_c: 0 total_t: 1 max_depth: 1 mutations: 0 called_meth: 0 called_unmeth: 1 mean_agg: 0 coverage: 1 sites_covered_fraction: 2.75178e-07 mean_depth: 2.75178e-07 mean_depth_covered: 1 mean_meth: 0.000000 mean_meth_weighted: 0.000000 fractional_meth: 0.000000 cxg: total_sites: 66798404 sites_covered: 32 total_c: 0 total_t: 32 max_depth: 1 mutations: 0 called_meth: 0 called_unmeth: 32 mean_agg: 0 coverage: 32 sites_covered_fraction: 4.79053e-07 mean_depth: 4.79053e-07 mean_depth_covered: 1 mean_meth: 0.000000 mean_meth_weighted: 0.000000 fractional_meth: 0.000000
Sharing Cytosine levels from other datasets too.
cytosines: total_sites: 749796575 sites_covered: 338 total_c: 8 total_t: 330 max_depth: 1 mutations: 1 called_meth: 8 called_unmeth: 330 mean_agg: 8 coverage: 338 sites_covered_fraction: 4.50789e-07 mean_depth: 4.50789e-07 mean_depth_covered: 1 mean_meth: 0.023669 mean_meth_weighted: 0.023669 fractional_meth: 0.023669
total_sites: 200174607 sites_covered: 106 total_c: 6 total_t: 100 max_depth: 1 mutations: 0 called_meth: 6 called_unmeth: 100 mean_agg: 6 coverage: 106 sites_covered_fraction: 5.29538e-07 mean_depth: 5.29538e-07 mean_depth_covered: 1 mean_meth: 0.056604 mean_meth_weighted: 0.056604 fractional_meth: 0.056604
total_sites: 124918079 sites_covered: 87 total_c: 1 total_t: 86 max_depth: 1 mutations: 0 called_meth: 1 called_unmeth: 86 mean_agg: 1 coverage: 87 sites_covered_fraction: 6.96456e-07 mean_depth: 6.96456e-07 mean_depth_covered: 1 mean_meth: 0.011494 mean_meth_weighted: 0.011494 fractional_meth: 0.011494
The Cytosine sites covered are ranging from 87-338 in these samples- which is very few right? @terencewtli @andrewdavidsmith
Right, the coverage is very, very low. I still believe that something went wrong when generating the methcounts file. Could you show the statistics from the bismark methylation statistics?
Probably not here though
cytosines: total_sites: 332900830 sites_covered: 174 total_c: 9 total_t: 165 max_depth: 1 mutations: 0 called_meth: 9 called_unmeth: 165 mean_agg: 9 coverage: 174 sites_covered_fraction: 5.22678e-07 mean_depth: 5.22678e-07 mean_depth_covered: 1 mean_meth: 0.051724 mean_meth_weighted: 0.051724 fractional_meth: 0.051724 cpg: total_sites: 13572216 sites_covered: 6 total_c: 6 total_t: 0 max_depth: 1 mutations: 0 called_meth: 6 called_unmeth: 0 mean_agg: 6 coverage: 6 sites_covered_fraction: 4.4208e-07 mean_depth: 4.4208e-07 mean_depth_covered: 1 mean_meth: 1.000000 mean_meth_weighted: 1.000000 fractional_meth: 1.000000 cpg_symmetric: total_sites: 6786108 sites_covered: 6 total_c: 6 total_t: 0 max_depth: 1 mutations: 0 called_meth: 6 called_unmeth: 0 mean_agg: 6 coverage: 6 sites_covered_fraction: 8.84159e-07 mean_depth: 8.84159e-07 mean_depth_covered: 1 mean_meth: 1.000000 mean_meth_weighted: 1.000000 fractional_meth: 1.000000 chh: total_sites: 248896199 sites_covered: 133 total_c: 1 total_t: 132 max_depth: 1 mutations: 0 called_meth: 1 called_unmeth: 132 mean_agg: 1 coverage: 133 sites_covered_fraction: 5.34359e-07 mean_depth: 5.34359e-07 mean_depth_covered: 1 mean_meth: 0.007519 mean_meth_weighted: 0.007519 fractional_meth: 0.007519 ccg: total_sites: 3634011 sites_covered: 1 total_c: 0 total_t: 1 max_depth: 1 mutations: 0 called_meth: 0 called_unmeth: 1 mean_agg: 0 coverage: 1 sites_covered_fraction: 2.75178e-07 mean_depth: 2.75178e-07 mean_depth_covered: 1 mean_meth: 0.000000 mean_meth_weighted: 0.000000 fractional_meth: 0.000000 cxg: total_sites: 66798404 sites_covered: 32 total_c: 0 total_t: 32 max_depth: 1 mutations: 0 called_meth: 0 called_unmeth: 32 mean_agg: 0 coverage: 32 sites_covered_fraction: 4.79053e-07 mean_depth: 4.79053e-07 mean_depth_covered: 1 mean_meth: 0.000000 mean_meth_weighted: 0.000000 fractional_meth: 0.000000
For this sample, bismark meth stats is as follows-
Total number of C's analysed: 1803434171
Total methylated C's in CpG context: 53739952 Total methylated C's in CHG context: 2092466 Total methylated C's in CHH context: 6926063
Total C to T conversions in CpG context: 18225296 Total C to T conversions in CHG context: 406887572 Total C to T conversions in CHH context: 1315562822
C methylated in CpG context: 74.7% C methylated in CHG context: 0.5% C methylated in CHH context: 0.5%
@hchetia Someone will reach out to you about this. Recent posts in this issue have not been about the topic of the issue, and these also are not things we can tie to any aspect of the software. I am closing this issue. Again, we will help you directly with potentially confusing results. If you notice anything that suggests a particular part of the methpipe software is not working, please create a new issue.
Okay! We were trying to reach to the root of this issue which might have stemmed from conversion to methpipe format. My email is hchetia@rockefeller.edu Thanks.
Hi I am using sorted_symmetric_cpg.meth (xyz.meth)files to find HMRs in my dataset using the foll. command: hmr -p xyz.hmr.params -o xyz.hmr xyz.meth
But I get the foll. error-
gsl: gamma.c:1249: ERROR: x too large to extract fraction part Default GSL error handler invoked.
Best, Hasna