smithlabcode / methpipe

A pipeline for analyzing DNA methylation data from bisulfite sequencing.
http://smithlabresearch.org/methpipe
67 stars 27 forks source link

Error running hmr program on symmetric cpgs #186

Closed hchetia closed 3 years ago

hchetia commented 3 years ago

Hi I am using sorted_symmetric_cpg.meth (xyz.meth)files to find HMRs in my dataset using the foll. command: hmr -p xyz.hmr.params -o xyz.hmr xyz.meth

But I get the foll. error-

gsl: gamma.c:1249: ERROR: x too large to extract fraction part Default GSL error handler invoked.

Best, Hasna

terencewtli commented 3 years ago

Hi,

Would you be willing to send your data to me?

hchetia commented 3 years ago

Yes. Do you want the symmetric cpg meth file?

terencewtli commented 3 years ago

Yes, that would be great!

hchetia commented 3 years ago

I will email it to you.

terencewtli commented 3 years ago

My email is terence@usc.edu.

hchetia commented 3 years ago

Hi Terence, I have sent the email. Thanks.

On Fri, Jul 9, 2021 at 10:28 PM Terence Li @.***> wrote:

My email is @.***

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/smithlabcode/methpipe/issues/186#issuecomment-877541587, or unsubscribe https://github.com/notifications/unsubscribe-auth/APNCALYCPNUWW7IG666DAK3TW6V4HANCNFSM5ABKF7EA .

terencewtli commented 3 years ago

Hi, I have not received it yet. Could you try resending it?

hchetia commented 3 years ago

Just did! 👍

hchetia commented 3 years ago

@terencewtli Hi Terence, I was wondering if you received the file and have any insights on why I might be facing this error?

terencewtli commented 3 years ago

I'm sorry I forgot to respond! I have not received it yet... could you try resending it? Or maybe posting the file here?

hchetia commented 3 years ago

Hi terence, I sent it twice. Could you please check junk or spam? I sent is using academic email ID though. I would rather send the file to you personally than post it here. If possible, could you share a gmail ID?

terencewtli commented 3 years ago

I see... I still do not see it in my inbox. Here's another email: terenceli68@gmail.com .

hchetia commented 3 years ago

Sent.

terencewtli commented 3 years ago

Hi,

I checked out your file and it looks like none of the CpGs have any coverage (taken from our levels program):

head test.levels
cytosines:
  total_sites: 5713196
  sites_covered: 0
  total_c: 0
  total_t: 0
  max_depth: 0
  mutations: 3
  called_meth: 0
  called_unmeth: 0
  mean_agg: 0

Likewise, the verbose output from hmr gives the following:

  [reading methylation levels]
[total_cpgs=5713196]
[mean_coverage=0]
[separating by cpg desert]
[cpgs retained: 0]
[deserts removed: 18446744073709551615]
[genome fraction covered: -nan]
gsl: gamma.c:1249: ERROR: x too large to extract fraction part
Default GSL error handler invoked.
Aborted

I believe the error has to do with the vector of CpGs being empty since none of them have coverage. Did something go wrong when running methcounts and symmetric-cpgs?

hchetia commented 3 years ago

Hi Terence, Thanks for spotting that out. Will check the outputs from the previous step and update.

terencewtli commented 3 years ago

Hi @hchetia , any updates on the output from the previous step?

hchetia commented 3 years ago

Hi @terencewtli I think that the file I shared must have been a bad sample. Sharing another files levels output here- looks weird still (only 9 Cs?) cytosines: total_sites: 332900830 sites_covered: 174 total_c: 9 total_t: 165 max_depth: 1 mutations: 0 called_meth: 9 called_unmeth: 165 mean_agg: 9 coverage: 174 sites_covered_fraction: 5.22678e-07 mean_depth: 5.22678e-07 mean_depth_covered: 1 mean_meth: 0.051724 mean_meth_weighted: 0.051724 fractional_meth: 0.051724 cpg: total_sites: 13572216 sites_covered: 6 total_c: 6 total_t: 0 max_depth: 1 mutations: 0 called_meth: 6 called_unmeth: 0 mean_agg: 6 coverage: 6 sites_covered_fraction: 4.4208e-07 mean_depth: 4.4208e-07 mean_depth_covered: 1 mean_meth: 1.000000 mean_meth_weighted: 1.000000 fractional_meth: 1.000000 cpg_symmetric: total_sites: 6786108

terencewtli commented 3 years ago

I see, that is strange. I'm happy to help, you can open another issue and we can look at it there. Closing this for now as I believe that this error only occurs on spurious input.

hchetia commented 3 years ago

Hi @terencewtli running hmr on this dataset keeps running without generating any output or even log of any sort. Apparently the program has been using my computational resources for ~5 hrs. image You have seen the levels report of this particular sample above. This is a mouse BSseq sample aligned with mm10 BS genome. Out of the complete methylation call file (7.7 gigs), CHH context was 5.8Gigs while CPG was 322 Mb; symmetric CpGs (161 mb). I am really at my wits end here. I have also used bismark_methylation_extractor for the same and it ran fine. But I also wanted to try methpipe and compare both results. Please do not close this issue yet, as the hmr program didn't generate results from the second set of symmetric CpGs which were not spurious (!). Also, let me know if I can email you all the sequential outputs from methpipe for this sample? Thanks. Very best, Hasna

terencewtli commented 3 years ago

@hchetia did you say you aligned the mouse sample with a bisulfite-converted genome (3 letters) ? I think that is the reason that there are so few covered sites. In methcounts and bsrate, we want users to use the original genome fasta file.

hchetia commented 3 years ago

Hi Terence, I was following your manual where it states that image

So going back to using Bismark for alignment, it always converts the normal genome fasta file to a Bisulfite Genome folder with CT and GA conversion subfolders with bt2 indexes. This folder is then used as a reference to alignment.

Unless you are suggesting that there is a way to use Bismark directly with a genome fasta file? Perhaps you could guide me on how to do that?

terencewtli commented 3 years ago

I think bismark produces a Bisulfite_Genome folder that contains the converted genome fasta files. Our pipeline only requires the original genome fasta. In terms of the part of the manual you pasted here, we require the format_reads because there are differences in the SAM formatting from each of the mentioned mappers. You can try the following command to convert bismark SAM output to the proper format for methpipe:

format_reads -o <sample_name>_f.sam -f bismark <sample>.bam

For the methcounts and bsrate programs, try using the original hg38.fa instead of the converted genome fasta. Does this make sense?

hchetia commented 3 years ago

Hi @terencewtli I did use the original mm10.fa and not any converted Bisulfite fasta file. However, I did convert use samtools to convert bismark.bam to sam and then use it with format reads using the same command that you suggested above. Could that be the culprit?

What are your thoughts on this (see below)? Should I kill the job?

Hi @terencewtli running hmr on this dataset keeps running without generating any output or even log of any sort. Apparently the program has been using my computational resources for ~5 hrs. image You have seen the levels report of this particular sample above. This is a mouse BSseq sample aligned with mm10 BS genome. Out of the complete methylation call file (7.7 gigs), CHH context was 5.8Gigs while CPG was 322 Mb; symmetric CpGs (161 mb).

terencewtli commented 3 years ago

I see, I don't think that should be the issue. Once again, can you verify the coverage and methylation levels of your input using our levels program?

For your second point, yes, you should kill the job. Our HMR program shouldn't take that long to run. Can you try running the program with the verbose flag -v and seeing what happens?

hchetia commented 3 years ago

I will update you today.

hchetia commented 3 years ago

cytosines: total_sites: 332900830 sites_covered: 174 total_c: 9 total_t: 165 max_depth: 1 mutations: 0 called_meth: 9 called_unmeth: 165 mean_agg: 9 coverage: 174 sites_covered_fraction: 5.22678e-07 mean_depth: 5.22678e-07 mean_depth_covered: 1 mean_meth: 0.051724 mean_meth_weighted: 0.051724 fractional_meth: 0.051724 cpg: total_sites: 13572216 sites_covered: 6 total_c: 6 total_t: 0 max_depth: 1 mutations: 0 called_meth: 6 called_unmeth: 0 mean_agg: 6 coverage: 6 sites_covered_fraction: 4.4208e-07 mean_depth: 4.4208e-07 mean_depth_covered: 1 mean_meth: 1.000000 mean_meth_weighted: 1.000000 fractional_meth: 1.000000 cpg_symmetric: total_sites: 6786108 sites_covered: 6 total_c: 6 total_t: 0 max_depth: 1 mutations: 0 called_meth: 6 called_unmeth: 0 mean_agg: 6 coverage: 6 sites_covered_fraction: 8.84159e-07 mean_depth: 8.84159e-07 mean_depth_covered: 1 mean_meth: 1.000000 mean_meth_weighted: 1.000000 fractional_meth: 1.000000 chh: total_sites: 248896199 sites_covered: 133 total_c: 1 total_t: 132 max_depth: 1 mutations: 0 called_meth: 1 called_unmeth: 132 mean_agg: 1 coverage: 133 sites_covered_fraction: 5.34359e-07 mean_depth: 5.34359e-07 mean_depth_covered: 1 mean_meth: 0.007519 mean_meth_weighted: 0.007519 fractional_meth: 0.007519 ccg: total_sites: 3634011 sites_covered: 1 total_c: 0 total_t: 1 max_depth: 1 mutations: 0 called_meth: 0 called_unmeth: 1 mean_agg: 0 coverage: 1 sites_covered_fraction: 2.75178e-07 mean_depth: 2.75178e-07 mean_depth_covered: 1 mean_meth: 0.000000 mean_meth_weighted: 0.000000 fractional_meth: 0.000000 cxg: total_sites: 66798404 sites_covered: 32 total_c: 0 total_t: 32 max_depth: 1 mutations: 0 called_meth: 0 called_unmeth: 32 mean_agg: 0 coverage: 32 sites_covered_fraction: 4.79053e-07 mean_depth: 4.79053e-07 mean_depth_covered: 1 mean_meth: 0.000000 mean_meth_weighted: 0.000000 fractional_meth: 0.000000

hchetia commented 3 years ago

Sharing Cytosine levels from other datasets too.

cytosines: total_sites: 749796575 sites_covered: 338 total_c: 8 total_t: 330 max_depth: 1 mutations: 1 called_meth: 8 called_unmeth: 330 mean_agg: 8 coverage: 338 sites_covered_fraction: 4.50789e-07 mean_depth: 4.50789e-07 mean_depth_covered: 1 mean_meth: 0.023669 mean_meth_weighted: 0.023669 fractional_meth: 0.023669


total_sites: 200174607 sites_covered: 106 total_c: 6 total_t: 100 max_depth: 1 mutations: 0 called_meth: 6 called_unmeth: 100 mean_agg: 6 coverage: 106 sites_covered_fraction: 5.29538e-07 mean_depth: 5.29538e-07 mean_depth_covered: 1 mean_meth: 0.056604 mean_meth_weighted: 0.056604 fractional_meth: 0.056604


total_sites: 124918079 sites_covered: 87 total_c: 1 total_t: 86 max_depth: 1 mutations: 0 called_meth: 1 called_unmeth: 86 mean_agg: 1 coverage: 87 sites_covered_fraction: 6.96456e-07 mean_depth: 6.96456e-07 mean_depth_covered: 1 mean_meth: 0.011494 mean_meth_weighted: 0.011494 fractional_meth: 0.011494


hchetia commented 3 years ago

The Cytosine sites covered are ranging from 87-338 in these samples- which is very few right? @terencewtli @andrewdavidsmith

terencewtli commented 3 years ago

Right, the coverage is very, very low. I still believe that something went wrong when generating the methcounts file. Could you show the statistics from the bismark methylation statistics?

andrewdavidsmith commented 3 years ago

Probably not here though

hchetia commented 3 years ago

Methpipe Levels Report

cytosines: total_sites: 332900830 sites_covered: 174 total_c: 9 total_t: 165 max_depth: 1 mutations: 0 called_meth: 9 called_unmeth: 165 mean_agg: 9 coverage: 174 sites_covered_fraction: 5.22678e-07 mean_depth: 5.22678e-07 mean_depth_covered: 1 mean_meth: 0.051724 mean_meth_weighted: 0.051724 fractional_meth: 0.051724 cpg: total_sites: 13572216 sites_covered: 6 total_c: 6 total_t: 0 max_depth: 1 mutations: 0 called_meth: 6 called_unmeth: 0 mean_agg: 6 coverage: 6 sites_covered_fraction: 4.4208e-07 mean_depth: 4.4208e-07 mean_depth_covered: 1 mean_meth: 1.000000 mean_meth_weighted: 1.000000 fractional_meth: 1.000000 cpg_symmetric: total_sites: 6786108 sites_covered: 6 total_c: 6 total_t: 0 max_depth: 1 mutations: 0 called_meth: 6 called_unmeth: 0 mean_agg: 6 coverage: 6 sites_covered_fraction: 8.84159e-07 mean_depth: 8.84159e-07 mean_depth_covered: 1 mean_meth: 1.000000 mean_meth_weighted: 1.000000 fractional_meth: 1.000000 chh: total_sites: 248896199 sites_covered: 133 total_c: 1 total_t: 132 max_depth: 1 mutations: 0 called_meth: 1 called_unmeth: 132 mean_agg: 1 coverage: 133 sites_covered_fraction: 5.34359e-07 mean_depth: 5.34359e-07 mean_depth_covered: 1 mean_meth: 0.007519 mean_meth_weighted: 0.007519 fractional_meth: 0.007519 ccg: total_sites: 3634011 sites_covered: 1 total_c: 0 total_t: 1 max_depth: 1 mutations: 0 called_meth: 0 called_unmeth: 1 mean_agg: 0 coverage: 1 sites_covered_fraction: 2.75178e-07 mean_depth: 2.75178e-07 mean_depth_covered: 1 mean_meth: 0.000000 mean_meth_weighted: 0.000000 fractional_meth: 0.000000 cxg: total_sites: 66798404 sites_covered: 32 total_c: 0 total_t: 32 max_depth: 1 mutations: 0 called_meth: 0 called_unmeth: 32 mean_agg: 0 coverage: 32 sites_covered_fraction: 4.79053e-07 mean_depth: 4.79053e-07 mean_depth_covered: 1 mean_meth: 0.000000 mean_meth_weighted: 0.000000 fractional_meth: 0.000000

For this sample, bismark meth stats is as follows-

Final Cytosine Methylation Report

Total number of C's analysed: 1803434171

Total methylated C's in CpG context: 53739952 Total methylated C's in CHG context: 2092466 Total methylated C's in CHH context: 6926063

Total C to T conversions in CpG context: 18225296 Total C to T conversions in CHG context: 406887572 Total C to T conversions in CHH context: 1315562822

C methylated in CpG context: 74.7% C methylated in CHG context: 0.5% C methylated in CHH context: 0.5%

andrewdavidsmith commented 3 years ago

@hchetia Someone will reach out to you about this. Recent posts in this issue have not been about the topic of the issue, and these also are not things we can tie to any aspect of the software. I am closing this issue. Again, we will help you directly with potentially confusing results. If you notice anything that suggests a particular part of the methpipe software is not working, please create a new issue.

hchetia commented 3 years ago

Okay! We were trying to reach to the root of this issue which might have stemmed from conversion to methpipe format. My email is hchetia@rockefeller.edu Thanks.