Closed marqueda closed 3 years ago
Hi, I just fixed decode in msmc2. Have a look in my latest commit. It should compile via “make decode” with a reasonably recent DMD installation.
Best, Stephan
On 9 Jan 2019, at 13:08, David Marques notifications@github.com wrote:
Dear Stephan,
First of all, thank you for building and maintaining this fantastic software!
I have run into an issue with the decode output from the msmc2 directory version of decode. The same issue has already been spotted and (I believe) fixed for the msmc version of decode according to this post: stschiff/msmc#22 https://github.com/stschiff/msmc/issues/22 The issue being: the index (first column) referring to the genomic position for the estimate is not output by decode. The command / output looks as follows:
$ decode -m 0.01 -r 0.01 -t 10 -I 0,1 file.msmcin generating propagation core generating Hidden Markov Model 0.00979131 0.0554705 0.0780049 0.111317 0.184686 0.242241 0.207395 0.0941801 0.0162951 0.000618992 0.0412948 0.165461 0.146085 0.164509 0.184865 0.164316 0.0984293 0.0312861 0.00366366 9.06401e-05 0.0549403 0.32259 0.345796 0.188644 0.0664831 0.0174824 0.00355235 0.000482762 2.82522e-05 3.81416e-07 0.768997 0.0942113 0.0644215 0.0454119 0.0197571 0.00574796 0.00127025 0.000172844 9.9431e-06 1.33605e-07 ... While I believe it should look like this:
$ decode -m 0.01 -r 0.01 -t 10 -I 0,1 file.msmcin generating propagation core generating Hidden Markov Model 1234 0.00979131 0.0554705 0.0780049 0.111317 0.184686 0.242241 0.207395 1276 0.0941801 0.0162951 0.000618992 1856 0.0412948 0.165461 0.146085 0.164509 0.184865 0.164316 0.0984293 2734 0.0312861 0.00366366 9.06401e-05 4200 0.0549403 0.32259 0.345796 0.188644 0.0664831 0.0174824 0.00355235 1856 5677 0.000482762 2.82522e-05 3.81416e-07 10234 0.768997 0.0942113 0.0644215 0.0454119 0.0197571 0.00574796 0.00127025 0.000172844 9.9431e-06 1.33605e-07 ... Thank you very much for looking into this.
Best wishes, David
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/stschiff/msmc2/issues/20, or mute the thread https://github.com/notifications/unsubscribe-auth/AAbQmsJx-H9pP4asf4rZ-Ep93xVpOZYEks5vBdvagaJpZM4Z3WyA.
Fantastic! Thank you very much, Stephan! Best, David
Dear Stephan, The fix works great, the output now gives the base pair as the first column. However, I noticed that in the msmc guide, you write "The first row of the output is a header row specifying the lower ends of the time boundaries for each state.", but such a header line does not exist in the output of the msmc2 decode version. Should I infer the time boundaries from the normal msm2 output using the same 2 haplotypes instead or is the lack of the time boundaries a bug in the decode program? Thank you for your support! David
Hi David,
yes, sorry, that header line is missing. I’ll try to fix it ASAP. In the mean time, yes, please use the time boundaries from a default msmc2 run on the same data.
Best, Stephan
On 26 Feb 2019, at 13:40, David Marques notifications@github.com wrote:
Dear Stephan, The fix works great, the output now gives the base pair as the first column. However, I noticed that in the msmc guide, you write "The first row of the output is a header row specifying the lower ends of the time boundaries for each state.", but such a header line does not exist in the output of the msmc2 decode version. Should I infer the time boundaries from the normal msm2 output using the same 2 haplotypes instead or is the lack of the time boundaries a bug in the decode program? Thank you for your support! David
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/stschiff/msmc2/issues/20#issuecomment-467423024, or mute the thread https://github.com/notifications/unsubscribe-auth/AAbQmqm2-Fhd2p5Mi8z1tae50kzivZVsks5vRStBgaJpZM4Z3WyA.
Dear Stephan,
Thanks for the hint regarding time boundaries.
In the meantime, I have been running into another issue with decode, when using the -l
option to provide the lambda Vector together with the first two haplotypes (-I 0,1
). Instead of posterior probabilities, decode then outputs '-nan' for each HMM state. This is not true when using other haplotype indices (e.g. -I 1,2
or -I 0,2
). I am not sure how to interpret this error and I hope it will be repeatable for you, but I suspect it is somehow related to the index -I 0,1
.
Thanks for your feedback,
David
Hi David,
weird. Could you send me an input file and a command line that reproduces the error?
Thanks, Stephan
On 25 Mar 2019, at 16:47, David Marques notifications@github.com wrote:
Dear Stephan, Thanks for the hint regarding time boundaries. In the meantime, I have been running into another issue with decode, when using the -l option to provide the lambda Vector together with the first two haplotypes (-I 0,1). Instead of posterior probabilities, decode then outputs '-nan' for each HMM state. This is not true when using other haplotype indices (e.g. -I 1,2 or -I 0,2). I am not sure how to interpret this error and I hope it will be repeatable for you, but I suspect it is somehow related to the index -I 0,1. Thanks for your feedback, David
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/stschiff/msmc2/issues/20#issuecomment-476256751, or mute the thread https://github.com/notifications/unsubscribe-auth/AAbQmpX_4_h0jCXl6R32TBfFKHZYPU2zks5vaO-cgaJpZM4Z3WyA.
Dear Stephan,
I unfortunately run again into this -nan error. I am sending you the input files by e-Mail. The command line I used was:
decode -m 0.000724432 -r 0.000579546 -l 1.38032,1.38032,151.08,389.167,1848.6,4988.51,7455.75,6240.31,3468.46,1752.54,1019.81,733.919,636.106,628.715,673.471,752.441,853.27,961.419,1052.62,1083.8,996.131,768.536,491.836,289.651,187.156,153.387,173.615,543.681,543.681,4.71324,4.71324,4.71324 -I 0,1 ALLU1.chr1.phased.msmcin
With the long string of numbers being the comma-separated lambda values from the msmc2 outfile (which I am also sending you by e-Mail).
Thank you for having a look at this error, I hope you can replicate it with my data.
Best wishes,
David
OK, I can reproduce the error with the input files you’ve sent me. I’ll try to figure it out.
Best, Stephan
On 10 Apr 2019, at 16:38, David Marques notifications@github.com wrote:
Dear Stephan, I unfortunately run again into this -nan error. I am sending you the input files by e-Mail. The command line I used was: decode -m 0.000724432 -r 0.000579546 -l 1.38032,1.38032,151.08,389.167,1848.6,4988.51,7455.75,6240.31,3468.46,1752.54,1019.81,733.919,636.106,628.715,673.471,752.441,853.27,961.419,1052.62,1083.8,996.131,768.536,491.836,289.651,187.156,153.387,173.615,543.681,543.681,4.71324,4.71324,4.71324 -I 0,1 ALLU1.chr1.phased.msmcin With the long string of numbers being the comma-separated lambda values from the msmc2 outfile (which I am also sending you by e-Mail). Thank you for having a look at this error, I hope you can replicate it with my data. Best wishes, David
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/stschiff/msmc2/issues/20#issuecomment-481718473, or mute the thread https://github.com/notifications/unsubscribe-auth/AAbQmgaqtiIGtSMVkT7MSkTTT7TqMOb9ks5vffdlgaJpZM4Z3WyA.
Awesome, thank you very much, Stephan!
Sorry this has literally taken me two full years, but it's clear the issue was caused by a wrong length of the input lambda vector. I've now improved the error message. If you try the command above again you will get
$ build/decode -m 0.000724432 -r 0.000579546 -l 1.38032,1.38032,151.08,389.167,1848.6,4988.51,7455.75,6240.31,3468.46,1752.54,1019.81,733.919,636.106,628.715,673.471,752.441,853.27,961.419,1052.62,1083.8,996.131,768.536,491.836,289.651,187.156,153.387,173.615,543.681,543.681,4.71324,4.71324,4.71324 -I 0,1 ALLU1.chr1.phased.msmcin | head
loaded lambdaVec with length 32
Lambda Vector is expected to have length 64
Enforcement failed
If you install the latest version, this should be fixed.
Dear Stephan,
First of all, thank you for building and maintaining this fantastic software!
I have run into an issue with the decode output from the msmc2 directory version of decode. The same issue has already been spotted and (I believe) fixed for the msmc version of decode according to this post: https://github.com/stschiff/msmc/issues/22
The issue being: the index (first column) referring to the genomic position for the estimate is not output by decode. The command / output looks as follows:
While I believe it should look like this:
Thank you very much for looking into this.
Best wishes, David