popgenmethods / SINGER

Sampling and inference of genealogies with recombination
MIT License
19 stars 4 forks source link

Assertion `ws > 0' failed. #15

Open baozg opened 1 month ago

baozg commented 1 month ago

Hi, Yun

Here is the following question. With the vcf from #14, SINGER cannot work. Do you know why?

Number of incompatibilities: 687
Number of flippings: 23
Iteration: 53
[18:50:50.015] : begin BSP
BSP avg num of states: 107.914
[18:50:50.057] : begin sampling branches
[18:50:50.058] : begin TSP
[18:50:50.062] : begin sampling points
[18:50:50.063] : begin adding
[18:50:50.065] : begin sampling recombination
[18:50:50.065] : finish
989
Number of incompatibilities: 693
Number of flippings: 23
Iteration: 54
[18:50:50.067] : begin BSP
BSP avg num of states: 109.332
[18:50:50.112] : begin sampling branches
[18:50:50.112] : begin TSP
[18:50:50.114] : begin sampling points
[18:50:50.115] : begin adding
[18:50:50.116] : begin sampling recombination
[18:50:50.116] : finish
989
Number of incompatibilities: 693
Number of flippings: 23
Iteration: 55
[18:50:50.118] : begin BSP
BSP avg num of states: 114.935
[18:50:50.166] : begin sampling branches
[18:50:50.166] : begin TSP
singer: TSP.cpp:237: void TSP::mut_emit(float, float, std::set<float>&, Node_ptr): Assertion `ws > 0' failed.
Auto-debug failed. Contact the author for help: yun_deng@berkeley.edu
YunDeng98 commented 1 month ago

is it the same vcf file you sent me earlier?

baozg commented 1 month ago

Yes, it was the same vcf. It throw this error after I change it to -vcf test

YunDeng98 commented 1 month ago

OK will get back to you soon

YunDeng98 commented 1 month ago

I think I know why: (1) you use "-start 0 -end 1e6" but it seems that the final SNP is 37053? Note that this basically means the region from 37053 - 1Mb is monomorphic and should have 0 coalescence time to explain this, so the numerical underflow crashed the program. (2) You want to make sure that 4Nem = pi, and pi here is the nucleotide diversity in this region 0-37053. I calculated myself to be 0.0268346, which is really crazily high, so a Ne of 1e4 is not gonna explain this with m=7e-9.

May I ask what this system is that has this kind of crazy high diversity with such low mutation rate?

baozg commented 1 month ago

It was from one immutiy-related region (like MHC in human) in A.thaliana. Given the selfing nature, the overall mutation rate is really low, but in dieasease resistance gene region, the diverisity could be really high. It was mixture of SVs and SNPs in that region.

So I need to recaluate the Ne based on the each window I am interested? Or I could give a large window, the this will not be so crazy (like 1Mb block)

YunDeng98 commented 1 month ago

Yeah maybe using 1Mb and then calculate a better Ne based on that, I tried it on human MHC region and it worked out pretty nicely. The SVs will unfortunately be removed unfortunately, even if they are bi-allelic. But let me know if you want to keep them, it should be just a few lines of code changes.

baozg commented 1 month ago

I already filtered out SVs from this loci. I may will try to use whole chromosome vcf and see the output.

YunDeng98 commented 1 month ago

Sounds good. In the whole genome setting it helps to cut it into a bunch of 1-2Mb pieces and run SINGER in separate for each piece. It can save you a lot of time by parallelizing them.

baozg commented 1 month ago

1 Mb block still throw the same error. I tried to increase the mutation rate to 7e-8, the erros disappreance. The mutation rate varied on the genome, do I need to estimated the recombination rate with pyrho https://github.com/popgenmethods/pyrho first? But SINGER run with long time. What's the rough estimation for 200 samples in 1Mb block?

YunDeng98 commented 1 month ago

You typically don't have to estimate recombination rate, usually SINGER can automatically adjust recombination density. In terms of mutation rate variation, I haven't implemented the support for that, but you can run with 1Mb blocks with different mutation rates. The fact that using 7e-8 works but not 7e-9 means that the condition 4Nem~pi is unmet (in the 1Mb blocks you are using), usually the error throws with over or underflow numerical issues. For example, using 7e-9 instead of 7e-8 will mean the coalescence time will expand 10 times higher, but under coalescent models, the probability to reach 10 (coalescence time units) is almost zero, and model won't work from there.

baozg commented 1 month ago

After 2 days and 23hours,it fails on this 1Mb block. I may need to revisit this region polymorphism first.

Random seed: 447465016
singer: Scaler.cpp:176: void Scaler::rescale(ARG&, double): Assertion `sorted_nodes[i]->time <= sorted_nodes[i+1]->time' failed.
YunDeng98 commented 1 month ago

Would you mind sending me your scripts for this region?

baozg commented 3 weeks ago

Sorry for the late repsonse. Here is the command I used: ~/software/SINGER/releases/singer_master -Ne 5e4 -m 7e-8 -vcf Athall.dv.filter.Chr4.snps -output Chr4_snps -start 15000000 -end 16000000 -n 100 -thin 20