Open baozg opened 5 months ago
is it the same vcf file you sent me earlier?
Yes, it was the same vcf. It throw this error after I change it to -vcf test
OK will get back to you soon
I think I know why: (1) you use "-start 0 -end 1e6" but it seems that the final SNP is 37053? Note that this basically means the region from 37053 - 1Mb is monomorphic and should have 0 coalescence time to explain this, so the numerical underflow crashed the program. (2) You want to make sure that 4Nem = pi, and pi here is the nucleotide diversity in this region 0-37053. I calculated myself to be 0.0268346, which is really crazily high, so a Ne of 1e4 is not gonna explain this with m=7e-9.
May I ask what this system is that has this kind of crazy high diversity with such low mutation rate?
It was from one immutiy-related region (like MHC in human) in A.thaliana. Given the selfing nature, the overall mutation rate is really low, but in dieasease resistance gene region, the diverisity could be really high. It was mixture of SVs and SNPs in that region.
So I need to recaluate the Ne based on the each window I am interested? Or I could give a large window, the this will not be so crazy (like 1Mb block)
Yeah maybe using 1Mb and then calculate a better Ne based on that, I tried it on human MHC region and it worked out pretty nicely. The SVs will unfortunately be removed unfortunately, even if they are bi-allelic. But let me know if you want to keep them, it should be just a few lines of code changes.
I already filtered out SVs from this loci. I may will try to use whole chromosome vcf and see the output.
Sounds good. In the whole genome setting it helps to cut it into a bunch of 1-2Mb pieces and run SINGER in separate for each piece. It can save you a lot of time by parallelizing them.
1 Mb block still throw the same error. I tried to increase the mutation rate to 7e-8, the erros disappreance. The mutation rate varied on the genome, do I need to estimated the recombination rate with pyrho https://github.com/popgenmethods/pyrho first? But SINGER run with long time. What's the rough estimation for 200 samples in 1Mb block?
You typically don't have to estimate recombination rate, usually SINGER can automatically adjust recombination density. In terms of mutation rate variation, I haven't implemented the support for that, but you can run with 1Mb blocks with different mutation rates. The fact that using 7e-8 works but not 7e-9 means that the condition 4Nem~pi is unmet (in the 1Mb blocks you are using), usually the error throws with over or underflow numerical issues. For example, using 7e-9 instead of 7e-8 will mean the coalescence time will expand 10 times higher, but under coalescent models, the probability to reach 10 (coalescence time units) is almost zero, and model won't work from there.
After 2 days and 23hours,it fails on this 1Mb block. I may need to revisit this region polymorphism first.
Random seed: 447465016
singer: Scaler.cpp:176: void Scaler::rescale(ARG&, double): Assertion `sorted_nodes[i]->time <= sorted_nodes[i+1]->time' failed.
Would you mind sending me your scripts for this region?
Sorry for the late repsonse. Here is the command I used: ~/software/SINGER/releases/singer_master -Ne 5e4 -m 7e-8 -vcf Athall.dv.filter.Chr4.snps -output Chr4_snps -start 15000000 -end 16000000 -n 100 -thin 20
Hi, Yun
Here is the following question. With the vcf from #14, SINGER cannot work. Do you know why?