stschiff / msmc

Implementation of the multiple sequential markovian coalescent
GNU General Public License v3.0
87 stars 20 forks source link

another Segmentation Fault #35

Closed mike2vandy closed 5 years ago

mike2vandy commented 6 years ago

Hey Stephan, I've been bugging you a little bit, my data are already haploid, and had to make a script to generate an input file. However, after a few attempts I'm still stuck on the segmentation fault.

sample input data: LT635612.1 6507 6112 CCCT LT635612.1 7080 573 CACA LT635612.1 7912 832 GGAG LT635612.1 17144 9143 TTAA LT635612.1 20965 3791 CCAA LT635612.1 21774 809 TTCC LT635612.1 23125 1351 CAAC LT635612.1 32902 9464 TCCC LT635612.1 35983 3081 GTGT LT635612.1 41100 4982 AGGG

~/betterSoftware/msmc/build/msmc -o test LT* read 401 SNPs from file LT635612.1.out read 396 SNPs from file LT635613.1.out read 419 SNPs from file LT635614.1.out read 475 SNPs from file LT635615.1.out read 750 SNPs from file LT635616.1.out read 471 SNPs from file LT635617.1.out read 659 SNPs from file LT635618.1.out read 741 SNPs from file LT635619.1.out read 982 SNPs from file LT635620.1.out read 747 SNPs from file LT635621.1.out read 896 SNPs from file LT635622.1.out read 1235 SNPs from file LT635623.1.out read 917 SNPs from file LT635624.1.out read 1291 SNPs from file LT635625.1.out estimating scaled mutation rate: 9.24807e-05 Version: 1.0.1 input files: ["LT635612.1.out", "LT635613.1.out", "LT635614.1.out", "LT635615.1.out", "LT635616.1.out", "LT635617.1.out", "LT635618.1.out", "LT635619.1.out", "LT635620.1.out", "LT635621.1.out", "LT635622.1.out", "LT635623.1.out", "LT635624.1.out", "LT635625.1.out"] maxIterations: 20 mutationRate: 9.24807e-05 recombinationRate: 2.31202e-05 subpopLabels: [0, 0, 0, 0] timeSegmentPattern: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2] nrThreads: 16 nrTtotSegments: 40 verbose: false outFilePrefix: test naiveImplementation: false hmmStrideWidth: 1000 fixedPopSize: false fixedRecombination: false initialLambdaVec: [] directedEmissions: false skipAmbiguous: false indices: [0, 1, 2, 3] logging information written to test.log loop information written to test.loop.txt final results written to test.final.txt [11/14] estimating total branchlengthsSegmentation fault (core dumped)

I have also been less stringent on my snp calling (I get about 36K instead of 7K) and gotten the same segmentation fault. Any thoughts? Maybe faulty input files?

Mike

stschiff commented 6 years ago

Hi, I don't think this is because of faulty input files. I think with so few SNPs there might be not enough data, and the program overfits and overflows. I have now built in an option to artificially constrain the coalescence rates, see release 1.1.0, with options --loBoundLambda and --hiBoundLambda. Lambda is the scaled coalescence rate, with typical values around 1000 or so (it's on average around 2/theta, where theta is the heterozygosity), so you could try setting a lower bound of 10, and an upper bound of 1e6 or so. If that doesn't help, please send me the input files by email and I'll have a look.