stschiff / msmc

Implementation of the multiple sequential markovian coalescent
GNU General Public License v3.0

Segmentation Fault on OSX (Mac) #31

Closed: swamidass closed this issue 6 years ago

swamidass commented 6 years ago

Why am I getting a seg fault?

On this basic data file:

1   530673  530672  AAAG
1   2621645 2090972 AAAG
1   19804316    17182671    AAAG
1   22466822    2662506 GAGA
1   22915237    448415  GAGA
1   23048515    133278  GGGA
1   24445215    1396700 AAAG
1   28004741    3559526 AAAG
1   29001118    996377  GGGA
1   31071573    2070455 AAAG
1   34816438    3744865 GAGA
1   35314840    498402  AAAG
1   38538312    3223472 AAAG
1   43088121    4549809 AAAG
1   43796385    708264  GAGA
1   47664747    3868362 AGAA
1   51461120    3796373 AGAA
1   56143120    4682000 GGGA
1   63973759    7830639 GGGA
1   72298048    8324289 AAAG
1   75398992    3100944 AGAA
1   75903941    504949  GAGA
1   80413141    4509200 AAAG
1   83284762    2871621 GAGA
1   83522772    238010  AAAG
1   89044382    5521610 AGAA
1   93987208    4942826 GAGA
1   94955877    968669  AAAG
1   95645609    689732  AAAG

I get a seg fault:

> msmc/build/msmc --fixedRecombination -o t msmc.data 
read 29 SNPs from file msmc.data
estimating scaled mutation rate: 8.3153e-08
Version:             1.0.1
input files:         ["msmc.data"]
maxIterations:       20
mutationRate:        8.3153e-08
recombinationRate:   2.07882e-08
subpopLabels:        [0, 0, 0, 0]
timeSegmentPattern:  [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
nrThreads:           2
nrTtotSegments:      40
verbose:             false
outFilePrefix:       t
naiveImplementation: false
hmmStrideWidth:      1000
fixedPopSize:        false
fixedRecombination:  true
initialLambdaVec:    []
directedEmissions:   false
skipAmbiguous:       false
indices:             [0, 1, 2, 3]
logging information written to t.log
loop information written to t.loop.txt
final results written to t.final.txt
[1/1] estimating total branchlengthsSegmentation fault: 11

stschiff commented 6 years ago

Off the top of my head: are you sure your input file correctly reflects the heterozygosity in your samples? You seem to have very sparsely distributed segregating sites, and in particular almost all sites in between them appear to be called homozygous reference (the third column has very large numbers).

For example, between the second and the third site, there are >17Mb of sequence segment called homozygous reference, with no heterozygosity whatsoever...

I think you may be getting a seg fault because of overflow or underflow errors caused by the extremely long homozygous blocks…

Stephan
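
A quick way to check an input file for this before running msmc is to summarize the third column. The sketch below is only an illustration, not part of msmc: it assumes whitespace-separated lines with four fields (chromosome, position, number of sites called since the previous segregating site, alleles), as in the file above. The function names and the 1 Mb `max_block` cutoff are hypothetical choices for this example, not values taken from msmc.

```python
from typing import Iterable, List, NamedTuple


class Site(NamedTuple):
    chrom: str
    pos: int
    called: int   # third column: sites called since the previous segregating site
    alleles: str


def parse_sites(lines: Iterable[str]) -> List[Site]:
    """Parse whitespace-separated input lines, skipping blank or short lines."""
    sites = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        sites.append(Site(fields[0], int(fields[1]), int(fields[2]), fields[3]))
    return sites


def summarize(sites: List[Site], max_block: int = 1_000_000) -> dict:
    """Report how sparse the segregating sites are.

    max_block is an arbitrary illustrative cutoff (hypothetical, not from msmc).
    """
    total_called = sum(s.called for s in sites)
    return {
        "n_sites": len(sites),
        "total_called": total_called,
        # segregating sites per called base; very small values mean sparse data
        "sites_per_called_base": len(sites) / total_called if total_called else 0.0,
        "longest_block": max((s.called for s in sites), default=0),
        # lines whose third column exceeds the cutoff
        "long_blocks": [s for s in sites if s.called > max_block],
    }
```

For example, `summarize(parse_sites(open("msmc.data")))` would list every line whose third column exceeds the cutoff, including the >17 Mb block mentioned above.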

swamidass commented 6 years ago

That makes sense. This was just some initial data to see if I could get it working. Sounds like I just need to change the input data.

About how long does it take to run on, say, a single chromosome of the CG data?

stschiff commented 6 years ago

A single chromosome of CG data will probably take about an hour for four haplotypes.