gphocs-dev / G-PhoCS

G-PhoCS is a software package for inferring ancestral population sizes, population divergence times, and migration rates from individual genome sequences.

Segmentation fault (core dumped) #50

Open zapataf opened 6 years ago

zapataf commented 6 years ago

I got a segmentation fault that I had not gotten before. My dataset is in the locus-by-locus format, and it used to run well before (on a smaller analysis). Attached are my input G-PhoCS file and control file. Thanks for any help! plates1_2_090_min70_NoLowCov_NoOG.gphocs.gz sample-control-file.ctl.txt

gphocs-dev commented 6 years ago

We will examine this issue. The bug seems to be triggered by the fact that you are analyzing a large number of diploid individuals: integrating over all possible phases for these genomes can create memory issues. When I reduce the number of individuals per population to 5, the program seems to run fine. Note that G-PhoCS's model does not gain a significant advantage from adding many more individuals; five per population should be more than enough to obtain reliable parameter estimates. I suggest running several analyses with different sets of individuals for validation.
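A minimal sketch of drawing several independent subsets of at most 5 individuals per population, one per validation run. The function name `subsample_individuals` and the input layout (a dict mapping a population name to a list of sample names) are hypothetical; this only picks sample names and does not read or write G-PhoCS input or control files.

```python
import random


def subsample_individuals(pops, k=5, n_replicates=3, seed=0):
    """Return n_replicates subsets, each with up to k individuals per population.

    pops: hypothetical dict mapping population name -> list of sample names.
    """
    rng = random.Random(seed)  # fixed seed so the subsets are reproducible
    replicates = []
    for _ in range(n_replicates):
        subset = {}
        for pop, samples in pops.items():
            # Draw k samples without replacement, or keep all if there are k or fewer.
            subset[pop] = sorted(rng.sample(samples, min(k, len(samples))))
        replicates.append(subset)
    return replicates


pops = {"A": [f"A{i}" for i in range(12)], "B": ["B0", "B1", "B2"]}
for rep in subsample_individuals(pops):
    print(rep)
```

Each subset would then be used to build a separate input file for its own run. Comparing parameter estimates across runs provides the validation suggested above.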

zapataf commented 6 years ago

Ah! I was not aware of the issue with the number of taxa! Thanks a lot for this suggestion. I'll try restarting the job. I also like the idea of using different sets of individuals for validation. If you get a chance to support more individuals, that would be great, and I look forward to that patch.

gphocs-dev commented 6 years ago

Hope this works out. BTW, even after we fix the bug that causes the segmentation fault, the program will likely be very slow, because it has to iterate through all phased versions of every site. With n diploid individuals, the number of phased versions of a site can blow up to 2^n. So it would still be advisable to stick to small numbers of individuals. Clearly, this is not a problem if your data are already phased.
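A toy illustration of the counting argument above (not G-PhoCS's actual implementation). Each heterozygous diploid genotype at a site can be phased two ways and each homozygous one only one way. A site therefore has 2^h phased versions, where h is the number of heterozygous individuals, which reaches 2^n when all n individuals are heterozygous. The function names and the `(allele, allele)` genotype representation are hypothetical.

```python
from itertools import product


def phasings(genotypes):
    """Enumerate every phased configuration of one site.

    genotypes: list of unphased diploid genotypes, one (allele, allele) pair per individual.
    Returns a list of configurations; each is a tuple of (haplotype1, haplotype2)
    allele pairs, one per individual.
    """
    options = []
    for a, b in genotypes:
        # Heterozygous: two possible phasings; homozygous: only one.
        options.append([(a, b), (b, a)] if a != b else [(a, a)])
    return list(product(*options))


def count_phasings(genotypes):
    # Closed form: 2 ** (number of heterozygous individuals), at most 2 ** n.
    return 2 ** sum(1 for a, b in genotypes if a != b)


site = [("A", "G"), ("C", "C"), ("T", "C")]
print(len(phasings(site)), count_phasings(site))
print(count_phasings([("A", "G")] * 20))
```

With 20 individuals that are all heterozygous at a site, `count_phasings` gives 2**20 = 1,048,576 phased versions of that single site. Already-phased data avoid this enumeration entirely.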