pblischak / HyDe

Hybridization detection using phylogenetic invariants
http://hybridization-detection.readthedocs.io
GNU General Public License v3.0

Segmentation fault (unknown reason) #24

Closed bmichanderson closed 2 years ago

bmichanderson commented 2 years ago

Hi, I'm running into a Segmentation fault error on one of my datasets, and odd behaviour with the multiprocess script.

Here's my command and error:

run_hyde.py -i test3_data.txt -m test3_map.txt -n 183 -t 45 -s 18910 -o ZAC --prefix test3

Running run_hyde.py

Reading input file.......................................................................................................................................................................................Done.
Reading map file  ...............................................................................................................................................................................................Done.

Analyzing 39732 triple(s).
Segmentation fault

When I run this with run_hyde_mp.py, the thread usage ramps up for a few seconds, then drops to almost nothing, without stopping the program or printing an error; it just hangs. Maybe the individual processes are hitting the Segmentation fault error?

I'll attach my data files if that helps. Is there any way to explore what might be causing this? I am not familiar with C++ programming.

test3_data.txt test3_map.txt

pblischak commented 2 years ago

Hi Ben,

I took a look at your input files and I believe the issue is that your data file and map file don't have the same number of lines (183 vs. 191, respectively). The program uses the map file to index into the DNA matrix and to determine which individuals belong to each taxon and triplet. If the map lists individuals that aren't in the data, the program tries to look up rows of the DNA matrix that don't exist, and it crashes with a segmentation fault. This is honestly just bad error handling on my part: the code is written to expect the inputs to match up this way and simply crashes if they don't.
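A mismatch like this can be caught before running HyDe. The following is a minimal sketch, not part of HyDe itself. It assumes both files are whitespace-delimited with the individual's name as the first field on each line (the map layout matches the entries shown in this thread; the data layout is an assumption), and `first_fields` and `missing_from_data` are hypothetical helper names.

```python
from pathlib import Path


def first_fields(path):
    """Return the first whitespace-delimited field of each non-blank line."""
    names = []
    for line in Path(path).read_text().splitlines():
        parts = line.split()
        if parts:
            names.append(parts[0])
    return names


def missing_from_data(data_path, map_path):
    """List individuals named in the map file that have no row in the data file.

    Assumption: the first field of every line in both files is the
    individual's name.
    """
    data_names = set(first_fields(data_path))
    return [name for name in first_fields(map_path) if name not in data_names]
```

Calling `missing_from_data("test3_data.txt", "test3_map.txt")` would return the mapped individuals with no matching data row; an empty list means every individual in the map was found in the data.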

I worked out which individuals weren't in the data file (listed below), removed them from the map, and reran the command you sent; this time I didn't get the immediate segmentation fault I saw on my first try. I also tried the run_hyde_mp.py script and it appeared to be working fine, too. I will say, however, that I didn't run the entire analysis: there are a fairly large number of triplets and it would have taken a while to get through all of them. It didn't fail immediately, though, so hopefully this fixes the issue.

FFA4-01-344844  FFA4
GRA1-01_R-344838    GRA1
NOV3-03-344856  NOV3
PET1-01_R-344839    PET1
PPL2-03-344792  PPL2
SIM2-03_R-344841    SIM2
STE2-03-344808  STE2
TRI3-04_R-344933    TRI3

Here's a copy of the new map file that I used with the individuals removed. Please let me know if removing these individuals works for you -- thanks!

test3_map2.txt
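For anyone who needs to produce a trimmed map file like this themselves, here is a rough sketch of one way to do it. It is not the procedure used in this thread, only an illustration under the same assumption that the first whitespace-delimited field on each line of both files is the individual's name; `write_filtered_map` is a hypothetical helper name.

```python
def write_filtered_map(data_path, map_path, out_path):
    """Copy map_path to out_path, keeping only individuals present in data_path.

    Returns the names that were dropped. Assumption: the first field of
    every line in both files is the individual's name.
    """
    with open(data_path) as handle:
        data_names = {line.split()[0] for line in handle if line.strip()}

    dropped = []
    with open(map_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if fields[0] in data_names:
                # Keep the original line, making sure it ends with a newline.
                dst.write(line if line.endswith("\n") else line + "\n")
            else:
                dropped.append(fields[0])
    return dropped
```

After filtering, the new map file should have the same number of lines as the data file, provided every individual in the data also appears in the map.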

bmichanderson commented 2 years ago

Hi Paul,

Sorry about that; my bad on the file creation. Thanks for digging into that error and responding so quickly! I'm currently running the data again with the corrected map file and so far so good. I'll let you know if I hit another error, but I think it is safe to close this. Thanks again!