sate-dev / sate-core

Zombie processes on some files (not sure of cause) #36

Open brantfaircloth opened 9 years ago

brantfaircloth commented 9 years ago

First time I've seen this: for some alignments, raxml 7.2.6 enters a zombified state. I have no idea of the cause, but I can replicate the effect by running the SATé raxml binary locally with the input alignment and part.txt file from that alignment's tmp folder.

Strangely, running the raxml binary that comes with SATé, but passing the -p option (with a random seed), results in raxml processes that complete.

This does not happen for all or even most alignments - just an apparent special few.

Adding pull request momentarily w/ apparent fix.

brantfaircloth commented 9 years ago

Even after making the change, I'm still getting some zombie processes (on both local machines w/ "real" drives and on the HPC - I originally thought it might be NFS-related). Not sure if the change above is actually fixing anything or if the problem is ephemeral (seems to be the latter). Simply killing and restarting some runs lets them complete... where they hung before.
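
For anyone who wants to script the kill-and-restart workaround described above, here is a minimal sketch. It is not how SATé launches raxml and does not address the underlying cause. The names `run_with_reseed`, `timeout_s` and `seed_source` are hypothetical, and the only raxml detail assumed is the `-p <seed>` option mentioned earlier in this issue. The caller supplies the rest of the command line.

```python
import random
import subprocess


def run_with_reseed(base_cmd, timeout_s, max_attempts=3, seed_source=None):
    """Run base_cmd + ['-p', <seed>]; if it exceeds timeout_s, kill it and
    retry with a new seed.

    base_cmd    -- list of arguments, e.g. the raxml call without -p (hypothetical)
    seed_source -- optional zero-argument callable returning a seed (for testing)
    Returns (seed, attempt) for the run that finished.
    """
    next_seed = seed_source or (lambda: random.randint(1, 2**31 - 1))
    for attempt in range(1, max_attempts + 1):
        seed = next_seed()
        cmd = list(base_cmd) + ["-p", str(seed)]
        try:
            # On timeout, subprocess.run kills the child and waits for it
            # before raising, so no stray process is left behind.
            # check=True lets a non-zero exit status propagate as an error.
            subprocess.run(cmd, check=True, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            continue  # treat as a hung run and try again with a fresh seed
        return seed, attempt
    raise RuntimeError("no run finished within %s s after %d attempts"
                       % (timeout_s, max_attempts))
```

The timeout has to be chosen well above the normal per-iteration run time, otherwise slow but healthy runs get killed too.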

mtholder commented 9 years ago

hmm. And does sate just hang, or does it terminate and fail to clean up the raxml processes? thanks for the pull request. I'll see if I can replicate this.

brantfaircloth commented 9 years ago

it just hangs w/ raxml running at 100% but not (apparently) doing anything. re-running or using a different raxml can fix the problem w/ trees produced in 1-2 minutes for each iteration.

it's very, very weird. i'll shoot you a link via email with the problematic alignments (on re-run, these are sometimes not problematic...).