matsengrp / cft

Clonal family tree

minadcl crash bug? #224

Closed metasoarous closed 6 years ago

metasoarous commented 6 years ago

Occasionally we get clusters where dnaml seems to run forever. I eventually have to kill the build, remove some files, and rebuild. I now think this may be caused by failed minadcl runs. rppr min_adcl is called from within the prune.py script, and if it fails, it does not output any ids. The catch is that rppr min_adcl outputs the ids to be removed from the tree, while the parent Python script uses that list to produce the list of sequences to keep. So if minadcl fails, the removal list is empty, no pruning is done, and a really large tree ends up choking dnaml. I suspect the failures happen at random because of memory pressure when lots of other things are running. However, the problem does not seem to be limited to the largest clusters, so I'm not entirely sure of this hypothesis.
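To make the failure mode concrete, here is a minimal sketch of the inversion described above. It is not the actual prune.py code; the function name `ids_to_keep` and the sequence ids are made up for illustration.

```python
def ids_to_keep(all_ids, cut_ids):
    """Return the ids that survive pruning: everything not in the removal list."""
    cut = set(cut_ids)
    return [seq_id for seq_id in all_ids if seq_id not in cut]


all_ids = ["seq1", "seq2", "seq3", "seq4"]

# Normal run: the pruner reports two ids to remove, so two are kept.
print(ids_to_keep(all_ids, ["seq2", "seq4"]))

# Failed run: the pruner printed nothing, so the removal list is empty
# and every sequence is silently kept, i.e. no pruning happens.
print(ids_to_keep(all_ids, []))
```

Because an empty removal list is indistinguishable from "nothing needed to be cut", the failure passes through without any error.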

Assuming my analysis is correct, the right fix is probably to catch failing rppr min_adcl processes in the parent Python script and exit with a nonzero status, so that scons sees that the target needs to be rebuilt. If it is a memory issue, we may also need to tell srun how much memory each node needs for this step. We're already having to start doing this for the step that computes the actual clusters from minadcl's centroids.
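A rough sketch of the proposed check, assuming Python 3.7+ and assuming the pruning command prints one id per line on stdout and signals failure through its exit status (neither is confirmed here). The names `PruneError`, `read_cut_ids`, and `main` are hypothetical, and the command is passed in as an argument list rather than spelled out, since the exact rppr min_adcl invocation is not shown in this thread.

```python
import subprocess
import sys


class PruneError(RuntimeError):
    """Raised when the external pruning command exits with a nonzero status."""


def read_cut_ids(cmd):
    """Run cmd (a list of argv strings) and return the ids it prints on stdout."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    # A process killed by a signal (e.g. out of memory) shows up as a negative
    # return code, so "!= 0" covers that case as well as ordinary errors.
    if result.returncode != 0:
        raise PruneError(
            "%s exited with status %d: %s"
            % (cmd[0], result.returncode, result.stderr.strip())
        )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def main(cmd):
    """Return the removal list, or exit nonzero so the build tool sees a failure."""
    try:
        return read_cut_ids(cmd)
    except PruneError as err:
        sys.stderr.write("prune failed: %s\n" % err)
        sys.exit(1)
```

With this in place, a failed pruning run stops the script with exit status 1 instead of producing an unpruned "keep" list, so the build should treat the target as failed rather than complete.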

metasoarous commented 6 years ago

I think this was finally fixed as of 41fe5b3.