shendurelab / LACHESIS

The LACHESIS software, as described in Nature Biotechnology (http://dx.doi.org/10.1038/nbt.2727)

Bugfix for ChromLinkMatrix.cc Enrichment function. Bugfix for CountMotifsInFasta.pl. #12

Closed cooketho closed 7 years ago

cooketho commented 8 years ago

On my system (Ubuntu 12.04), Lachesis throws errors when running CountMotifsInFasta.pl. I traced this back to the comment lines at the top of the script. When I rewrote those lines as hash (#) comments, the errors went away.

The next problem was that Lachesis would abort with a core dump and the following message: Lachesis: ChromLinkMatrix.cc:775: double ChromLinkMatrix::EnrichmentScore(const ContigOrdering&) const: Assertion `!isnan( null_score )' failed.

The cause seems to be division by zero. At the top of the loop, total_len_seq is set to 0: https://github.com/shendurelab/LACHESIS/blob/master/ChromLinkMatrix.cc/#L742

What I think is happening is that sometimes every iteration is cut short, so the loop finishes before anything has been added to total_len_seq: https://github.com/shendurelab/LACHESIS/blob/master/ChromLinkMatrix.cc/#L758 The later division by total_len_seq is then a division by zero, which leads to the failed assertion.

The code already handles one special case: when there is only one contig, the function returns 0. I'm therefore also returning 0 when total_len_seq is zero (assuming this only happens because every iteration was skipped).

I'm not sure this is the best way to do things, but it allows the program to run to completion. People who are more familiar with the larger role of this function will have to tell me whether it breaks anything downstream. My philosophy is that the program should handle all types of data, or give an informative message when it can't.
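To make the idea concrete, here is a minimal, self-contained sketch of the guard. It is not the LACHESIS code: guarded_score, links, lengths and skip are hypothetical names made up for illustration, and the real EnrichmentScore computes something more involved. The sketch only shows the pattern: accumulate a denominator in a loop that may skip every entry, then return 0 instead of dividing when nothing was accumulated. For doubles, 0.0 / 0.0 is NaN, which is the kind of value the failed isnan assertion would catch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the scoring loop; NOT the LACHESIS implementation.
// links[i] and lengths[i] describe one entry; skip[i] marks entries the loop
// passes over without adding anything to the running totals.
double guarded_score(const std::vector<double>& links,
                     const std::vector<double>& lengths,
                     const std::vector<bool>& skip)
{
    double total_links = 0;
    double total_len_seq = 0;

    for (std::size_t i = 0; i < lengths.size(); ++i) {
        if (skip[i]) continue;          // nothing is accumulated for this entry
        total_links += links[i];
        total_len_seq += lengths[i];
    }

    // Guard: if every entry was skipped, the denominator is still 0.
    // Return 0 rather than computing 0.0 / 0.0, which would be NaN.
    if (total_len_seq == 0) return 0;

    double score = total_links / total_len_seq;
    assert(!std::isnan(score));         // holds for finite inputs thanks to the guard
    return score;
}
```

With this guard, a call where every entry is skipped returns 0 instead of tripping the assertion. Whether 0 is the right value for downstream code, or whether an informative error would be better, is the open question for the maintainers.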