nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

funannotate compare fails with >60 genomes #341

Closed reslp closed 3 years ago

reslp commented 4 years ago

Hi,

I am trying to run funannotate on a large number of fungal genomes (more than 70 in total). Annotation of the individual genomes is highly standardized and the same for all genomes. When I run funannotate compare on them, the script stops at summarizing CAZymes without showing any error. I have previously run compare successfully with different combinations of the included genomes, and it seems to run fine when I reduce the number of included genomes to 60. I tried this with v1.5.2 and v1.6.0, and the behavior is the same.

Do you maybe have an idea what might be the problem?

many thanks,

Philipp

This is the only output I see after the genome files have been read by funannotate compare:

[01:04 PM]: Summarizing secondary metabolism gene clusters
[01:04 PM]: Summarizing PFAM domain results
[01:04 PM]: Summarizing InterProScan results
[01:05 PM]: Loading InterPro descriptions
[01:05 PM]: Summarizing MEROPS protease results
[01:05 PM]: found 32/120 MEROPS familes with stdev >= 1.000000
[01:05 PM]: Summarizing CAZyme results
[01:05 PM]: found 138/330 CAZy familes with stdev >= 1.000000
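
The "found N/M ... with stdev >= 1.000000" lines suggest that only families whose counts vary enough across genomes are kept for the summary plots. A rough sketch of such a filter is shown here. The function name `variable_families`, the example counts, and the use of the sample standard deviation are all assumptions made for illustration; none of them is taken from the funannotate source, which may do this differently.

```python
from statistics import stdev


def variable_families(counts, min_stdev=1.0):
    """Return family names whose per-genome counts have stdev >= min_stdev.

    counts maps a family name to a list of counts, one entry per genome.
    (Hypothetical helper, not part of funannotate.)
    """
    return [
        family
        for family, per_genome in counts.items()
        # stdev() needs at least two values, so skip shorter lists.
        if len(per_genome) > 1 and stdev(per_genome) >= min_stdev
    ]


# Made-up counts for three genomes (illustration only).
example = {
    "GH5": [10, 14, 7],   # varies clearly between genomes
    "GT2": [12, 12, 12],  # identical everywhere, stdev is 0
    "AA9": [3, 4, 3],     # varies only slightly
}

kept = variable_families(example)
print("found %d/%d families with stdev >= %f" % (len(kept), len(example), 1.0))
```

With more genomes the number of columns in such a table grows, but the number of families passing the filter can grow as well, so the plots built from it become larger in both directions.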

nextgenusfs commented 4 years ago

I've never tried to run it with more than ~ 12 genomes. Not sure why it would be stuck there specifically -- what output is generated up until this point?

reslp commented 4 years ago

Hi Jon,

thank you for your help. This is a listing of the funannotate_compare directory after the run stopped:

$ ls -lah
total 19M
drwxrwxr-x 13 reslp reslp 4.0K Oct 22 09:25 .
drwxrwxr-x 86 reslp reslp 4.0K Oct 22 09:06 ..
drwxrwxr-x  2 reslp reslp 4.0K Oct 22 09:25 cazy
drwxr-xr-x  2 reslp reslp 4.0K Oct 17 12:42 css
-rw-rw-r--  1 reslp reslp 8.3K Oct 22 09:25 funannotate-compare.log
drwxrwxr-x  2 reslp reslp 4.0K Oct 22 09:24 go_terms
drwxrwxr-x  2 reslp reslp 4.0K Oct 22 09:25 interpro
-rw-rw-r--  1 reslp reslp  12M Oct 22 09:25 interpro.html
drwxr-xr-x  2 reslp reslp 4.0K Oct 17 12:42 js
drwxrwxr-x  2 reslp reslp 4.0K Oct 22 09:25 merops
-rw-rw-r--  1 reslp reslp 172K Oct 22 09:25 merops.html
drwxrwxr-x  2 reslp reslp 4.0K Oct 22 09:06 orthology
drwxrwxr-x  2 reslp reslp 4.0K Oct 22 09:25 pfam
-rw-rw-r--  1 reslp reslp 7.0M Oct 22 09:25 pfam.html
drwxrwxr-x  2 reslp reslp 4.0K Oct 22 09:06 phylogeny
drwxrwxr-x  2 reslp reslp  16K Oct 22 09:25 protortho
drwxrwxr-x  2 reslp reslp 4.0K Oct 22 09:25 secmet
-rw-rw-r--  1 reslp reslp 3.0K Oct 22 09:25 secmet.html

The logfile does not show anything special either, just the same output as seen on screen while running compare. Since the last output before returning to the command prompt has to do with CAZymes, I expected the problem to be somewhere there. However, the files seem to be fine. Here is a listing of the funannotate_compare/cazy directory:

$ ls -lah
total 144K
drwxrwxr-x  2 reslp reslp 4.0K Oct 22 09:25 .
drwxrwxr-x 13 reslp reslp 4.0K Oct 22 09:25 ..
-rw-rw-r--  1 reslp reslp  31K Oct 22 09:25 CAZy.graph.pdf
-rw-rw-r--  1 reslp reslp 100K Oct 22 09:25 CAZyme.all.results.csv
-rw-rw-r--  1 reslp reslp 3.2K Oct 22 09:25 CAZyme.summary.results.csv

The only thing missing is the cazy.html file. Maybe this is where the problem is? The phylogeny and orthology folders are also still empty, which is to be expected at this point, I guess.

best, Philipp

nextgenusfs commented 4 years ago

You could try commenting out that section of the script to see whether the rest then completes. My guess is that the plot generation is causing the problem: it's trying to draw a heat map of all the results, and that may be what triggers the failure.
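
An alternative to commenting code out by hand is to wrap each summary step so that a failure is logged with a traceback and the remaining steps still run. The following is a minimal sketch of that idea, not funannotate's actual code: `run_step`, `draw_heatmap`, and `build_phylogeny` are hypothetical names, and the failure at more than 60 genomes is simulated.

```python
import logging
import traceback

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
log = logging.getLogger("compare-sketch")


def run_step(name, func, *args, **kwargs):
    """Run one pipeline step; log any exception instead of aborting.

    Returns (True, result) on success and (False, None) on failure.
    """
    try:
        result = func(*args, **kwargs)
    except Exception:
        # Record the full traceback so the failure is not silent.
        log.error("step %r failed:\n%s", name, traceback.format_exc())
        return False, None
    log.info("step %r finished", name)
    return True, result


# Hypothetical stand-ins for real pipeline steps.
def draw_heatmap(n_genomes):
    # Simulate a plotting step that breaks on large inputs.
    if n_genomes > 60:
        raise ValueError("figure too large for %d genomes" % n_genomes)
    return "heatmap.pdf"


def build_phylogeny(n_genomes):
    return "tree for %d genomes" % n_genomes


if __name__ == "__main__":
    steps = [("heatmap", draw_heatmap), ("phylogeny", build_phylogeny)]
    for step_name, step in steps:
        run_step(step_name, step, 70)
```

This only catches Python exceptions. If the process is killed from outside (for example by the operating system when memory runs out), no traceback is written, so a silent exit would still need to be investigated separately.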

reslp commented 4 years ago

It looks like this did the trick, thank you. The script is running now (already past CAZymes). I will report back once it is finished.

nextgenusfs commented 4 years ago

Okay, good. So I guess we need to stress-test this part of the code. I'm trying to repackage the script for a conda recipe, and then we need to go down the py3 conversion path. The compare script will likely get a major rework, so reports of any other issues you run into will be helpful.

reslp commented 4 years ago

So, the rest of the script ran just fine as far as I can tell. I will run it several more times with different datasets and let you know if I run into any problems. I would also like to thank you for your amazing work on this; funannotate is really nice!

nextgenusfs commented 4 years ago

Thanks for the feedback and continue to let us know if you learn more about any other issues.