nf-core / genomeqc

Compare the quality of multiple genomes, along with their annotations.
https://nf-co.re/genomeqc
MIT License

BUSCO fails #37

Status: Open · opened by FernandoDuarteF 1 month ago

FernandoDuarteF commented 1 month ago

When running genomeqc on assets/samplesheet.csv (bees and wasps), BUSCO fails at the SEPP step with this warning message:

Placements failed. Try to rerun increasing the memory or select a lineage manually.

even though 72 GB of RAM is being used.

SEPP is used to infer the lineage dataset automatically. When the lineage dataset is set explicitly, BUSCO runs successfully.
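The two modes described above differ only in how the lineage is chosen. The sketch below builds both BUSCO command lines so the difference is explicit. It assumes BUSCO v5 in protein mode, and the function name busco_command is hypothetical, not part of genomeqc. The lineage hymenoptera_odb10 is only an assumption about a suitable dataset for bees and wasps, and how the nf-core BUSCO module exposes these options is not shown here.

```python
import shlex
from typing import List, Optional


def busco_command(proteins: str, out_name: str,
                  lineage: Optional[str] = None, cpus: int = 8) -> List[str]:
    """Build a BUSCO v5 protein-mode command (hypothetical helper)."""
    cmd = ["busco", "-i", proteins, "-m", "proteins",
           "-o", out_name, "-c", str(cpus)]
    if lineage is None:
        # Auto-lineage mode: BUSCO places the sample with SEPP,
        # which is the step that fails in this issue.
        cmd.append("--auto-lineage")
    else:
        # Explicit lineage: the SEPP placement step is skipped.
        cmd += ["-l", lineage]
    return cmd


if __name__ == "__main__":
    print(shlex.join(busco_command("proteins.fa", "vvel_auto")))
    print(shlex.join(busco_command("proteins.fa", "vvel_hym",
                                   lineage="hymenoptera_odb10")))
```

Passing either list to subprocess.run(cmd, check=True) would run BUSCO, assuming it is on PATH.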

FernandoDuarteF commented 1 month ago

To look into this further, see the branch subworkflows-agat-longest-isoform.

FernandoDuarteF commented 1 month ago

This seems to be related to the agat_sp_keep_longest_isoform.pl filtering. Switching back to excon's GFFREAD solves the issue.
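One way to compare the AGAT-filtered and GFFREAD protein files is to audit each for records that downstream tools might handle badly. The sketch below rests on an untested hypothesis: that the difference lies in the sequences themselves (empty records, internal stop codons written as `*` or `.`, unexpected characters, or duplicate IDs). The functions read_fasta and audit_proteins are hypothetical helpers, not part of genomeqc or AGAT.

```python
import sys
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
AMBIGUOUS_AA = set("BJOUXZ")
STOP_SYMBOLS = {"*", "."}


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def audit_proteins(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group sequence IDs by the kind of potential problem found."""
    problems: Dict[str, List[str]] = {
        "empty": [], "internal_stop": [], "odd_chars": [], "duplicate_id": []}
    seen = set()
    for header, seq in records:
        seq_id = header.split()[0] if header.split() else ""
        if seq_id in seen:
            problems["duplicate_id"].append(seq_id)
        seen.add(seq_id)
        # Trailing stop symbols are normal; strip them before checking the rest.
        core = seq.upper().rstrip("*.")
        if not core:
            problems["empty"].append(seq_id)
            continue
        if STOP_SYMBOLS & set(core):
            problems["internal_stop"].append(seq_id)
        if set(core) - STANDARD_AA - AMBIGUOUS_AA - STOP_SYMBOLS:
            problems["odd_chars"].append(seq_id)
    return problems


if __name__ == "__main__":
    with open(sys.argv[1]) as fh:
        for kind, ids in audit_proteins(read_fasta(fh)).items():
            print(f"{kind}: {len(ids)}", ids[:5])
```

Running it on both protein files and diffing the reports would show whether the AGAT output contains record types that the GFFREAD output lacks.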

FernandoDuarteF commented 1 week ago

Running BUSCO on samplesheet.csv requires at most 27 GB, so this is not a memory problem.

FernandoDuarteF commented 1 week ago

I tried running with only half of the Vespa velutina protein sequences extracted with GFFREAD, and BUSCO hangs at the SEPP step, so the problem is not related to file size.
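Since half of the sequences is enough to reproduce the hang, repeatedly halving the input could help narrow it down to the records that trigger it. The sketch below splits a FASTA file into two halves by whole records. The helper names fasta_entries and bisect_fasta_text and the output naming are hypothetical, and the bisection strategy itself is a suggestion rather than something already done in this thread.

```python
import sys
from pathlib import Path
from typing import List, Tuple


def fasta_entries(text: str) -> List[str]:
    """Split FASTA text into whole entries (header plus sequence lines)."""
    body = text.strip()
    if not body:
        return []
    # Entries are separated by a newline followed by '>'; restore '>' on each.
    return [">" + chunk.strip() + "\n"
            for chunk in body.lstrip(">").split("\n>")]


def bisect_fasta_text(text: str) -> Tuple[str, str]:
    """Return the first and second halves of the records as FASTA text."""
    entries = fasta_entries(text)
    mid = len(entries) // 2
    return "".join(entries[:mid]), "".join(entries[mid:])


if __name__ == "__main__":
    src = Path(sys.argv[1])
    first, second = bisect_fasta_text(src.read_text())
    (src.parent / f"{src.stem}.half1.fa").write_text(first)
    (src.parent / f"{src.stem}.half2.fa").write_text(second)
```

Running BUSCO in auto-lineage mode on each half, then bisecting whichever half still hangs, would shrink the failing set in a logarithmic number of runs, assuming a small subset of records is responsible.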