yoshihikosuzuki / ant-asm-workflow

Genome assembly workflow with HiFi + Omni-C

BUSCO error #1

Open AlesBucek opened 2 years ago

AlesBucek commented 2 years ago

Dear Yoshi, I ran into a problem during contig quality assessment: BUSCO fails. See the content of busco.log below:

Copying Augustus config dir to /flash/BourguignonU/Bucek/Aleos_flash/KT1060-2_UL/10-contigs/hifiasm/01-busco/augustus_config
usage: busco -i [SEQUENCE_FILE] -l [LINEAGE] -o [OUTPUT_NAME] -m [MODE] [OTHER OPTIONS]
run_busco: error: ambiguous option: --augustus could match --augustus_parameters, --augustus_species

It seems the --augustus flag was not recognized, which is surprising because it should be an existing flag (at least in the latest version: https://busco.ezlab.org/busco_userguide.html).
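For reference, the wording "ambiguous option: --augustus could match ..." is what Python's argparse prints when no option with that exact name is defined and the text is instead treated as an abbreviation of several longer options. That suggests the busco that actually ran does not define a plain --augustus flag. A minimal sketch for checking this from the help text follows; `has_exact_flag` is a hypothetical helper name, and the sketch assumes that `busco --help` lists every supported option.

```shell
#!/bin/bash
# has_exact_flag FLAG
# Reads help text on stdin and succeeds only if FLAG appears as a complete
# option name, so "--augustus_species" does not count as "--augustus".
has_exact_flag() {
    local flag=$1
    # FLAG must be delimited on both sides by a character that cannot be
    # part of an option name (or by the start/end of the line).
    grep -Eq -- "(^|[^[:alnum:]_-])${flag}([^[:alnum:]_-]|\$)"
}

# Usage (run after loading the BUSCO module):
#   busco --help | has_exact_flag --augustus \
#       && echo "--augustus is listed" \
#       || echo "--augustus is NOT listed by this busco"
```

If the flag is not listed, the busco being invoked is probably not the version the workflow expects.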

(Besides that, your workflow is running smoothly on my data and it is incredibly useful. Thanks!)

yoshihikosuzuki commented 2 years ago

Hi Ales,

Thank you for using our workflow and giving feedback!

Just as you said, this error is weird. For now I am not sure what the problem is, but can you check the following and let me know here?

(That is, I want to know which command/script you are running to invoke BUSCO. My expectation is /apps/unit/BioinfoUgrp/Other/BUSCO/5.1.3/busco, which actually triggers the execution of a singularity container. I think the error is somewhat related to singularity, because you said the wf is running without problems up to BUSCO and BUSCO is the first task that uses singularity in the wf.)

Best, Yoshi

AlesBucek commented 2 years ago

Hi Yoshi, thanks. Feel free to close or ignore the issue if it is trickier than fixing a typo. I now run BUSCO outside of the ant-asm-workflow, so this is not a critical problem for me.

ad1:

#!/bin/bash
#SBATCH -J busco
#SBATCH -o busco.log
#SBATCH -p compute
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -c 16
#SBATCH --mem=500G
#SBATCH -t 48:00:00
shopt -s expand_aliases && source ~/.bashrc && set -e || exit 1
source ../../../config.sh

ASM=contigs.fasta
N_THREADS=16

OUT_PREFIX=$(basename ${ASM} .gz)
OUT_DIR=${OUT_PREFIX%.*}.busco

ml ${_BUSCO}

SHARED_ARGS="-f --update-data -c ${N_THREADS} -m genome -l ${BUSCO_DB} -i ${ASM}"
## Case 1. Using Metaeuk for gene annotation
#busco ${SHARED_ARGS} -o ${OUT_DIR}
## Case 2. Using Augustus for gene annotation
busco ${SHARED_ARGS} -o ${OUT_DIR}_augustus --augustus

if [ "$AUTO_DEL" = "true" ]; then
    source ./remove_tmp_files.sh
fi

ad2: _BUSCO=Other/BUSCO/5.1.3

yoshihikosuzuki commented 2 years ago

Hi Ales,

Thank you for the information. Actually, the BUSCO script you are using is an old version, but this should not be the cause of the issue: I confirmed that the script you pasted runs successfully, without the error, using the Other/BUSCO/5.1.3 module.

One possible cause I can imagine is that you have installed an old version of BUSCO (one that does not have the --augustus option) in your own Anaconda environment, and that this old busco command takes priority over the one in the Other/BUSCO/5.1.3 module. I think you can check whether this is true by running, e.g.,

ml Other/BUSCO/5.1.3
which busco

and seeing if the answer is /apps/unit/BioinfoUgrp/Other/BUSCO/5.1.3/busco or not.
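The same check can be scripted so that it runs inside the batch job itself, where `source ~/.bashrc` and `expand_aliases` may change what `busco` resolves to. This is only a sketch: `check_resolved` is a hypothetical helper name, and the expected path is the one mentioned in this thread.

```shell
#!/bin/bash
# check_resolved CMD EXPECTED
# Compares what the shell would run for CMD with EXPECTED.
# Returns 0 on match, 1 on mismatch, 2 if CMD cannot be resolved at all.
check_resolved() {
    local cmd=$1 expected=$2 actual
    # `command -v` prints the path for an executable found on PATH; for an
    # alias or shell function it prints the alias definition or the name,
    # which then shows up as a mismatch below.
    actual=$(command -v "$cmd") || { echo "$cmd: not found"; return 2; }
    if [ "$actual" = "$expected" ]; then
        echo "OK: $cmd -> $actual"
    else
        echo "MISMATCH: $cmd -> $actual (expected $expected)"
        return 1
    fi
}

# Usage (after `ml Other/BUSCO/5.1.3` in the job script):
#   check_resolved busco /apps/unit/BioinfoUgrp/Other/BUSCO/5.1.3/busco
#   type -a busco   # bash builtin: lists every alias, function and PATH match
```

A mismatch that points into a conda environment would support the explanation above.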

If this is not the cause, then I am sorry, but I have no idea for now. In the meantime, yes, I think it would be a good idea to just run BUSCO outside the workflow.

AlesBucek commented 2 years ago

Hi Yoshi, if I first load the bioinfo-ugrp-modules module (ml bioinfo-ugrp-modules) and then run ml Other/BUSCO/5.1.3 ; which busco, the result is indeed "/apps/unit/BioinfoUgrp/Other/BUSCO/5.1.3/busco". Let's forget about it for now. Thx!