nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

--busco_db and --busco_seed_species with RNA-seq #1012

Open enriquepola1996 opened 3 months ago

enriquepola1996 commented 3 months ago

Hello,

I am using funannotate for the first time, on a non-model plant with a genome of approximately 3 Gbp. I have an assembled transcriptome and RNA-seq data, and I am currently running the training step with the transcriptome and the read libraries. However, I have doubts about the prediction step: is it necessary to use `--busco_db` and `--busco_seed_species` when I have RNA-seq data?

The training script I am using is the following:

```bash
#!/bin/bash
#PBS -N rnaseq_train
#PBS -l nodes=1:ppn=20,vmem=150gb,walltime=700:00:00
#PBS -o output.log
#PBS -e error.log
#PBS -q ensam
#PBS -V

# Load the module
module load funannotate/2023.1

# Change to the working directory
cd "$PBS_O_WORKDIR"

# Run funannotate train with RNA-seq
funannotate train -i alt.fasta -o alt_fun \
    --left CP148H_R1_001_P.fastq.gz CP17D_R1_001_P.fastq.gz \
    --right CP148H_R2_001_P.fastq.gz CP17D_R2_001_P.fastq.gz \
    --trinity ../Transcriptomes/Transcriptome.fasta \
    --stranded no --cpus 20 --memory 150G --no_trimmomatic --max_intronlen 100000
```

The script I want to use for the prediction is the following:

```bash
#!/bin/bash
#PBS -N predict_fun
#PBS -l nodes=1:ppn=20,vmem=150gb,walltime=700:00:00
#PBS -o predict_output.log
#PBS -e predict_error.log
#PBS -q ensam
#PBS -V

# Load the module
module load funannotate/2023.1

# Change to the working directory
cd "$PBS_O_WORKDIR"

# Run funannotate predict
funannotate predict \
    -i alt.fasta \
    -o altfun \
    --optimize_augustus \
    --repeats2evm \
    --busco_db embryophyta \
    --cpus 20 \
    --max_intronlen 100000 \
    --organism other \
    --busco_seed_species rice
```

I added rice as the seed species because it is the most similar to my organism among the available plant species. I would greatly appreciate any comments and suggestions.