Hello,
I am using funannotate for the first time on a non-model plant with a genome of approximately 3 Gbp. I have an assembled transcriptome and RNA-seq data, and I am currently running the training step with the transcriptome and the RNA-seq libraries. However, I have doubts about the prediction step: is it necessary to use `--busco_db` and `--busco_seed_species` when I have RNA-seq data?
The training script I am using is the following:
```bash
#!/bin/bash
#PBS -N rnaseq_train
#PBS -l nodes=1:ppn=20,vmem=150gb,walltime=700:00:00
#PBS -o output.log
#PBS -e error.log
#PBS -q ensam
#PBS -V

# Load the module
module load funannotate/2023.1

# Change to the working directory
cd "$PBS_O_WORKDIR"

# Run funannotate train with RNA-seq
funannotate train -i alt.fasta -o alt_fun \
    --left CP148H_R1_001_P.fastq.gz CP17D_R1_001_P.fastq.gz \
    --right CP148H_R2_001_P.fastq.gz CP17D_R2_001_P.fastq.gz \
    --trinity ../Transcriptomes/Transcriptome.fasta \
    --stranded no --cpus 20 --memory 150G --no_trimmomatic --max_intronlen 100000
```
The script I want to use for the prediction is the following:
```bash
#!/bin/bash
#PBS -N predict_fun
#PBS -l nodes=1:ppn=20,vmem=150gb,walltime=700:00:00
#PBS -o predict_output.log
#PBS -e predict_error.log
#PBS -q ensam
#PBS -V

# Load the module
module load funannotate/2023.1

# Change to the working directory
cd "$PBS_O_WORKDIR"

# Run funannotate predict
funannotate predict \
    -i alt.fasta \
    -o altfun \
    --optimize_augustus \
    --repeats2evm \
    --busco_db embryophyta \
    --cpus 20 \
    --max_intronlen 100000 \
    --organism other \
    --busco_seed_species rice
```
I set rice as the seed species because it is the most similar to my organism of the plant species available. I would greatly appreciate any comments and suggestions.