nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

PASA only producing single exon genes and AUGUSTUS showing 0% Evaluation of Gene prediction #244

Closed. PlantDr430 closed this issue 5 years ago.

PlantDr430 commented 5 years ago

I'm using the latest version, 1.5.1.

I ran funannotate train with RNA-seq data for the fungal species Claviceps purpurea. The resulting PASA 'bestmodel' output after Kallisto filtering (attached) contained only single-exon genes.

funannotate-train.log.gz

Those models were then passed to AUGUSTUS for training. The initial results from the first AUGUSTUS run (I used the optimized method) are shown below. I have also attached the AUGUSTUS training log and the initial AUGUSTUS training text.

[02:37 PM]: Training Augustus using PASA data.
[02:37 PM]: 2,041 of 2,160 models pass training parameters
[02:38 PM]: Augustus initial training results (specificity/sensitivity): nucleotides (0.0%/nan%); exons (0.0%/nan%); genes (0.0%/nan%).

Is this common when AUGUSTUS is trained on PASA models? Also, is it common for the best PASA models to all be single-exon? When I ran PASA separately, without the Kallisto filtering, it did produce multi-exon models.
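For anyone wanting to confirm the single-exon observation on their own output, here is a minimal sketch that tallies exon features per parent transcript in a GFF3 file. It assumes a standard nine-column, tab-separated GFF3 in which exon rows carry a `Parent=` attribute; the function names `exons_per_transcript` and `summarize` are made up for this illustration and are not part of funannotate or PASA.

```python
from collections import Counter


def exons_per_transcript(gff3_text):
    """Count exon features per parent transcript in GFF3 text.

    Assumes nine tab-separated columns with a Parent= attribute on exon rows.
    """
    counts = Counter()
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # only exon features are counted
        attrs = dict(
            field.split("=", 1)
            for field in cols[8].strip().split(";")
            if "=" in field
        )
        # An exon may list several parents separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                counts[parent] += 1
    return counts


def summarize(counts):
    """Return how many transcripts have exactly one exon versus more."""
    single = sum(1 for n in counts.values() if n == 1)
    return {
        "transcripts": len(counts),
        "single_exon": single,
        "multi_exon": len(counts) - single,
    }
```

To use it, read the GFF3 file into a string and call `summarize(exons_per_transcript(text))`. If `multi_exon` comes back as 0 for the filtered models but not for the unfiltered PASA output, that would point at the filtering step rather than at PASA itself.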

augustus.initial.training.txt augustus_training.log funannotate_train.pasa.gff3.txt augustus.final.training.txt

At the end, AUGUSTUS tried to predict genes and found 0 gene models.

[02:38 PM]: Augustus initial training results (specificity/sensitivity): nucleotides (0.0%/nan%); exons (0.0%/nan%); genes (0.0%/nan%).
[02:51 PM]: Augustus initial training results (specificity/sensitivity): nucleotides (0.0%/nan%); exons (0.0%/nan%); genes (0.0%/nan%).
[02:51 PM]: Running Augustus gene prediction
[02:58 PM]: Found 0 gene models

augustus-parallel.log

nextgenusfs commented 5 years ago

That is not normal behavior. Do you have the funannotate train logfile?

PlantDr430 commented 5 years ago

Yes, I just added it actually

nextgenusfs commented 5 years ago

There are a lot of PASA/TransDecoder errors, many of which I've never seen before. What versions of each do you have installed? Several of the errors look like coordinate mismatches, which I don't understand unless the BAM files were from a previous run with a different assembly?
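One way to test the "different assembly" idea is to compare the sequence names and lengths in the BAM header against the genome FASTA. The sketch below assumes the header text was obtained separately, for example with `samtools view -H`, and that contig names are the first whitespace-delimited token of each FASTA header line. The function names are hypothetical and not part of funannotate.

```python
def fasta_lengths(fasta_text):
    """Return {sequence_name: length} from FASTA text."""
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first token after '>' as the sequence name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def sam_header_lengths(header_text):
    """Return {sequence_name: length} from the @SQ lines of a SAM/BAM header."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        tags = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        if "SN" in tags and "LN" in tags:
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def compare_references(fasta, bam):
    """Report names present on only one side and names whose lengths differ."""
    shared = set(fasta) & set(bam)
    return {
        "only_in_bam": sorted(set(bam) - set(fasta)),
        "only_in_fasta": sorted(set(fasta) - set(bam)),
        "length_mismatch": sorted(n for n in shared if fasta[n] != bam[n]),
    }
```

If all three lists come back empty, the BAM was most likely aligned against the same assembly, and the coordinate errors would need another explanation.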

PlantDr430 commented 5 years ago

I recently downloaded the latest versions of PASA (v2.3.3) and TransDecoder (v5.5.0).

The BAM files were created during the funannotate train run. This is the command I used:

[12/13/18 16:05:03]: /data/wyka/funannotate-master/bin/funannotate-train.py -i Cpur20_1_masked.fasta -o Cpur20_train -l Cpur20_w_filtered_1P.fastq.gz -r Cpur20_w_filtered_2P.fastq.gz --jaccard_clip --pasa_alignment_overlap 30.0 --no_trimmomatic --no_normalize_reads --species Claviceps purpurea --isolate Cpur20.1 --cpus 20

I also see a file named "hisat2.coordSorted.bam" inside the training folder, so I take that to mean funannotate created the BAM file.

I currently have HISAT2 v2.1.0 installed.

PlantDr430 commented 5 years ago

Here is the Trinity-GG log file as well.

Trinity-gg.log.gz