nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

tRNAScan issue: main cannot open TPCsignal consensus file #304

Closed: ignadb closed this issue 4 years ago

ignadb commented 4 years ago

Hi Jon,

Thanks for your continued support of Funannotate! It is very useful!

I ran funannotate 1.5.2 with this command:

/home/chatchai/software/funanno2018/funannotate151/bin/funannotate-predict.py -i /home/chatchai/Desktop/F5-1/F5-1_scaffolds.minimap2.minlen500.sorted.masked.fasta -o /home/chatchai/Desktop/F5-1/funannotate152.F5-1.new.3 --species Hymenoscyphus koreanus --isolate F52847-1 --transcript_evidence MFB1_PFB1_concatenated.fasta --cpu 4

and it ran fine until the tRNAscan step, where the following message appeared for every scaffold:

tRNAscan1.4: main cannot open TPCsignal consensus file
tRNAscan could not complete successfully for scaffold_1. Possible memory allocation problem or missing file. (Exit code=256).
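The "(Exit code=256)" value is worth decoding. A POSIX exit status only ranges from 0 to 255, so 256 is most likely a raw wait status of the kind Python's os.system returns on Unix, where the actual exit code sits in the high byte. Whether funannotate reports the raw status this way is an assumption, and decode_wait_status is a hypothetical helper, not part of funannotate. A minimal sketch (Unix only):

```python
import os

def decode_wait_status(status: int) -> int:
    """Return the exit code encoded in a raw POSIX wait status.

    Assumes `status` is a raw value such as the one os.system() returns on Unix.
    Hypothetical helper for illustration; not part of funannotate.
    """
    if os.WIFEXITED(status):
        # Normal exit: the exit code is stored in the high byte.
        return os.WEXITSTATUS(status)
    if os.WIFSIGNALED(status):
        # Killed by a signal: follow subprocess's convention of returning -signum.
        return -os.WTERMSIG(status)
    raise ValueError(f"unrecognized wait status: {status}")

print(decode_wait_status(256))  # -> 1
```

Read this way, 256 means the tRNAscan process exited with status 1, a generic failure, which is consistent with the "cannot open TPCsignal consensus file" message printed just before it.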

Do you have an idea how to fix this? Also, I assume this problem doesn't affect the gene models that were already predicted and filtered by EVM?

Thanks very much and have a great day!

Best regards, Chatchai

nextgenusfs commented 4 years ago

I think tRNAscan-SE 2.0 is out now. Try running the command from the logfile manually; it might give a more informative error. But this doesn't seem like a funannotate issue, since it was previously working correctly?
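Re-running the failing command by hand, as suggested here, usually surfaces the tool's own error output. A minimal Python 3.7+ sketch, assuming you paste the exact command line from the funannotate logfile into the call at the bottom; the command shown is a harmless placeholder, not a real tRNAscan invocation, and rerun is a hypothetical helper:

```python
import shlex
import subprocess
import sys

def rerun(cmd: str) -> subprocess.CompletedProcess:
    """Run a command copied from a logfile and show its full output.

    shlex.split splits quoted arguments much as a POSIX shell does
    (it does not expand variables, globs, or redirections), so shell=True
    is not needed.
    """
    result = subprocess.run(shlex.split(cmd), capture_output=True, text=True)
    print("exit code:", result.returncode)
    print("stdout:", result.stdout, sep="\n")
    print("stderr:", result.stderr, sep="\n")
    return result

if __name__ == "__main__":
    # Placeholder only: replace with the tRNAscan command line from the funannotate log.
    placeholder = f'{shlex.quote(sys.executable)} -c "import sys; sys.exit(3)"'
    rerun(placeholder)
```

Capturing stderr separately matters here because the message funannotate prints ("Possible memory allocation problem or missing file") is its own summary, while the underlying tool's stderr is where a message like the TPCsignal error comes from.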

ignadb commented 4 years ago

I am upgrading to 2.0 and running it now, and will get back to you when I know more. However, I think the gene prediction and filtering were done before tRNAscan was called, so the predicted genes should be fine. I attached the funannotate log in case you want to confirm.

[12:45 PM]: OS: linux2, 32 cores, ~ 132 GB RAM. Python: 2.7.12
[12:45 PM]: Running funannotate v1.5.2-30c1166
[12:45 PM]: Augustus training set for hymenoscyphus_koreanus_f52847-1 already exists. To re-train provide unique --augustus_species argument
[12:45 PM]: AUGUSTUS (3.2.3) detected, version seems to be compatible with BRAKER and BUSCO
[12:46 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[12:46 PM]: Genome loaded: 1,792 scaffolds; 57,382,120 bp; 19.68% repeats masked
[12:46 PM]: Aligning transcript evidence to genome with minimap2
[12:46 PM]: Found 25,601 alignments, wrote GFF3 and Augustus hints to file
[12:46 PM]: Mapping proteins to genome using Diamond blastx/Exonerate
[12:46 PM]: Using 544,324 proteins as queries
[12:46 PM]: Running Diamond pre-filter search
[12:54 PM]: Found 505,527 preliminary alignments
[03:07 PM]: Exonerate finished: found 1,382 alignments
[03:07 PM]: Running GeneMark-ES on assembly
[03:47 PM]: Converting GeneMark GTF file to GFF3
[03:48 PM]: Found 14,502 gene models
[03:48 PM]: Running Augustus gene prediction
[04:05 PM]: Found 11,724 gene models
[04:05 PM]: Pulling out high quality Augustus predictions
[04:05 PM]: Found 1,644 high quality predictions from Augustus (>90% exon evidence)
[04:05 PM]: Summary of gene models passed to EVM (weights):

  Augustus models (1):  10,080
  Genemark models (1):  14,502
  HiQ models (2):        1,644
  Pasa models (1):           0
  Total models:         26,226

[04:05 PM]: Setting up EVM partitions
[04:05 PM]: Generating EVM command list
[04:05 PM]: Running EVM commands with 3 CPUs
[04:32 PM]: Combining partitioned EVM outputs
[04:32 PM]: Converting EVM output to GFF3
[04:34 PM]: Collecting all EVM results
[04:34 PM]: 14,240 total gene models from EVM
[04:34 PM]: Generating protein fasta files from 14,240 EVM models
[04:35 PM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[04:35 PM]: Found 328 gene models to remove: 3 too short; 0 span gaps; 421 transposable elements
[04:35 PM]: 13,912 gene models remaining
[04:35 PM]: Predicting tRNAs
[04:37 PM]: Found 128 tRNA gene models
[04:37 PM]: 128 tRNAscan models are valid (non-overlapping)
[04:37 PM]: Generating GenBank tbl annotation file
[04:37 PM]: Converting to final Genbank format
[04:39 PM]: Collecting final annotation files for 14,040 total gene models
[04:39 PM]: Funannotate predict is finished, output files are in the /home/chatchai/Desktop/F5-1/funannotate152.F5-1.new.3/predict_results folder
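The totals in this log are consistent with tRNA prediction running after EVM and the gene-model filter, with the tRNA models simply added on top of the filtered protein-coding set. A small sketch that checks this arithmetic; the regexes and the first_count helper are hypothetical and written only against the message text in this particular log. The per-category breakdown on the removal line (3 + 0 + 421) does not sum to 328, so only the totals are used:

```python
import re

# Messages copied from the funannotate 1.5.2 log above (timestamps dropped).
LOG = """\
14,240 total gene models from EVM
Found 328 gene models to remove: 3 too short; 0 span gaps; 421 transposable elements
13,912 gene models remaining
Found 128 tRNA gene models
Collecting final annotation files for 14,040 total gene models
"""

def first_count(pattern: str, text: str) -> int:
    """Return the first comma-grouped integer captured by `pattern`."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return int(match.group(1).replace(",", ""))

evm = first_count(r"([\d,]+) total gene models from EVM", LOG)
removed = first_count(r"Found ([\d,]+) gene models to remove", LOG)
remaining = first_count(r"([\d,]+) gene models remaining", LOG)
trnas = first_count(r"Found ([\d,]+) tRNA gene models", LOG)
final = first_count(r"files for ([\d,]+) total gene models", LOG)

# Protein-coding filtering is finished before tRNAscan runs...
assert evm - removed == remaining
# ...and the tRNA models are then appended to the filtered set.
assert remaining + trnas == final
print(f"{remaining} protein-coding + {trnas} tRNA = {final}")
```

With the numbers above this prints "13912 protein-coding + 128 tRNA = 14040", so a failed tRNAscan run would change only the tRNA count, not the 13,912 protein-coding models.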

nextgenusfs commented 4 years ago

Yes, tRNA prediction is not part of the EVM protein-coding gene methods, so it has no effect on the rest of the gene models.