nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Can the RNA FASTA files from NCBI RefSeq be provided to the "funannotate predict --transcript_evidence" parameter? #979

Open maruiqi0710 opened 7 months ago

maruiqi0710 commented 7 months ago

I am annotating a fungal genome. Can the RNA FASTA files for species in the same family from NCBI RefSeq be combined (even though the combined data are redundant) and then provided to the "funannotate predict --transcript_evidence" parameter? I think the RefSeq data may be more recent than the EST data in JGI MycoCosm, and the RefSeq data are reliable.
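Whichever transcript sets you end up using, "combining" several FASTA files usually means concatenating them and dropping exact duplicates so the same sequence is not counted twice. The sketch below does that with the Python standard library only. The function names read_fasta and merge_fasta_dedup are hypothetical, it removes only identical sequences (case-insensitive), not near-duplicates, and it does not check for clashing sequence IDs across files.

```python
from pathlib import Path
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file; header excludes '>'."""
    header: Optional[str] = None
    chunks: List[str] = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header = line[1:]
                chunks = []
            else:
                chunks.append(line)  # sequence may be wrapped over many lines
    if header is not None:
        yield header, "".join(chunks)


def merge_fasta_dedup(inputs: Iterable[Path], output: Path, width: int = 60) -> int:
    """Concatenate FASTA files, keeping only the first copy of each exact sequence.

    Returns the number of unique sequences written to `output`.
    """
    seen = set()
    written = 0
    with open(output, "w") as out:
        for path in inputs:
            for header, seq in read_fasta(path):
                key = seq.upper()  # treat soft-masked (lowercase) copies as duplicates
                if key in seen:
                    continue
                seen.add(key)
                out.write(f">{header}\n")
                for i in range(0, len(seq), width):  # re-wrap to a fixed line width
                    out.write(seq[i:i + width] + "\n")
                written += 1
    return written
```

For example, merge_fasta_dedup([Path("species_a_rna.fna"), Path("species_b_rna.fna")], Path("combined_transcripts.fasta")) writes one combined file (the file names are placeholders). For clustering near-identical transcripts, a dedicated clustering tool would be the usual choice instead of exact matching.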

nextgenusfs commented 7 months ago

Do you mean gene model transcripts from RefSeq? If so, I wouldn't consider those "evidence", because they are derived from predictions. Nothing stops you from using non-curated gene models as transcript evidence, but doing so has the potential to guide EVM (EVidenceModeler) toward gene models similar to those predicted previously, a sort of circular training loop. Ideally, you want experimentally derived data to inform EVM on how to construct the consensus gene models.

RefSeq annotations are not individually curated; they are in fact identical to the GenBank record that was deposited by a researcher. What RefSeq provides is a single reference genome/annotation per "species".
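One quick way to see how much of a downloaded RefSeq RNA set is prediction-derived is to look at the accession prefixes in the FASTA headers. NCBI documents XM_ and XR_ as "model" (computationally predicted) mRNA and ncRNA records, and NM_ and NR_ as "known" records. The sketch below simply counts them; the function names are hypothetical, and it assumes each header starts with the RefSeq accession, as in NCBI downloads.

```python
from typing import Dict, Iterable, List

# RefSeq transcript accession prefixes, per NCBI's RefSeq documentation.
KNOWN_PREFIXES = ("NM_", "NR_")  # "known" mRNA / ncRNA records
MODEL_PREFIXES = ("XM_", "XR_")  # "model" (predicted) mRNA / ncRNA records


def fasta_headers(path: str) -> List[str]:
    """Return all header lines ('>'-prefixed) from a FASTA file."""
    with open(path) as handle:
        return [line.rstrip("\n") for line in handle if line.startswith(">")]


def classify_refseq_headers(headers: Iterable[str]) -> Dict[str, List[str]]:
    """Group accessions into 'known', 'model' and 'other' by their prefix."""
    groups: Dict[str, List[str]] = {"known": [], "model": [], "other": []}
    for header in headers:
        fields = header.lstrip(">").split()
        accession = fields[0] if fields else ""  # first token is the accession
        if accession.startswith(KNOWN_PREFIXES):
            groups["known"].append(accession)
        elif accession.startswith(MODEL_PREFIXES):
            groups["model"].append(accession)
        else:
            groups["other"].append(accession)
    return groups
```

Running {k: len(v) for k, v in classify_refseq_headers(fasta_headers("family_rna.fna")).items()} (placeholder file name) gives the counts per group; a set dominated by XM_ records consists of predicted gene models rather than transcript evidence.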

If you mean that RNA-seq data are available, you could potentially use them to train funannotate.
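Training from RNA-seq is done with the funannotate train subcommand. The sketch below only assembles its command line. The options used (-i, -o, --left, --right, --single, --species, --cpus) follow the funannotate documentation as best I recall, so check them against funannotate train --help for your installed version. The helper name build_train_command is hypothetical.

```python
import shlex
from typing import List, Optional


def build_train_command(genome: str, outdir: str, species: str,
                        left: Optional[str] = None, right: Optional[str] = None,
                        single: Optional[str] = None, cpus: int = 4) -> List[str]:
    """Assemble (but do not run) a 'funannotate train' command as an argv list."""
    if single is None and (left is None or right is None):
        raise ValueError("provide paired reads (left and right) or single-end reads")
    cmd = ["funannotate", "train", "-i", genome, "-o", outdir,
           "--species", species, "--cpus", str(cpus)]
    if left and right:
        cmd += ["--left", left, "--right", right]  # paired-end RNA-seq reads
    if single:
        cmd += ["--single", single]  # single-end RNA-seq reads
    return cmd


if __name__ == "__main__":
    # Placeholder file names; print the command for review before running it.
    cmd = build_train_command("genome.fa", "fun", "Genus species",
                              left="reads_R1.fq.gz", right="reads_R2.fq.gz", cpus=8)
    print(shlex.join(cmd))
```

Returning an argv list rather than a shell string lets you print it for review and then pass it unchanged to subprocess.run(cmd, check=True). In the documented workflow, funannotate predict is afterwards run against the same output folder.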