Facing problem in assigning gene name and protein name

nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline

http://funannotate.readthedocs.io

BSD 2-Clause "Simplified" License


Facing problem in assigning gene name and protein name #871

Closed: DrNavi closed this issue 1 year ago

DrNavi commented 1 year ago

I have used funannotate to annotate my genome. My genome is Magnaporthe oryzae; its annotation is not available yet, but the very closely related species Magnaporthe grisea is present in the Augustus species list. I was able to run it error-free, but the problem is that all predicted genes come out with names like FUN_0001 etc., and all proteins are "hypothetical protein". Why is that? My genome assembly stats are fine: the genome is around 43 Mb and the assembly comprises 11 contigs.

hyphaltip commented 1 year ago

Did you provide the --name option to the predict or annotate step? That sets the locus tag prefix. Typically you register your project at NCBI / EMBL to get a BioProject and a locus tag prefix for your organism; that prefix is what you provide here.

For the functional assignment, did you run the annotate step? That assigns predicted function based on inferred homology.

DrNavi commented 1 year ago

I didn't provide the locus tag, and I haven't registered my project with NCBI yet.

I did the functional annotation and got some GO terms, which I will eventually be able to decode to gene/protein names. But 99% of my proteins are still named "hypothetical protein". If I take a sequence and BLAST it against NCBI, I get a hit to Magnaporthe grisea with 99% similarity, so why are those names not being assigned at the annotation step?

nextgenusfs commented 1 year ago

This has been covered many times in the issues. Generally this is the expected behavior, because funannotate is geared toward genome submission to NCBI: to have a valid product defline (i.e. not "hypothetical protein") you need a valid common gene name. That is difficult to do, and therefore the default is to be conservative. If you run EggNog Mapper you will get a few more descriptions. Here is one comment from one of these threads: https://github.com/nextgenusfs/funannotate/issues/445#issuecomment-652018889
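To see how much an extra annotation source (such as EggNog Mapper results) actually changes things, it can help to measure the fraction of "hypothetical protein" products before and after. The following is a minimal sketch, not part of funannotate: it assumes the annotation is available as GFF3 and that each mRNA feature carries a `product=` attribute in column 9. The function names and the example records are made up for illustration.

```python
from collections import Counter
from urllib.parse import unquote


def tally_products(gff3_lines, feature_type="mRNA"):
    """Count product descriptions on GFF3 features of the given type."""
    counts = Counter()
    for line in gff3_lines:
        # Skip blank lines, directives and comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        # Assumption: a feature without a product is counted as hypothetical.
        product = unquote(attrs.get("product", "hypothetical protein"))
        counts[product] += 1
    return counts


def hypothetical_fraction(counts):
    """Return the share of features named 'hypothetical protein'."""
    total = sum(counts.values())
    return counts["hypothetical protein"] / total if total else 0.0


# Made-up records for illustration only.
example = [
    "##gff-version 3",
    "ctg1\tfunannotate\tgene\t1\t900\t.\t+\t.\tID=FUN_000001;",
    "ctg1\tfunannotate\tmRNA\t1\t900\t.\t+\t.\t"
    "ID=FUN_000001-T1;Parent=FUN_000001;product=hypothetical protein;",
    "ctg1\tfunannotate\tmRNA\t1500\t2400\t.\t-\t.\t"
    "ID=FUN_000002-T1;Parent=FUN_000002;product=actin;",
    "ctg1\tfunannotate\tmRNA\t3000\t3900\t.\t+\t.\t"
    "ID=FUN_000003-T1;Parent=FUN_000003;",
]

counts = tally_products(example)
print(f"{hypothetical_fraction(counts):.0%} of mRNA features are hypothetical")
```

For a real run, pass an open file handle instead of the example list, for instance `tally_products(open("your_annotation.gff3"))`, where the file name is a placeholder.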