nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Ploidy question #900

Open Niohuruzh opened 1 year ago

Niohuruzh commented 1 year ago

Hi, thanks for your excellent software, which has helped me a lot in studying fungi. I have a question about the --ploidy option: under what conditions should I use 2 or 1? For example, I have a heterozygous fungus whose assembly is twice the size of haploid assemblies from the same genus. Should I use 2 to predict the genome? If I use 1 instead, will there be any problems with the predicted result?

Looking forward to your reply. Best wishes!

hyphaltip commented 1 year ago

Yeah, it would be better to specify --ploidy 2 if the duplication is present in the assembly. The ploidy setting is used to interpret the BUSCO results, to choose markers for training, and to control how the exonerate analyses are run. It can work with --ploidy 1, but you may run into some trouble.

Niohuruzh commented 1 year ago

Thanks. But how about a homozygous diploid? Should I still use --ploidy 2 when its assembly size is haploid?

nextgenusfs commented 1 year ago

So the --ploidy option doesn't do a whole lot in terms of gene prediction. However, as @hyphaltip said, it controls the number of hits for protein alignments, and it also allows duplicated BUSCO models to be used for training the ab initio predictors. The default is to use only single-copy hits as the training set; when ploidy is > 1, it chooses the best gene prediction for each BUSCO model regardless of whether it is duplicated.
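
To make the training-set rule described above concrete, here is a rough sketch in Python. It is not funannotate's actual code: the `BuscoHit` type, the `select_training_models` function, and the example scores are all hypothetical, and the real pipeline may rank or filter hits differently. It only illustrates the difference between keeping single-copy hits (ploidy 1) and keeping the best hit per BUSCO model (ploidy > 1).

```python
from collections import defaultdict
from typing import NamedTuple


class BuscoHit(NamedTuple):
    """Hypothetical record for one BUSCO hit (not a funannotate type)."""
    busco_id: str    # BUSCO model identifier
    gene_model: str  # gene prediction that matched the model
    score: float     # match score; higher is assumed to be better


def select_training_models(hits, ploidy=1):
    """Pick one gene model per BUSCO ID for ab initio training.

    ploidy == 1: keep only BUSCO IDs with exactly one hit (single copy).
    ploidy > 1:  keep the best-scoring hit for every BUSCO ID,
                 even when the ID is duplicated.
    """
    # Group all hits by the BUSCO model they matched.
    by_busco = defaultdict(list)
    for hit in hits:
        by_busco[hit.busco_id].append(hit)

    selected = {}
    for busco_id, group in by_busco.items():
        if ploidy > 1:
            # Duplicates allowed: take the highest-scoring prediction.
            selected[busco_id] = max(group, key=lambda h: h.score).gene_model
        elif len(group) == 1:
            # Default behaviour: single-copy hits only.
            selected[busco_id] = group[0].gene_model
    return selected


# Made-up example: BUSCO_A is duplicated, BUSCO_B is single copy.
example_hits = [
    BuscoHit("BUSCO_A", "gene1", 310.0),
    BuscoHit("BUSCO_A", "gene2", 355.5),
    BuscoHit("BUSCO_B", "gene3", 290.0),
]

print(select_training_models(example_hits, ploidy=1))
print(select_training_models(example_hits, ploidy=2))
```

In an assembly that contains both haplotypes, most BUSCO models would look like `BUSCO_A` here, so under the ploidy 1 rule the training set could shrink considerably.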

Niohuruzh commented 1 year ago

Thanks so much.