nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License
300 stars, 82 forks

funannotate update with mysql #998

Open Nitin123-4 opened 5 months ago

Nitin123-4 commented 5 months ago

Hi Team,

I configured funannotate to use mysql by following the PASA guide: https://github.com/PASApipeline/PASApipeline/wiki/setting-up-pasa-mysql

It is running fine with 12 cpus.

funannotate update -i Genome_funannotate_train/ --cpus 12 --pasa_db mysql

Any idea how long it will take to complete?

The previous annotation consists of 28,846 protein-coding gene models and 683 non-coding gene models.

Nitin123-4 commented 5 months ago

Hi Team, any idea about this?

nextgenusfs commented 5 months ago

I guess it depends on how many transcripts you had in the initial round. It should be faster than when it ran train. I'm assuming that you ran train with the same mysql setup. The update step runs PASA genome comparison, so it takes your gene models from predict and looks at the alignments in the PASA database -- it can predict some new genes and/or alter some coding sequences. It runs this step two times iteratively. If it is still running after 7 days then I'd think something is wrong. I would guess several hours would be more typical.
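Since the runtime depends on the number of transcripts, a quick count can help set expectations. This is only a minimal sketch: the file name `transcripts.fasta` and the `TRANSCRIPTS` variable are hypothetical placeholders, not paths that funannotate is known to write, so point them at whichever transcript FASTA was aligned in your run.

```shell
#!/usr/bin/env bash
# Count sequence records in a FASTA file (one record per header line starting with ">").
count_fasta_records() {
  grep -c '^>' "$1"
}

# Hypothetical path (assumption): replace with the transcript FASTA used by PASA in your run.
TRANSCRIPTS="${TRANSCRIPTS:-transcripts.fasta}"

if [ -f "$TRANSCRIPTS" ]; then
  echo "transcripts: $(count_fasta_records "$TRANSCRIPTS")"
else
  echo "no file at $TRANSCRIPTS; set TRANSCRIPTS to your transcript FASTA"
fi
```

A larger count means more alignments for the PASA comparison to examine, so it is a rough indicator rather than a runtime estimate.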

Nitin123-4 commented 5 months ago

Hi, thanks for your reply. I ran funannotate train with sqlite, and I am now running update with mysql.

nextgenusfs commented 5 months ago

That won't work.

Nitin123-4 commented 5 months ago

Oh, okay. So if both funannotate train and update are run with mysql, it should be faster?

nextgenusfs commented 5 months ago

Yeah, update tries to reuse the PASA database from train. You could instead just run a fresh update, but you would need to pass all the reads, etc. and direct it to a new output folder. Update is capable of running the data from scratch as well.
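A fresh update along those lines might look like the sketch below. It only builds and prints the command so it can be reviewed before running. The output folder and read file names are hypothetical placeholders, and the `-o`, `-l`, and `-r` options are the output and paired-read flags as I recall them from `funannotate update --help`, so please check them against your installed version. Whether update still picks up anything from the old input folder when given `-i` is not confirmed here.

```shell
#!/usr/bin/env bash
# Sketch: assemble a fresh `funannotate update` command that uses one PASA backend throughout.
PASA_DB="mysql"   # same value as in the original command; train used sqlite, hence the fresh run
CPUS=12

build_update_cmd() {
  # Arguments: input folder, new output folder, left (R1) reads, right (R2) reads.
  local input_dir="$1" out_dir="$2" left="$3" right="$4"
  echo "funannotate update -i ${input_dir} -o ${out_dir}" \
       "-l ${left} -r ${right}" \
       "--pasa_db ${PASA_DB} --cpus ${CPUS}"
}

# Placeholder names (assumptions): update_mysql, reads_R1.fastq.gz, reads_R2.fastq.gz.
build_update_cmd Genome_funannotate_train update_mysql reads_R1.fastq.gz reads_R2.fastq.gz
```

Printing the command first makes it easy to confirm that the reads and the new output folder are what you intend. The alternative is to rerun train with `--pasa_db mysql` so that update can reuse that database.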