nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

No mask option in v1.3.4 #192

Closed EricFournier3 closed 6 years ago

EricFournier3 commented 6 years ago

funannotate mask -i Myc_1038_2000pb_and_5x_2.fasta --cpus 12 -o Myc_1038_Mask.fasta

mask option not recognized Usage: funannotate version: 1.3.4

Is the mask option integrated into the predict option?

The tutorial says we need to run mask first for the "Genome assembly only" case.

Thanks Eric

nextgenusfs commented 6 years ago

Yes, sorry, I was updating the docs for the 1.4.0 release, where mask is separate from predict.

EricFournier3 commented 6 years ago

So for v1.3.4, do I need to run predict first for "Genome assembly only"? I didn't find the docs for "Genome assembly only" in funannotate-1.3.4/docs.

nextgenusfs commented 6 years ago

You can run predict, which will run masking as the first step. I decoupled the masking because RepeatMasker essentially requires the RepBase database, which is not publicly available to everyone, so I wanted to make the masking step more flexible.

nextgenusfs commented 6 years ago

http://funannotate.readthedocs.io/en/latest/tutorials.html#genome-assembly-only

You can follow this; just ignore the mask step.

EricFournier3 commented 6 years ago

thank you

EricFournier3 commented 6 years ago

My assembly is 31 Mbp. How long is the "Soft-masking: generating repeat library" step supposed to take on average? It is still running after 1 hour with 12 CPUs. I just want to be sure that it's not stuck in rmblastn.

[foueri01@inspq.qc.ca@slbio00d Myco_20180701]$ funannotate predict -i Myc_1038_2000pb_and_5x_2.fasta -o OutRes --species aspergillus_fumigatus --busco_seed_species aspergillus_fumigatus --cpus 12

[11:21 AM]: OS: linux2, 40 cores, ~ 528 GB RAM. Python: 2.7.15
[11:21 AM]: Running funannotate v1.3.4
[11:21 AM]: Augustus training set for aspergillus_fumigatus already exists. To re-train provide unique --augustus_species argument
[11:21 AM]: AUGUSTUS (3.2.3) detected, version seems to be compatible with BRAKER and BUSCO
[11:21 AM]: Loading sequences and soft-masking genome
[11:21 AM]: Soft-masking: building RepeatModeler database
[11:21 AM]: Soft-masking: generating repeat library using RepeatModeler

nextgenusfs commented 6 years ago

RepeatModeler is quite slow. You could use the '--repeatmasker_species fungi' option to speed it up.
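As a rough sketch, the predict command from earlier in this thread would look like the following with that option added. All other arguments are copied from that command. The helper name print_predict_cmd is hypothetical and only prints the command so it can be reviewed before running; whether 'fungi' is the right species setting for a given genome is an assumption to check.

```shell
# Hypothetical helper (not part of funannotate): prints the predict command
# from this thread with --repeatmasker_species added.
# $1 = input genome FASTA, $2 = output folder
print_predict_cmd() {
  echo funannotate predict \
    -i "$1" \
    -o "$2" \
    --species aspergillus_fumigatus \
    --busco_seed_species aspergillus_fumigatus \
    --repeatmasker_species fungi \
    --cpus 12
}

# Review the command first, then paste it into the shell to run it.
print_predict_cmd Myc_1038_2000pb_and_5x_2.fasta OutRes
```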

EricFournier3 commented 6 years ago

I ran funannotate iprscan locally with the following command:

funannotate iprscan -i Res_1038 -m local --iprscan_path /home/foueri01@inspq.qc.ca/ProjetsNGS/Funannotate/InterProScan/interproscan-5.30-69.0/interproscan.sh

Running InterProScan5 on 10016 proteins
Imporant: you need to manually configure your interproscan.properties file for embedded workers.
Will try to launch 12 interproscan processes, adjust -c,--cpus for your system
InterProScan5 search has completed successfully!
Results are here: Res_1038/annotate_misc/iprscan.xml

But iprscan.xml contains only this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><protein-matches xmlns="http://www.ebi.ac.uk/interpro/resources/schemas/interproscan5"></protein-matches>

Is this the normal output? For the annotate step, do I need to provide this XML file with the --iprscan option? I am a little confused.

It also produced a directory (iprscan_21136) with FASTA files.

Thanks

nextgenusfs commented 6 years ago

Sounds like the run failed. It might be related to the Python 3 requirement of the newest version of InterProScan: https://github.com/nextgenusfs/funannotate/issues/188 @hyphaltip, how did you configure things locally to have funannotate run with Python 2 and launch InterProScan with Python 3?

When the script executes correctly, it will save the output in the location where funannotate annotate will find it, so you don't need to provide it to that command (although you can if you run it elsewhere for example).
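A quick way to tell a failed run from a successful one is to count the protein entries in the XML file. The sketch below assumes the standard InterProScan 5 XML layout, where each result is a protein element under the protein-matches root; the function name count_iprscan_proteins is hypothetical, and the path is the one reported earlier in this thread.

```shell
# Count <protein> entries in an InterProScan XML file.
# The pattern does not match the <protein-matches> root or </protein> closing tags.
# A count of 0 means the file holds only the empty root element.
count_iprscan_proteins() {
  grep -o '<protein[ >]' "$1" | wc -l | tr -d ' '
}

xml="Res_1038/annotate_misc/iprscan.xml"
if [ -f "$xml" ]; then
  n=$(count_iprscan_proteins "$xml")
  if [ "$n" -eq 0 ]; then
    echo "No protein entries in $xml: the InterProScan run likely failed"
  else
    echo "$n protein entries found in $xml"
  fi
fi
```

Counting with grep -o rather than grep -c matters here because the XML may be written on a single line, in which case a line count would under-report.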

EricFournier3 commented 6 years ago

OK, I will try with Docker instead. Do you think I will have the same issue with Python 3?

nextgenusfs commented 6 years ago

No, the Docker InterProScan is an older version.

EricFournier3 commented 6 years ago

Great! I tried annotating without InterProScan. It works, but there are many hypothetical proteins. I hope InterProScan via Docker will help.

nextgenusfs commented 6 years ago

InterProScan will help add some functional annotation (including GO terms), but it won't change any of the protein deflines; many or most of the proteins will still be hypothetical. The names/product deflines are derived from eggNOG-mapper and a BLAST search of UniProt/SwissProt. On a typical fungal genome I would only expect to get 1,500 to 2,500 names/product deflines from ~10,000 genes.

EricFournier3 commented 6 years ago

thanks for that clarification

hyphaltip commented 6 years ago

On our cluster we use the Unix module system, so I had to edit interproscan.sh to explicitly load python3 and the associated paths.

The explicit path is also set in the properties file, which helps too.

Jason Stajich, PhD

EricFournier3 commented 6 years ago

With InterProScan Docker, everything is working correctly.