EricFournier3 closed this issue 6 years ago.
Yes, sorry: I was updating the docs for the 1.4.0 release, where mask is separate from predict.
So for v1.3.4, do I need to run predict first for the Genome Assembly Only workflow? I didn't find the docs for Genome Assembly Only in funannotate-1.3.4/docs.
You can run predict, which runs masking as its first step. I decoupled masking because RepeatMasker essentially requires the RepBase database, which is not publicly available to everyone, so I wanted to make the masking step more flexible.
http://funannotate.readthedocs.io/en/latest/tutorials.html#genome-assembly-only
You can follow this; just ignore the mask step.
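A minimal sketch of the version difference described above, assuming versions are plain dotted integers such as "1.3.4". The helper names are hypothetical and this is not funannotate code; it only encodes the rule from this thread: from 1.4.0 masking is its own mask step, while in 1.3.4 predict soft-masks the genome first.

```python
def parse_version(version):
    """Turn a dotted version string like "1.3.4" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def masking_steps(version):
    """Return the masking/prediction subcommands to run, in order (hypothetical helper).

    From 1.4.0 on, masking is a separate `mask` step; in earlier releases
    (e.g. 1.3.4) `predict` soft-masks the genome as its first step.
    """
    if parse_version(version) >= (1, 4, 0):
        return ["mask", "predict"]
    return ["predict"]


print(masking_steps("1.3.4"))  # ['predict']
print(masking_steps("1.4.0"))  # ['mask', 'predict']
```

Comparing tuples rather than strings keeps "1.10.0" correctly ordered after "1.4.0".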
Thank you.
My assembly is 31 Mbp. On average, how long is the "Soft-masking: generating repeat library" step supposed to take? It is still running after 1 hour with 12 CPUs. I just want to be sure that it's not stuck in rmblastn.
[11:21 AM]: OS: linux2, 40 cores, ~ 528 GB RAM. Python: 2.7.15
[11:21 AM]: Running funannotate v1.3.4
[11:21 AM]: Augustus training set for aspergillus_fumigatus already exists. To re-train provide unique --augustus_species argument
[11:21 AM]: AUGUSTUS (3.2.3) detected, version seems to be compatible with BRAKER and BUSCO
[11:21 AM]: Loading sequences and soft-masking genome
[11:21 AM]: Soft-masking: building RepeatModeler database
[11:21 AM]: Soft-masking: generating repeat library using RepeatModeler
RepeatModeler is quite slow. You could use the "--repeatmasker_species fungi" option to speed it up.
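A hedged sketch of assembling a v1.3.x predict call that uses the suggested option. Only --repeatmasker_species comes from the reply above; -i, -o and --cpus are assumed to behave as in the other funannotate commands in this thread, the file names are placeholders, and any other options predict needs for your genome are deliberately left out.

```python
import shlex


def predict_command(genome, out_dir, cpus, repeatmasker_species=None):
    """Build a `funannotate predict` argument list (hypothetical helper).

    Passing repeatmasker_species adds --repeatmasker_species, the option
    suggested above to avoid the slow RepeatModeler library build.
    """
    argv = ["funannotate", "predict", "-i", genome, "-o", out_dir, "--cpus", str(cpus)]
    if repeatmasker_species:
        argv += ["--repeatmasker_species", repeatmasker_species]
    return argv


cmd = predict_command("genome.fasta", "predict_out", 12, repeatmasker_species="fungi")
print(shlex.join(cmd))
# funannotate predict -i genome.fasta -o predict_out --cpus 12 --repeatmasker_species fungi
```

Building an argument list (instead of one string) lets you hand it straight to subprocess.run(cmd, check=True) without shell quoting issues; shlex.join needs Python 3.8+.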
I ran funannotate iprscan locally with the following command:
funannotate iprscan -i Res_1038 -m local --iprscan_path /home/foueri01@inspq.qc.ca/ProjetsNGS/Funannotate/InterProScan/interproscan-5.30-69.0/interproscan.sh
Running InterProScan5 on 10016 proteins
Imporant: you need to manually configure your interproscan.properties file for embedded workers.
Will try to launch 12 interproscan processes, adjust -c,--cpus for your system
InterProScan5 search has completed successfully!
Results are here: Res_1038/annotate_misc/iprscan.xml
But iprscan.xml contains only this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><protein-matches xmlns="http://www.ebi.ac.uk/interpro/resources/schemas/interproscan5"></protein-matches>
Is this normal output? For the annotate step, do I need to provide this XML file with the --iprscan option? I am a little confused.
It also produced a directory (iprscan_21136) containing FASTA files.
Thanks
It sounds like the run failed. It might be related to the Python 3 requirement of the newest version of InterProScan (see https://github.com/nextgenusfs/funannotate/issues/188). @hyphaltip, how did you configure your local setup so that funannotate runs with Python 2 and launches InterProScan with Python 3?
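One quick way to tell an empty report like the one posted above from a real one is to count its result records. This is a minimal sketch using only the standard library, assuming each InterProScan 5 result is a protein element directly under the protein-matches root in the namespace shown above; the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

IPR_NS = "http://www.ebi.ac.uk/interpro/resources/schemas/interproscan5"


def count_iprscan_proteins(xml_text):
    """Count <protein> records directly under the <protein-matches> root.

    A root with no children, as in the file posted above, means no
    results were written, which points to a failed run.
    """
    root = ET.fromstring(xml_text)
    return len(root.findall(f"{{{IPR_NS}}}protein"))


empty_report = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    f'<protein-matches xmlns="{IPR_NS}"></protein-matches>'
)
print(count_iprscan_proteins(empty_report))  # 0
```

For a file on disk, pass its contents, e.g. count_iprscan_proteins(Path("Res_1038/annotate_misc/iprscan.xml").read_bytes()) after importing Path from pathlib.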
When the script executes correctly, it saves the output where funannotate annotate will find it, so you don't need to pass it to that command (although you can, for example if you ran InterProScan elsewhere).
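The lookup described above can be sketched as "explicit path wins, otherwise use the default location". This is a hypothetical helper, not funannotate's actual code; the default location is taken from the iprscan log earlier in this thread (Res_1038/annotate_misc/iprscan.xml).

```python
from pathlib import Path


def iprscan_xml_path(out_dir, iprscan=None):
    """Pick the InterProScan XML that annotate should read (hypothetical helper).

    A path passed explicitly (e.g. from a run done elsewhere) takes
    precedence; otherwise fall back to <out_dir>/annotate_misc/iprscan.xml,
    where `funannotate iprscan` saved its results in this thread.
    """
    if iprscan is not None:
        return Path(iprscan)
    return Path(out_dir) / "annotate_misc" / "iprscan.xml"


print(iprscan_xml_path("Res_1038").as_posix())  # Res_1038/annotate_misc/iprscan.xml
```

as_posix() keeps the printed path identical on every platform.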
OK, I will try the Docker version instead. Do you think I will have the same issue with Python 3?
No, the Docker InterProScan is an older version, so it shouldn't have that Python 3 requirement.
Great! I tried annotating without InterProScan. It works, but there are many hypothetical proteins. I hope the InterProScan Docker will help.
InterProScan will help add some functional annotation (including GO terms), but it won't change any of the protein deflines; many, or most, of the proteins will remain hypothetical. The gene names/product deflines are derived from eggNOG-mapper and a BLAST search of UniProt/SwissProt. On a typical fungal genome I would expect to get only 1,500-2,500 names/product deflines from ~10,000 genes.
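To put that expectation in numbers: 1,500-2,500 named proteins out of ~10,000 genes is roughly 15-25%. The sketch below shows that arithmetic, plus a hypothetical helper for checking your own run; it assumes you have already extracted one product string per protein, and how you pull those out of the funannotate output is not covered here.

```python
def named_fraction(products):
    """Fraction of proteins whose product is not 'hypothetical protein' (hypothetical helper)."""
    named = sum(1 for p in products if p.strip().lower() != "hypothetical protein")
    return named / len(products) if products else 0.0


# The expected range quoted above: 1,500-2,500 named proteins out of ~10,000 genes.
for named in (1500, 2500):
    print(f"{named}/10000 = {named / 10000:.0%}")
# 1500/10000 = 15%
# 2500/10000 = 25%
```

A result in that range means a genome that is mostly "hypothetical protein" is normal, not a sign of a failed annotation.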
Thanks for that clarification.
On our cluster we use the Unix module system, so I had to edit interproscan.sh to explicitly load Python 3 and its associated paths.
The explicit Python 3 path is also set in the properties file, which also helps.
Jason Stajich, PhD (email reply to Eric Fournier's message of Jul 6, 2018, 7:51 AM -0700)
With the InterProScan Docker, everything is working correctly.
funannotate mask -i Myc_1038_2000pb_and_5x_2.fasta --cpus 12 -o Myc_1038_Mask.fasta
mask option not recognized Usage: funannotate
version: 1.3.4
Is the mask step integrated into the predict command?
The tutorial says we need to run mask first for the Genome Assembly Only workflow.
Thanks,
Eric