Closed jolespin closed 4 years ago
Hmmm... I made some progress (I think).
(funannotate_env) -bash-4.1$ conda deactivate
(base) -bash-4.1$ conda activate funannotate_env
----------------------------------
Activating Funannotate Environment
----------------------------------
GeneMark-ES_ET-4.46 license already exists: /home/jespinoz/.gm_key
Please delete the file and reactivate environment if you to overwrite the license file.
(funannotate_env) -bash-4.1$ ls -lh /usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/funannotate_env/bin/ | grep "emapper.py"
lrwxrwxrwx 1 jespinoz tigr 76 Aug 21 04:08 emapper.py -> /usr/local/devel/ANNOTATION/jespinoz/Packages/eggnog-mapper-1.0.3/emapper.py
(funannotate_env) -bash-4.1$ funannotate check --show-versions
-------------------------------------------------------
Checking dependencies for funannotate v1.6.0-df4262f
-------------------------------------------------------
You are running Python v 2.7.15. Now checking python packages...
All 11 python packages installed
You are running Perl v 5.026002. Now checking perl modules...
All 27 Perl modules installed
Checking external dependencies...
ERROR: emapper.py not installed
Checking Environmental Variables...
All 7 environmental variables are set
-------------------------------------------------------
Is there a particular place funannotate is looking for emapper.py?
It’s literally calling which emapper.py, or something very similar to that. It just tries to run the command and get the version.
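To make that check concrete, here is a minimal sketch of the same idea: look the tool up on PATH, follow any symlink, and try to run it for a version string. It is not funannotate's actual code. The function name `diagnose_tool` and the `--version` flag are assumptions for illustration, and it needs Python 3 (the environment in this thread is Python 2.7).

```python
import os
import shutil
import subprocess


def diagnose_tool(name, version_flag="--version"):
    """Report how `name` resolves on PATH and whether it can be run.

    Hypothetical helper, not part of funannotate.
    """
    report = {"name": name, "found": None, "target": None,
              "executable": False, "version_output": None}

    # shutil.which only returns entries that exist and are executable,
    # so a symlink to a non-executable script comes back as None.
    found = shutil.which(name)
    if found is None:
        return report

    report["found"] = found
    # Follow symlinks to the real file (e.g. a link in the env's bin/).
    report["target"] = os.path.realpath(found)
    report["executable"] = os.access(report["target"], os.X_OK)

    try:
        proc = subprocess.run([found, version_flag],
                              stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              universal_newlines=True,
                              timeout=60)
        report["version_output"] = proc.stdout.strip()
    except (OSError, subprocess.TimeoutExpired) as exc:
        # A bad shebang line or missing interpreter typically ends up here.
        report["version_output"] = "failed to run: {}".format(exc)
    return report


if __name__ == "__main__":
    print(diagnose_tool("emapper.py"))
```

If `found` is None even though the symlink exists, the link target is probably not executable. If the version output reports a failure to run, the script's shebang or its own imports would be the next thing to check.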
Interesting. I wonder if it would correctly call it in the actual pipeline since it's in the path via export PATH=${OPT}/pasa-2.3.3/bin:${PACKAGES}/RepeatModeler-1.0.11:${PACKAGES}/GeneMark-ES_ET-4.46/gm_et_linux_64/:${PACKAGES}/eggnog-mapper-1.0.3:$PATH
I tried running the test but got an error at one of the stages:
(funannotate_env) -bash-4.1$ funannotate test -t predict rna-seq annotate --cpus 4
#########################################################
Running `funannotate predict` unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 4 --species Awesome testicus
#########################################################
-------------------------------------------------------
[07:09 PM]: OS: linux2, 4 cores, ~ 8 GB RAM. Python: 2.7.15
[07:09 PM]: Running funannotate v1.6.0-df4262f
[07:09 PM]: Augustus training set for saccharomyces already exists. To re-train provide unique --augustus_species argument
[07:09 PM]: AUGUSTUS (3.3) detected, version seems to be compatible with BRAKER and BUSCO
[07:09 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
Traceback (most recent call last):
File "/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/funannotate_env/lib/python2.7/multiprocessing/util.py", line 277, in _run_finalizers
finalizer()
File "/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/funannotate_env/lib/python2.7/multiprocessing/util.py", line 207, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/funannotate_env/lib/python2.7/shutil.py", line 266, in rmtree
onerror(os.remove, fullname, sys.exc_info())
File "/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/funannotate_env/lib/python2.7/shutil.py", line 264, in rmtree
os.remove(fullname)
OSError: [Errno 16] Device or resource busy: '/usr/local/scratch/METAGENOMICS/jespinoz/TMPDIR/pymp-un30va/.nfs000000013ae1f2f9000026b8'
Traceback (most recent call last):
File "/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/funannotate_env/lib/python2.7/multiprocessing/util.py", line 277, in _run_finalizers
finalizer()
File "/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/funannotate_env/lib/python2.7/multiprocessing/util.py", line 207, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/funannotate_env/lib/python2.7/shutil.py", line 266, in rmtree
onerror(os.remove, fullname, sys.exc_info())
File "/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/funannotate_env/lib/python2.7/shutil.py", line 264, in rmtree
os.remove(fullname)
OSError: [Errno 16] Device or resource busy: '/usr/local/scratch/METAGENOMICS/jespinoz/TMPDIR/pymp-bgi2E7/.nfs000000013ae1f2fb000026b9'
[07:09 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[07:09 PM]: Mapping proteins to genome using Diamond blastx/Exonerate
[07:09 PM]: Using 1,065 proteins as queries
[07:09 PM]: Running Diamond pre-filter search
[07:09 PM]: Found 1,774 preliminary alignments
[07:10 PM]: Exonerate finished: found 1,347 alignments
[07:10 PM]: Running GeneMark-ES on assembly
[07:13 PM]: Converting GeneMark GTF file to GFF3
[07:13 PM]: Found 1,540 gene models
[07:13 PM]: Running BUSCO to find conserved gene models for training Augustus
[07:13 PM]: Multi-threading in tblastn v2.6.0 is unstable, running in single threaded mode for BUSCO
[07:14 PM]: BUSCO training of Augusus failed, check busco logs, exiting
#########################################################
Traceback (most recent call last):
File "/usr/local/devel/ANNOTATION/jespinoz/Packages/Funannotate-1.6.0/bin/funannotate-test.py", line 334, in <module>
runPredictTest()
File "/usr/local/devel/ANNOTATION/jespinoz/Packages/Funannotate-1.6.0/bin/funannotate-test.py", line 168, in runPredictTest
assert 1500 <= countGFFgenes(os.path.join(tmpdir, 'annotate', 'predict_results', 'Awesome_testicus.gff3')) <= 1800
File "/usr/local/devel/ANNOTATION/jespinoz/Packages/Funannotate-1.6.0/bin/funannotate-test.py", line 58, in countGFFgenes
with open(input, 'rU') as f:
IOError: [Errno 2] No such file or directory: 'test-predict_224980/annotate/predict_results/Awesome_testicus.gff3'
Here's the busco.log head:
Haha, I feel like I'm SO close to getting this to work smoothly for this diatom. I've learned A LOT about conda environments, perl libraries, and sourcing scripts during this process.
Might be the same write permissions error for Augustus.......?
Oh okay didn’t see the core dump. That looks like an Augustus compilation error? The proteinprofile mode is sensitive to compiler and it is inconsistent between versions. Although it seems to have passed the funannotate test. Is this a different version/binary of Augustus? Maybe copy the other version that you had working to this location?
Emapper isn’t required, so if it isn’t in the PATH when running annotate, it will just skip that step.
So that Segmentation fault (core dumped) from augustus is caused by not being able to write files?
I checked which files were available in the tmp directory:
(funannotate_env) -bash-4.1$ ls test-predict_202108/annotate/predict_misc/busco/tmp/
CP022970.1saccharomyces_4117535203_.temp CP022972.1saccharomyces_4117535203_.temp CP022974.1saccharomyces_4117535203_.temp saccharomyces_4117535203.nhr saccharomyces_4117535203.nsq
CP022971.1saccharomyces_4117535203_.temp CP022973.1saccharomyces_4117535203_.temp CP022975.1saccharomyces_4117535203_.temp saccharomyces_4117535203.nin
I then tried calling augustus externally:
augustus --stopCodonExcludedFromCDS=False --codingseq=1 --proteinprofile=/usr/local/scratch/METAGENOMICS/jespinoz/db/funannotate_db/dikarya/prfl/EOG092644X6.prfl --predictionStart=109668 --predictionEnd=120165 --species=anidulans test-predict_202108/annotate/predict_misc/busco/tmp/CP022970.1saccharomyces_4117535203_.temp
Segmentation fault (core dumped)
Could it be a faulty installation of augustus?
Looks like a compilation error, so yes, probably an install error.
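When a wrapper script launches augustus, a crash like this is easy to miss because only the return code comes back. Below is a minimal sketch, assuming Python 3.5+ on a POSIX system, of telling a signal-induced crash apart from an ordinary non-zero exit. The helper names `describe_exit` and `run_and_describe` are hypothetical and are not part of funannotate or BUSCO.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short description."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal {}".format(-returncode)
        return "killed by {}".format(name)
    return "exited with status {}".format(returncode)


def run_and_describe(cmd):
    """Run `cmd` (a list of arguments, no shell) and describe how it ended."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return describe_exit(proc.returncode)
```

Calling `run_and_describe` with the augustus command line from the post, split into a list of arguments, would be expected to report `killed by SIGSEGV` if the binary crashes the same way. Note that when the command goes through a shell instead, a segfault usually shows up as exit status 139 (128 + 11) rather than a negative code.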
Conda package now available -- should fix all these install errors (I hope).
This is awesome. Thank you so much for doing this! I literally just got my complete run finished the other day for my diatom.
I'm trying to do the following to assemble a diatom:
(1) Kneaddata/Trimmomatic for my transcript reads;
(2) Mapping the reads to my unmasked genomes;
(3) Using RNA-Spades for the transcript assembly using the reads from [2];
(4) funannotate train for each of my libraries (I have a lot);
(5) Use the following to predict genes using funannotate predict:
(5a) All of my training files from [4] (I believe these will be gff3 files from funannotate train) via --pasa_gff;
(5b) trusted proteins from the nearest organism via --protein_evidence;
(5c) A merged map file for all of my transcriptomes
My questions:
(1) Should the above protocol work?
(2) Can I use multiple --rna_bam?
(3) Does my --rna_bam need to be aligned to the repeat masked assembly?
(4) Does my funannotate train step need to be on the repeat masked assembly?