Closed isgilman closed 3 years ago
Looks like maybe tbl2asn didn’t finish or crashed? There should be more info in the logfile I think? Is this the terminal stdout or the logfile?
This was terminal stdout, and you're right that it was last running `tbl2asn`. The final command in `funannotate-annotate.log` was:
tbl2asn -y "Annotated using funannotate v1.6.0-dfd805f" -N 1 -t Portulaca-amilis.v0-FA1.6.0/update_results/MIGS.eu.5.0.tsv -M n -j "[organism=Portulaca amilis]" -V b -c fx -T -a r10u -l paired-ends -Z Portulaca-amilis.v0-FA1.6.0/annotate_misc/tbl2asn/1/discrepency.report.txt -p Portulaca-amilis.v0-FA1.6.0/annotate_misc/tbl2asn/1
I've tried running this as an interactive job in slurm (with `srun`) and submitting it as a batch file (with `sbatch`). Run interactively, `funannotate annotate` never finishes, even after 12 hours. Run as a batch job, it finishes quickly but throws the IndexError above, which prints to stderr.
I tried running the command again (but forgot to add my `emapper` results) and got this log:
[05/05/20 09:50:19]: /gpfs/ysm/project/edwards/isg4/conda_envs/super_funannotate/funannotate/bin/funannotate-functional.py -i Portulaca-amilis.v0-FA1.6.0/ --iprscan /gpfs/ysm/scratch60/edwards/isg4/Pamilis_funannotate/ANNOTATE/Portulaca-amilis.v0-FA1.6.0/InterProScan/5.42-78.0/Portulaca_amilis.proteins.fa.xml --busco_db embryophyta --cpus 20 --species Portulaca amilis --sbt Portulaca-amilis.v0-FA1.6.0/update_results/MIGS.eu.5.0.tsv
[05/05/20 09:50:19]: OS: linux2, 20 cores, ~ 131 GB RAM. Python: 2.7.15
[05/05/20 09:50:19]: Running funannotate v1.6.0-dfd805f
[05/05/20 09:50:19]: Output directory Portulaca-amilis.v0-FA1.6.0 already exists, will use any existing data. If this is not what you want, exit, and provide a unique name for output folder
[05/05/20 09:50:19]: Parsing input files
[05/05/20 09:50:19]: Existing tbl found: Portulaca-amilis.v0-FA1.6.0/update_results/Portulaca_amilis.tbl
[05/05/20 09:51:20]: Adding Functional Annotation to Portulaca amilis, NCBI accession: None
[05/05/20 09:51:20]: Annotation consists of: 53,007 gene models
[05/05/20 09:51:20]: 58,571 protein records loaded
[05/05/20 09:51:21]: Existing Pfam-A results found: Portulaca-amilis.v0-FA1.6.0/annotate_misc/annotations.pfam.txt
[05/05/20 09:51:21]: 49,237 annotations added
[05/05/20 09:51:21]: Running Diamond blastp search of UniProt DB version 2020_01
[05/05/20 09:51:33]: 7,288 valid gene/product annotations from 10,848 total
[05/05/20 09:51:34]: Running Eggnog-mapper
[05/05/20 09:51:34]: emapper.py -m diamond -i /gpfs/ysm/scratch60/edwards/isg4/Pamilis_funannotate/TRAIN/Portulaca-amilis.v0-FA1.6.0/annotate_misc/genome.proteins.fasta -o eggnog --cpu 20
[05/05/20 09:51:34]: Annotation database data/eggnog.db not present. Use download_eggnog_database.py to fetch it
[05/05/20 09:51:34]: No Eggnog-mapper results found.
[05/05/20 09:51:34]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.55
[05/05/20 09:51:35]: 7,288 gene name and product description annotations added
[05/05/20 09:51:35]: Existing MEROPS results found: Portulaca-amilis.v0-FA1.6.0/annotate_misc/annotations.merops.txt
[05/05/20 09:51:35]: 1,266 annotations added
[05/05/20 09:51:35]: Existing CAZYme results found: Portulaca-amilis.v0-FA1.6.0/annotate_misc/annotations.dbCAN.txt
[05/05/20 09:51:35]: 2,073 annotations added
[05/05/20 09:51:35]: Existing BUSCO2 results found: Portulaca-amilis.v0-FA1.6.0/annotate_misc/annotations.busco.txt
[05/05/20 09:51:35]: 1,893 annotations added
[05/05/20 09:51:35]: Skipping phobius predictions, try funannotate remote -m phobius
[05/05/20 09:51:35]: Skipping secretome: neither SignalP nor Phobius searches were run
[05/05/20 09:51:35]: 0 secretome and 0 transmembane annotations added
[05/05/20 09:51:36]: Parsing InterProScan5 XML file
[05/05/20 09:51:36]: /gpfs/ysm/project/isg4/conda_envs/super_funannotate/bin/python /gpfs/ysm/project/edwards/isg4/conda_envs/super_funannotate/funannotate/util/iprscan2annotations.py Portulaca-amilis.v0-FA1.6.0/annotate_misc/iprscan.xml Portulaca-amilis.v0-FA1.6.0/annotate_misc/annotations.iprscan.txt
[05/05/20 09:52:10]: Found 0 duplicated annotations, adding 228,183 valid annotations
[05/05/20 09:52:11]: Parsing tbl file: /gpfs/ysm/scratch60/edwards/isg4/Pamilis_funannotate/TRAIN/Portulaca-amilis.v0-FA1.6.0/annotate_misc/genome.tbl
[05/05/20 09:52:13]: Converting to final Genbank format, good luck!
[05/05/20 09:52:22]: tbl2asn -y "Annotated using funannotate v1.6.0-dfd805f" -N 1 -t Portulaca-amilis.v0-FA1.6.0/update_results/MIGS.eu.5.0.tsv -M n -j "[organism=Portulaca amilis]" -V b -c fx -T -a r10u -l paired-ends -Z Portulaca-amilis.v0-FA1.6.0/annotate_misc/tbl2asn/3/discrepency.report.txt -p Portulaca-amilis.v0-FA1.6.0/annotate_misc/tbl2asn/3
[05/05/20 09:52:22]: tbl2asn -y "Annotated using funannotate v1.6.0-dfd805f" -N 1 -t Portulaca-amilis.v0-FA1.6.0/update_results/MIGS.eu.5.0.tsv -M n -j "[organism=Portulaca amilis]" -V b -c fx -T -a r10u -l paired-ends -Z Portulaca-amilis.v0-FA1.6.0/annotate_misc/tbl2asn/5/discrepency.report.txt -p Portulaca-amilis.v0-FA1.6.0/annotate_misc/tbl2asn/5
[05/05/20 09:52:22]: tbl2asn -y "Annotated using funannotate v1.6.0-dfd805f" -N 1 -t Portulaca-amilis.v0-FA1.6.0/update_results/MIGS.eu.5.0.tsv -M n -j "[organism=Portulaca amilis]" -V b -c fx -T -a r10u -l paired-ends -Z Portulaca-amilis.v0-FA1.6.0/annotate_misc/tbl2asn/2/discrepency.report.txt -p Portulaca-amilis.v0-FA1.6.0/annotate_misc/tbl2asn/2
[05/05/20 09:52:22]: tbl2asn -y "Annotated using funannotate v1.6.0-dfd805f" -N 1 -t Portulaca-amilis.v0-FA1.6.0/update_results/MIGS.eu.5.0.tsv -M n -j "[organism=Portulaca amilis]" -V b -c fx -T -a r10u -l paired-ends -Z Portulaca-amilis.v0-FA1.6.0/annotate_misc/tbl2asn/4/discrepency.report.txt -p Portulaca-amilis.v0-FA1.6.0/annotate_misc/tbl2asn/4
[05/05/20 09:52:22]: tbl2asn -y "Annotated using funannotate v1.6.0-dfd805f" -N 1 -t Portulaca-amilis.v0-FA1.6.0/update_results/MIGS.eu.5.0.tsv -M n -j "[organism=Portulaca amilis]" -V b -c fx -T -a r10u -l paired-ends -Z Portulaca-amilis.v0-FA1.6.0/annotate_misc/tbl2asn/1/discrepency.report.txt -p Portulaca-amilis.v0-FA1.6.0/annotate_misc/tbl2asn/1
Things to try: upgrade tbl2asn and see if that fixes the behavior. Because you have a large genome, funannotate is trying to split the input and run tbl2asn in parallel. I think this code was updated in the most recent version, so it's possible that updating funannotate could fix it. Alternatively, you can try running the tbl2asn command manually to generate the GenBank output and subsequent submission files. Typically, if you can run it interactively, it is easier to spot errors that the program is outputting. I seem to recall a warning to update tbl2asn once it is over 1 year old; that could be what is causing it to die silently, I suppose.
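To make a silent failure easier to spot when running a command by hand, a small wrapper can capture the exit code and stderr. This is a minimal sketch only: `run_and_report` is a hypothetical helper, not part of funannotate, and the demo runs a stand-in Python one-liner rather than tbl2asn itself. The tbl2asn command from the log would be passed as a list of arguments in the same way.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd (a list of strings); return (returncode, stdout, stderr) as text.

    Hypothetical helper for illustration; not part of funannotate.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    stdout, stderr = proc.communicate()
    # Echo anything suspicious so a quiet failure becomes visible.
    if proc.returncode != 0 or stderr.strip():
        sys.stderr.write("command: {}\n".format(" ".join(cmd)))
        sys.stderr.write("exit code: {}\n".format(proc.returncode))
        sys.stderr.write("stderr: {}\n".format(stderr.strip()))
    return proc.returncode, stdout, stderr


# Stand-in for a tool that writes a message to stderr and exits non-zero.
demo_cmd = [sys.executable, "-c",
            "import sys; sys.stderr.write('tool is out of date'); sys.exit(1)"]
code, out, err = run_and_report(demo_cmd)
```

Checking `code` and `err` after each manual run would show whether the program exited with an error or only printed a message.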
Thanks again for the help. I've made a little progress on this issue but haven't gotten `annotate` to complete. On your advice I checked `tbl2asn`, and it was giving me the "over 1 year old" error. I'd already installed the latest version of funannotate (1.7.4), but got the same issue because conda's version of `tbl2asn` is 25.7, while the most recent version from NCBI is 25.8. I copied the new version into `conda_envs/funannotate/bin/`, which works, but `annotate` still failed:
[05/11/20 11:42:48]: ERROR: GBK file conversion failed, tbl2asn parallel script has died
So I followed up by running the `tbl2asn` command independently, and this resulted in a problem with `tbl2asn_parallel.py`:
(/gpfs/ysm/project/edwards/isg4/conda_envs/funannotate) [isg4@c14n02 TRAIN]$ /gpfs/ysm/project/edwards/isg4/conda_envs/funannotate/bin/python /gpfs/ysm/project/edwards/isg4/conda_envs/funannotate/lib/python2.7/site-packages/funannotate/aux_scripts/tbl2asn_parallel.py -i Portulaca-amilis.v0-FA1.6.0/annotate_misc/tbl2asn/genome.tbl -f Portulaca-amilis.v0-FA1.6.0/annotate_misc/tbl2asn/genome.fsa -o Portulaca-amilis.v0-FA1.6.0/annotate_misc/tbl2asn --sbt Portulaca-amilis.v0-FA1.6.0/update_results/MIGS.eu.5.0.tsv -d discrepency.report.txt -s Portulaca_amilis -t -l paired-ends -v 1 -c 20
usage: tbl2asn_parallel.py [-h] -i INPUT -f FASTA -s SPECIES -o OUT --sbt SBT
[--isolate ISOLATE] [--strain STRAIN] [-c CPUS] -d
DISCREP [-t TBL2ASN] [-v VERSION]
tbl2asn_parallel.py: error: argument -t/--tbl2asn: expected one argument
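The "expected one argument" message can be reproduced in isolation with `argparse`. The sketch below uses a stand-in parser, assuming `-t/--tbl2asn` is a plain one-value option (the usage message suggests this, but the script itself is not shown in this thread). It compares two ways the same text can reach the parser: as one list element containing a space, and as two separate tokens, which is how an unquoted shell command line is split.

```python
import argparse


def build_parser():
    # Stand-in parser: only the -t/--tbl2asn option from the usage message.
    # The real tbl2asn_parallel.py may define it differently (assumption).
    parser = argparse.ArgumentParser(prog="tbl2asn_parallel_sketch")
    parser.add_argument("-t", "--tbl2asn",
                        help="extra options to forward to tbl2asn")
    return parser


def try_parse(argv):
    """Return the parsed -t value, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv).tbl2asn
    except SystemExit:
        # argparse prints its usage/error text to stderr, then exits.
        return None


# One list element: the string contains a space, so argparse treats it
# as a value for -t rather than as a new flag.
print(try_parse(["-t", "-l paired-ends"]))     # prints: -l paired-ends

# Two tokens: "-l" now looks like an unknown flag, so -t is left
# without a value and argparse reports "expected one argument".
print(try_parse(["-t", "-l", "paired-ends"]))  # prints: None
```

If this matches what the real script does, quoting the value (`-t "-l paired-ends"`) on the command line may avoid the error when running the script by hand.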
I thought this might have to do with the way `annotate.py` builds the subprocess call to `tbl2asn_parallel.py`, particularly the `-t -l paired-ends` part. It looked like `-t` was receiving no argument and that an unknown flag (`-l`) was receiving `paired-ends`. Since I'm not modifying the behavior of `tbl2asn`, and `tbl2asn_parallel.py` creates a command with `-l paired-ends` by default, I edited `cmd` (lines 1078-1083 in `annotate.py`) to remove the respective arguments, changing it from
cmd = [sys.executable, os.path.join(parentdir, 'aux_scripts', 'tbl2asn_parallel.py'),
       '-i', TBLOUT, '-f', os.path.join(outputdir,
                                        'annotate_misc', 'tbl2asn', 'genome.fsa'),
       '-o', os.path.join(outputdir, 'annotate_misc',
                          'tbl2asn'), '--sbt', SBT, '-d', discrep,
       '-s', organism, '-t', args.tbl2asn, '-v', str(annot_version), '-c', str(args.cpus)]
to
cmd = [sys.executable, os.path.join(parentdir, 'aux_scripts', 'tbl2asn_parallel.py'),
       '-i', TBLOUT, '-f', os.path.join(outputdir,
                                        'annotate_misc', 'tbl2asn', 'genome.fsa'),
       '-o', os.path.join(outputdir, 'annotate_misc',
                          'tbl2asn'), '--sbt', SBT, '-d', discrep,
       '-s', organism, '-v', str(annot_version), '-c', str(args.cpus)]
This ended up working; however, it now looks like the results from `tbl2asn` are not being combined correctly. The files `errorsummary.val`, `genome.gbf`, `genome.val`, and `discrepancy.report.txt` are all empty.
[isg4@c14n02 tbl2asn]$ ls -lha
drwxr-xr-x 2 isg4 edwards 4.0K May 11 12:06 1
drwxr-xr-x 2 isg4 edwards 4.0K May 11 12:06 2
drwxr-xr-x 2 isg4 edwards 4.0K May 11 12:06 3
drwxr-xr-x 2 isg4 edwards 4.0K May 11 12:06 4
drwxr-xr-x 2 isg4 edwards 4.0K May 11 12:06 5
-rw-r--r-- 1 isg4 edwards 0 May 11 12:06 errorsummary.val
-rw-r--r-- 1 isg4 edwards 7.2M May 11 12:06 genome1.tbl
-rw-r--r-- 1 isg4 edwards 7.3M May 11 12:06 genome2.tbl
-rw-r--r-- 1 isg4 edwards 7.1M May 11 12:06 genome3.tbl
-rw-r--r-- 1 isg4 edwards 8.0M May 11 12:06 genome4.tbl
-rw-r--r-- 1 isg4 edwards 3.5M May 11 12:06 genome5.tbl
-rw-r--r-- 1 isg4 edwards 391M May 11 12:06 genome.fsa
-rw-r--r-- 1 isg4 edwards 0 May 11 12:06 genome.gbf
-rw-r--r-- 1 isg4 edwards 33M May 11 12:06 genome.tbl
-rw-r--r-- 1 isg4 edwards 0 May 11 12:06 genome.val
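The zero-byte files in a listing like the one above can also be picked out programmatically. This is a minimal sketch: `find_empty_files` is a hypothetical helper, and the demo builds a throwaway directory that imitates part of the layout instead of reading the real `tbl2asn` folder.

```python
import os
import shutil
import tempfile


def find_empty_files(directory):
    """Return sorted names of zero-byte regular files directly inside directory."""
    empty = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Subdirectories (such as the per-chunk folders) are skipped.
        if os.path.isfile(path) and os.path.getsize(path) == 0:
            empty.append(name)
    return empty


# Demo on a temporary directory mimicking part of the listing above.
workdir = tempfile.mkdtemp()
try:
    os.mkdir(os.path.join(workdir, "1"))
    for name, content in [("genome.gbf", ""), ("genome.val", ""),
                          ("genome.tbl", ">Feature example\n")]:
        with open(os.path.join(workdir, name), "w") as handle:
            handle.write(content)
    print(find_empty_files(workdir))  # prints: ['genome.gbf', 'genome.val']
finally:
    shutil.rmtree(workdir)
```

Pointing the same helper at each numbered subdirectory would show whether the per-chunk runs produced any output before the combining step.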
Any thoughts or suggestions would be appreciated, Ian
Are you using the latest release?
I am still using `v1.6.0-dfd805f`; however, it looks like the pieces of code responsible for the error have not changed significantly since (I checked using compare).
Describe the bug
`funannotate annotate` hangs at "Converting to final Genbank format, good luck!" or fails with an IndexError.
What command did you issue?
Logfiles
OS/Install Information
I know this version is out of date but we're so close to the final annotation! Thank you for all the help!
Ian