Genomes storage ..............................: Initialized (storage hash: 98ba7297)
Num genomes in storage .......................: 2
Num genomes will be used .....................: 2
Pan database .................................: A new database, /lus/scratch/usr/jmeppley/opt/workflows/test/scratch/pang/Parvarchaea/Parvarchaea-PAN.db, has been created.
Exclude partial gene calls ...................: False
[23 Mar 17 20:34:23 Uniquing the output FASTA file] ...
Traceback (most recent call last):
  File "/home/jmeppley/opt/workflows/test/conda/envs/anvi2/bin/anvi-pan-genome", line 4, in <module>
    __import__('pkg_resources').run_script('anvio==2.1.0', 'anvi-pan-genome')
  File "/home/jmeppley/opt/workflows/test/conda/envs/anvi2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 739, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/jmeppley/opt/workflows/test/conda/envs/anvi2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1494, in run_script
    exec(code, namespace, namespace)
  File "/lus/scratch/usr/jmeppley/opt/workflows/test/conda/envs/anvi2/lib/python2.7/site-packages/anvio-2.1.0-py2.7-linux-x86_64.egg-info/scripts/anvi-pan-genome", line 99, in <module>
    pan.process()
  File "/home/jmeppley/opt/workflows/test/conda/envs/anvi2/lib/python2.7/site-packages/anvio/panops.py", line 1075, in process
    unique_proteins_FASTA_path, unique_proteins_names_dict = self.genomes_storage.gen_combined_protein_sequences_FASTA(combined_proteins_FASTA_path, exclude_partial_gene_calls=self.exclude_partial_gene_calls)
  File "/home/jmeppley/opt/workflows/test/conda/envs/anvi2/lib/python2.7/site-packages/anvio/auxiliarydataops.py", line 357, in gen_combined_protein_sequences_FASTA
    unique_proteins_FASTA_path, unique_proteins_names_file_path, unique_proteins_names_dict = utils.unique_FASTA_file(output_file_path, store_frequencies_in_deflines=False)
  File "/home/jmeppley/opt/workflows/test/conda/envs/anvi2/lib/python2.7/site-packages/anvio/utils.py", line 950, in unique_FASTA_file
    input_fasta = u.SequenceSource(input_file_path, unique=True)
  File "/home/jmeppley/opt/workflows/test/conda/envs/anvi2/lib/python2.7/site-packages/anvio/fastalib.py", line 101, in __init__
    raise FastaLibError, "File '%s' does not seem to be a FASTA file." % self.fasta_file_path
anvio.fastalib.FastaLibError: Fasta Lib Error: File '/lus/scratch/usr/jmeppley/opt/workflows/test/scratch/pang/Parvarchaea/combined-proteins.fa' does not seem to be a FASTA file.
Running the pangenome command (anvi-pan-genome, anvio 2.1.0) gives me the error above.

However, this only happens on our Cray system using a Lustre file share. Our vanilla CentOS boxes work OK, even over NFS. I've diagnosed the problem: in the gen_combined_protein_sequences_FASTA function in auxiliarydataops.py, the output file is never closed before unique_FASTA_file is called, so buffered writes may not have reached the file by the time it is read back.

I haven't yet figured out whether this is still a problem in later versions.
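
To illustrate the kind of fix this diagnosis points to, here is a minimal sketch, not anvio's actual code. The names write_combined_fasta, looks_like_fasta and demo are hypothetical, and looks_like_fasta is only a crude stand-in for whatever check fastalib really performs. The idea is simply to close the output file (flushing and syncing it first) before anything else opens the same path for reading.

```python
import os
import tempfile


def write_combined_fasta(records, output_file_path):
    """Write (name, sequence) pairs as FASTA; the file is closed on return.

    Hypothetical helper for illustration, not part of anvio.
    """
    # Leaving the with-block closes the handle, so no data stays behind in
    # Python's write buffer when another reader opens the same path.
    with open(output_file_path, "w") as output_file:
        for name, sequence in records:
            output_file.write(">%s\n%s\n" % (name, sequence))
        # Extra caution for network/parallel file systems (an assumption,
        # not something the report confirms is needed): push the data out
        # of the OS cache before closing.
        output_file.flush()
        os.fsync(output_file.fileno())
    return output_file_path


def looks_like_fasta(path):
    """Crude stand-in for a FASTA sanity check: first character is '>'."""
    with open(path) as handle:
        return handle.read(1) == ">"


def demo():
    """Write a tiny FASTA file, then read it back only after it is closed."""
    path = os.path.join(tempfile.mkdtemp(), "combined-proteins.fa")
    write_combined_fasta([("gene_1", "MKV"), ("gene_2", "MLA")], path)
    return looks_like_fasta(path)
```

If the handle were still open when the reader ran, a small file could appear empty or truncated and fail a check like this one; why that shows up on Lustre but not on NFS is not something this sketch explains.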