Closed apetkau closed 7 years ago
Yes, the error is definitely occurring because each instance creates and uses the same tmp directory in that case. One instance completes before the other and cleans up the tmp directory while the other is still using it.
Would there be a scenario where files with the same base filename are run at the same time?
A potential workaround would be to distinguish different input files by providing a genome_name along with the path to the input fasta using the -i arg:
for i in {1..2}; do
sistr -f csv -o predictions_$i \
-i /path/to/AE014613.fasta <genome_name>_$i \
2> $i.err 1> $i.out & done
This should produce tmp dirs:
/tmp/<timestamp>-SISTR-<genome_name>_1
/tmp/<timestamp>-SISTR-<genome_name>_2
Or you could specify different base tmp directories to produce the output files in.
I could add a condition to the tmp dir creation to check if the directory already exists, and if so, create a tmp dir with a slightly different name (e.g. append _<number>).
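A rough sketch of what that check could look like, in Python. This is not SISTR's actual code: the function name `make_unique_dir` and the `max_tries` limit are made up for illustration. It relies on `os.mkdir` failing when the directory already exists, so two instances racing for the same name cannot both get it, and it assumes Python 3 (for `FileExistsError`).

```python
import os


def make_unique_dir(base_path, max_tries=1000):
    """Create base_path, or base_path_1, base_path_2, ... if the name is taken.

    Hypothetical helper: os.mkdir raises FileExistsError when the directory
    already exists, so the existence check and the creation are one step.
    """
    candidate = base_path
    for n in range(1, max_tries + 1):
        try:
            os.mkdir(candidate)
            return candidate
        except FileExistsError:
            # Name already taken (possibly by another instance): try the next suffix.
            candidate = "{}_{}".format(base_path, n)
    raise RuntimeError("no free directory name for " + base_path)
```

Trying to create the directory and catching the failure avoids the gap between an `os.path.exists` check and the `mkdir` call, which is where two simultaneously started instances could still collide.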
Hmmm... with my current setup the files all have the same name, because I do an assembly first, so the file becomes something like contigs.fasta.
The scenario I'm thinking of is automatically running SISTR on upload of sequencing data from a sequencing run. However, in general, they probably won't all run at the same time, except for my small test data.
I do think it's something to fix up though, either through your suggestion, or by using one of the tempfile functions (which will just assign random names).
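For the tempfile option, a minimal sketch using Python's standard `tempfile.mkdtemp`. The `private_tmp_dir` helper, the `SISTR-<genome_name>-` prefix and the `base_tmp` parameter are assumptions for illustration, not how SISTR is actually written.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def private_tmp_dir(genome_name, base_tmp=None):
    """Yield a newly created, uniquely named directory and remove it afterwards.

    Hypothetical helper: mkdtemp adds a random suffix to the prefix, so
    instances given the same genome name still get separate directories.
    """
    work_dir = tempfile.mkdtemp(prefix="SISTR-{}-".format(genome_name), dir=base_tmp)
    try:
        yield work_dir
    finally:
        # Each instance only ever deletes its own directory.
        shutil.rmtree(work_dir, ignore_errors=True)
```

Used as `with private_tmp_dir("contigs") as tmp: ...`, each run writes its intermediate files under its own directory, so one instance finishing early cannot remove files another instance still needs.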
Okay, I'll work up a fix and a new release with the check on tmp dir creation.
In the scenario you describe, would you be able to provide a genome name (or some kind of unique and useful identifier) to your input fasta? You could keep it as /path/to/contigs.fasta but also supply a genome_name, e.g.
sistr -o output -i /path/to/contigs.fasta genome_1337
The SISTR output would then show the name as genome_1337, which might be useful in the other output files like the cgMLST profile output or the detailed cgMLST allele search results.
Awesome, thanks :)
Yes, I'll also look at giving the genomes passed to SISTR a better name.
I've found that if you run multiple instances of SISTR on the same machine, starting all of them at the exact same time, they can interfere with each other's results.
For example, running:
Will produce the following in the stderr files:
This error does not occur when running only one instance at a time. I'm guessing the instances are interfering with each other's tmp files.