I would like to run spacedust on a plasmid database. This database has ~60k individual files, each representing a separate plasmid "genome". However, when I pass the following command to spacedust:
$ spacedust createsetdb /individual_faa/*.faa SpacedustDB tmp --threads 18
bash: /shared/software/bin/spacedust: Argument list too long
I receive a bash error that the argument list is too long. I have tried a number of workarounds, such as passing an environment variable that contains all the file names, but to no avail.
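A quick way to confirm that the expanded glob, rather than spacedust itself, is what exceeds the limit is to compare its size with the kernel's `ARG_MAX`. This is a sketch assuming bash; `check_arg_size` is a hypothetical helper name. Because `printf` is a bash builtin, expanding the glob inside it does not go through `execve` and so does not trigger the error. On Linux the environment counts against the same limit, and each single string is additionally capped (typically at 128 KiB), which would explain why moving the file names into an environment variable did not help.

```shell
# Hypothetical helper: report how many .faa paths the glob expands to,
# how many bytes they take on a command line, and the system ARG_MAX.
check_arg_size() {
  local dir="$1"
  local n_files bytes limit
  # printf is a builtin, so this expansion is not subject to execve's limit.
  # $(( ... )) strips any padding that wc may add.
  n_files=$(( $(printf '%s\n' "$dir"/*.faa | wc -l) ))
  bytes=$(( $(printf '%s ' "$dir"/*.faa | wc -c) ))
  limit=$(getconf ARG_MAX)
  echo "files=$n_files bytes=$bytes ARG_MAX=$limit"
}
```

Usage: `check_arg_size /individual_faa`. If `bytes` is close to or above `ARG_MAX`, no quoting or variable trick will get the whole list onto a single command line.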
It would be useful if, instead of a file glob (*), spacedust createsetdb could take a single input file listing the path to each .faa file needed for database creation. Alternatively, creating databases in batches and combining them could be another approach, but I am not sure whether that is supported. Finally, if you have any other suggestions, I would be forever grateful.
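A path-list file of the kind suggested above can be generated without hitting the limit, because `find` reads the directory itself instead of receiving the names as arguments. The sketch below writes one path per line and can optionally split the list into fixed-size batches. The helper names are hypothetical, `split -d` assumes GNU coreutils, and whether spacedust can read such a list or merge per-batch databases is exactly the open question here, so this only prepares the input.

```shell
# Hypothetical helper: write one .faa path per line, sorted for reproducibility.
# find walks the directory itself, so no argument-length limit applies.
write_path_list() {
  local dir="$1" out="$2"
  find "$dir" -maxdepth 1 -type f -name '*.faa' | LC_ALL=C sort > "$out"
}

# Hypothetical helper: split the list into batches of N paths each,
# named <prefix>00, <prefix>01, ... (numeric suffixes need GNU split).
split_path_list() {
  local list="$1" n="$2" prefix="$3"
  split -l "$n" -d "$list" "$prefix"
}
```

For example, `write_path_list /individual_faa faa_paths.txt` followed by `split_path_list faa_paths.txt 10000 batch_` would give six lists of at most 10,000 paths for ~60k files.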
In terms of the total number of proteins, these plasmid "genomes" should be quite similar to the 9000 genomes you ran in the spacedust paper, since plasmids are much smaller. So I think it should be computationally manageable; the trouble is just getting all the files in :-)
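To check the protein-count comparison, the FASTA headers across all files can be counted. `find ... -exec cat {} +` packs as many paths into each `cat` call as the argument limit allows and starts further calls as needed, so it copes with ~60k files. This is a sketch assuming one `>` header line per protein and a trailing newline at the end of each file; `count_proteins` is a hypothetical name.

```shell
# Hypothetical helper: count protein records (">" header lines) in all .faa files.
# "-exec ... +" lets find batch paths under ARG_MAX across several cat calls.
count_proteins() {
  local dir="$1"
  find "$dir" -maxdepth 1 -type f -name '*.faa' -exec cat {} + | grep -c '^>'
}
```

This batching works for `cat` because the outputs simply concatenate; applying the same trick to `spacedust createsetdb` (for instance via `xargs`) would instead run it several times and produce several separate databases.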
My Environment
Linux
Using the statically compiled spacedust executable for the AVX2 instruction set