soedinglab / plass

sensitive and precise assembly of short sequencing reads
https://plass.mmseqs.com
GNU General Public License v3.0

Running plass in parallel on same 'tmp' folder #9

Closed mhibberd closed 5 years ago

mhibberd commented 5 years ago

I'm invoking plass in the typical form on our high-throughput computing (slurm/sbatch) array:

plass assembly <reads_R1.fq> <reads_R2.fq> <plass_assembly.faa> tmp

Running on a single node with the node-specific /tmp directory works fine, but on our system not all nodes have the large amount of tmp space required (as in #4). Pointing plass at a local tmp folder, which has a much larger space allotment, also works fine.

However, when I take advantage of the parallelization of slurm to run multiple jobs at once, the tmp folder I specify looks like this:

11610453234058486865/
13117816727409383803/
2330489238614308671/
latest -> 2330489238614308671/

Am I correct in envisioning issues with the symlinking approach here? It looks like the multiple tasks might get confused by the "latest" symlink as they iterate along.

Thoughts?

Thanks~

milot-mirdita commented 5 years ago

Internally, Plass always uses the integer hash; the latest symlink is just a convenience for the user.
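
To illustrate why sharing one tmp folder is safe under this scheme, here is a minimal Python sketch. It is not Plass's actual code: the function names run_dir_for and update_latest, and the choice of SHA-256 over the run's arguments, are assumptions made purely for illustration. The thread only establishes that each run works inside its own integer-hash directory and that latest is a convenience link.

```python
import hashlib
import os
import tempfile


def run_dir_for(tmp_root, params):
    """Return a work directory under tmp_root named after a hash of params."""
    # Assumption: the name is derived from the run's arguments. The hash
    # function and inputs Plass really uses are not shown in this thread.
    digest = hashlib.sha256("\0".join(params).encode()).digest()
    name = str(int.from_bytes(digest[:8], "big"))  # integer-style name
    path = os.path.join(tmp_root, name)
    os.makedirs(path, exist_ok=True)
    return path


def update_latest(tmp_root, run_dir):
    """Point tmp_root/latest at run_dir; nothing ever reads it back."""
    link = os.path.join(tmp_root, "latest")
    staging = f"{link}.{os.getpid()}"
    os.symlink(os.path.basename(run_dir), staging)
    os.replace(staging, link)  # swap the link in place; last writer wins


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    job_a = run_dir_for(root, ["a_R1.fq", "a_R2.fq", "a.faa"])
    job_b = run_dir_for(root, ["b_R1.fq", "b_R2.fq", "b.faa"])
    update_latest(root, job_a)
    update_latest(root, job_b)
    # Both jobs keep using their own hash-named paths; only the
    # convenience link was overwritten by the second job.
    print("job A:", job_a)
    print("job B:", job_b)
    print("latest ->", os.readlink(os.path.join(root, "latest")))
```

In this sketch, jobs with different inputs get different directories, and neither resolves its working path through latest, so whichever job updates the link last has no effect on the others.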

mhibberd commented 5 years ago

Awesome. Thanks for clarifying!