samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/

Allow user to indicate sra cache location AND delete cache files after they're used #718

Open lmichael107 opened 8 years ago

lmichael107 commented 8 years ago

At UW-Madison's Center for High Throughput Computing, our users have important reasons not to have files written to their home directories. In particular, when jobs are sent to the Open Science Grid (a national grid of immense backfill campus computing capacity), software runs within a temporary directory on a remote server where the user's home directory doesn't exist. In a different context, even on some of our campus machines, a user may have a home directory on one of our worker servers, but it is not the same home directory as on the submission node, so the cache files are left to accumulate on the worker servers. It would be nice to let the user at least indicate where such files should go (the working directory or /tmp, for example) and/or to delete cache files after they've served their purpose.

Expected behaviour

It would be useful to have an option for where the SRA cache should go, instead of it always going to ~/ncbi/public/sra/

Actual behaviour

Cache files are always written to ~/ncbi/public/sra. This causes problems in several contexts, especially on shared cluster and backfill computing systems and on some network filesystem configurations.
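
To make the request concrete, here is a minimal sketch of the behaviour being asked for. It is not existing htsjdk or NCBI SRA toolkit code: the class `SraCacheSketch`, the property name `sketch.sra.cache.dir`, and both methods are hypothetical names invented for illustration. The only detail taken from this report is the default location, ~/ncbi/public/sra.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical helper for illustration; not part of htsjdk or the SRA toolkit. */
final class SraCacheSketch {

    /** Hypothetical property name, invented for this sketch. */
    static final String CACHE_DIR_PROPERTY = "sketch.sra.cache.dir";

    /** Returns the user-supplied location if set, else the default under the home directory. */
    static Path resolveCacheDir(String override, String userHome) {
        if (override != null && !override.trim().isEmpty()) {
            return Paths.get(override);
        }
        return Paths.get(userHome, "ncbi", "public", "sra");
    }

    /** Deletes the cache directory and everything beneath it; does nothing if it is absent. */
    static void deleteCache(Path cacheDir) throws IOException {
        if (!Files.exists(cacheDir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(cacheDir)) {
            // Reverse order so that children are deleted before their parent directories.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo only: use a throwaway temp directory as the override so no real cache is touched.
        Path scratch = Files.createTempDirectory("sra-cache-demo");
        Path cacheDir = resolveCacheDir(scratch.toString(), System.getProperty("user.home"));
        Files.write(cacheDir.resolve("example.cache"), new byte[] {0});
        System.out.println("Cache directory in use: " + cacheDir);
        deleteCache(cacheDir);
        System.out.println("Still present after cleanup: " + Files.exists(cacheDir));
    }
}
```

On a grid job, the override could point at the job's working directory or /tmp, and the cleanup step could run once reading has finished. How such an option would actually be wired into the SRA reader is left open here.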

cmnbroad commented 8 years ago

@a-nikitiuk Can you take a look/respond? thx.