kbaseattic / assembly

An extensible framework for genome assembly.
MIT License

support caching for previously computed datasets #291

Open levinas opened 9 years ago

levinas commented 9 years ago

Dan has this idea that we should check whether the service has already seen an input and, if so, just return the previously computed output. This would reduce the computation burden.

We could check file size, file MD5, assembly method, and arast version to determine whether we can serve a precomputed result. I'm not sure if Shock computes an MD5 by default.

It would mean providing a --force option for rerunning the assembly.
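
To make the idea concrete, here is a rough sketch of how such a lookup could work. It is not taken from the arast code base: `file_md5`, `cache_key`, `run_assembly`, the dict-like `cache`, and the `compute` callback are all hypothetical names, and the key is simply a hash over the four properties listed above. Input order is kept as given, on the assumption that it can matter for paired reads.

```python
import hashlib
import json


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 of a file in chunks so large read files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_key(inputs, method, version):
    """Build a key from input sizes and MD5s, assembly method, and version.

    `inputs` is a list of (size_in_bytes, md5_hex) tuples, kept in the
    order given (assumption: order may matter for paired reads).
    """
    payload = {
        "inputs": [[size, md5] for size, md5 in inputs],
        "method": method,
        "version": version,
    }
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def run_assembly(inputs, method, version, compute, cache, force=False):
    """Return a cached result when available, unless force is set.

    `compute` is a hypothetical zero-argument callable that runs the job;
    `cache` is any dict-like store (hypothetical, e.g. backed by a database).
    """
    key = cache_key(inputs, method, version)
    if not force and key in cache:
        return cache[key]
    result = compute()
    cache[key] = result
    return result
```

With this shape, `force=True` plays the role of the proposed --force option: the job is rerun and the stored result is overwritten.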

levinas commented 9 years ago

We may need to look into data caching as well. --data is great, but we need something equivalent for the jobs invoked in the narrative. I'm seeing some 25GB SRA reads being pulled over and over again from Shock. Maybe the handle ID could be another key for such caching.
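
A minimal sketch of what handle-ID-keyed data caching might look like on a compute node, assuming a local cache directory and a hypothetical `download(handle_id, dest_path)` callable standing in for whatever actually pulls the file from Shock. None of these names come from the existing code.

```python
import hashlib
import os
import tempfile


def fetch_cached(handle_id, download, cache_dir):
    """Return a local path for handle_id, downloading only on a cache miss.

    `download` is a hypothetical callable taking (handle_id, dest_path).
    """
    os.makedirs(cache_dir, exist_ok=True)
    # Hash the handle ID so it is always safe to use as a file name.
    name = hashlib.sha256(handle_id.encode("utf-8")).hexdigest()
    target = os.path.join(cache_dir, name)
    if os.path.exists(target):
        return target

    # Download to a temporary file first, then rename, so an interrupted
    # transfer never leaves a partial file under the final name.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    os.close(fd)
    try:
        download(handle_id, tmp)
        os.replace(tmp, target)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    return target
```

This leaves out eviction entirely; with 25GB inputs, some size limit or least-recently-used cleanup would be needed in practice.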