How to reset cache of only the workflow and not the computation?

olgabot commented 6 years ago

Hello, I've found that if I edit the workflow file for a reflow runbatch job, the changes don't take effect unless I add -cache=off. Is there a way to reset the cache ONLY for the workflow file and not for the intermediate computation?

Context: I set the memory requirement too high (64GiB), which launched very expensive instances, so I lowered it to 8GiB, but reflow is still allocating 64GiB:

ubuntu@olgabot-reflow-t2 ~/kmer-hashing/sourmash/maca/facs_v5_1000cell_dna-only_scaled_trim_comparison (master) $ reflow list
    64.0GiB 2   1.0GiB -1m57s Ubuntu <ubuntu> ec2-18-236-191-110.us-west-2.compute.amazonaws.com:9000/8b503eb1dd00a200
    64.0GiB 2   1.0GiB -1m57s Ubuntu <ubuntu> ec2-54-185-166-3.us-west-2.compute.amazonaws.com:9000/6726dab57e8a506f
    64.0GiB 2   1.0GiB -1m56s Ubuntu <ubuntu> ec2-34-221-3-86.us-west-2.compute.amazonaws.com:9000/6b9953a11ddfea27
    64.0GiB 2   1.0GiB -1m56s Ubuntu <ubuntu> ec2-34-221-3-86.us-west-2.compute.amazonaws.com:9000/021a7faffd13ee9b
    64.0GiB 2   1.0GiB -1m56s Ubuntu <ubuntu> ec2-34-221-3-86.us-west-2.compute.amazonaws.com:9000/2053c28757c34e12
    64.0GiB 2   1.0GiB -1m57s Ubuntu <ubuntu> ec2-35-164-200-123.us-west-2.compute.amazonaws.com:9000/bf6a84bb7bea6476
    64.0GiB 2   1.0GiB -1m56s Ubuntu <ubuntu> ec2-54-187-152-112.us-west-2.compute.amazonaws.com:9000/e185edbd51dfea51
    64.0GiB 2   1.0GiB -2m53s Ubuntu <ubuntu> ec2-54-187-152-112.us-west-2.compute.amazonaws.com:9000/c3ee6bc574677397
    64.0GiB 2   1.0GiB -2m0s  Ubuntu <ubuntu> ec2-54-187-152-112.us-west-2.compute.amazonaws.com:9000/dc841df2f64a0e0f
    64.0GiB 2   1.0GiB -1m56s Ubuntu <ubuntu> ec2-18-237-123-129.us-west-2.compute.amazonaws.com:9000/697d12a015741b2c
    64.0GiB 2   1.0GiB -1m56s Ubuntu <ubuntu> ec2-18-237-123-129.us-west-2.compute.amazonaws.com:9000/81966224adec9aff
    64.0GiB 2   1.0GiB -1m56s Ubuntu <ubuntu> ec2-18-237-123-129.us-west-2.compute.amazonaws.com:9000/caa9a587d6c0a310
    64.0GiB 2   1.0GiB -1m56s Ubuntu <ubuntu> ec2-34-221-115-69.us-west-2.compute.amazonaws.com:9000/82b8a143c1791958
    64.0GiB 2   1.0GiB -3m4s  Ubuntu <ubuntu> ec2-34-221-115-69.us-west-2.compute.amazonaws.com:9000/287bd5004844992d
    64.0GiB 2   1.0GiB -3m4s  Ubuntu <ubuntu> ec2-34-221-115-69.us-west-2.compute.amazonaws.com:9000/ec85257fde35fc78
    64.0GiB 2   1.0GiB -1m57s Ubuntu <ubuntu> ec2-52-36-173-2.us-west-2.compute.amazonaws.com:9000/7a08ce0a40183ae8
    64.0GiB 2   1.0GiB -1m57s Ubuntu <ubuntu> ec2-52-24-213-163.us-west-2.compute.amazonaws.com:9000/636e2fbcb3298069
    64.0GiB 2   1.0GiB -3m5s  Ubuntu <ubuntu> ec2-18-237-209-173.us-west-2.compute.amazonaws.com:9000/5bb491fc86ddb540
    64.0GiB 2   1.0GiB -3m6s  Ubuntu <ubuntu> ec2-18-237-209-173.us-west-2.compute.amazonaws.com:9000/ea060255331b4e97
    64.0GiB 2   1.0GiB -3m6s  Ubuntu <ubuntu> ec2-18-237-209-173.us-west-2.compute.amazonaws.com:9000/47948e6686a3d843

Thanks! Olga

mariusae commented 6 years ago

What's probably happening here is that the new reflow runbatch is attaching itself to the previous jobs -- Reflow restores state when it can. You should find that new samples get smaller instance types.

Alternatively, you can nuke the current batch state by running reflow runbatch -reset instead. This will start the job anew, and each sample will be processed from the beginning (but reusing cached data where it can, of course).

This is all too confusing: the UX around the tooling here needs to improve; it's one of our near-term goals.

olgabot commented 6 years ago

Turns out I had @requires(mem := 64*GiB) on Main, which was the problem -_-
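
For anyone hitting the same thing, here is a rough sketch of what the fix looks like in the workflow file. Only the @requires annotation on Main comes from this thread; the `compare` function and `sample` value are hypothetical placeholders, and the 8GiB figure is the target mentioned in the original question.

```
// Previously (per this thread), Main carried a leftover 64GiB requirement:
//   @requires(mem := 64*GiB)
// Lowering it here, in addition to any per-step settings, is what was missing.
@requires(mem := 8*GiB)
val Main = compare(sample) // hypothetical: stands in for whatever Main evaluates
```

If instances are still sized too large after editing the workflow, checking for a stale @requires on Main is probably worth doing before turning the cache off.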

grailbio / reflow

How to reset cache of only the workflow and not the computation? #51