grailbio / reflow

A language and runtime for distributed, incremental data processing in the cloud
Apache License 2.0

How to reset cache of only the workflow and not the computation? #51

Closed: olgabot closed this issue 6 years ago

olgabot commented 6 years ago

Hello, I've found that if I edit the workflow file for a reflow runbatch job, the change isn't picked up unless I add -cache=off. Is there a way to reset the cache for ONLY the workflow file, and not for the intermediate computations?

Context: I set the memory requirement too high (64GiB) and am now launching very expensive instances, so I lowered it to 8GiB, but reflow is still allocating 64GiB:

 ubuntu@olgabot-reflow-t2:~/kmer-hashing/sourmash/maca/facs_v5_1000cell_dna-only_scaled_trim_comparison (master)$ reflow list
    64.0GiB 2   1.0GiB -1m57s Ubuntu <ubuntu> ec2-18-236-191-110.us-west-2.compute.amazonaws.com:9000/8b503eb1dd00a200
    64.0GiB 2   1.0GiB -1m57s Ubuntu <ubuntu> ec2-54-185-166-3.us-west-2.compute.amazonaws.com:9000/6726dab57e8a506f
    64.0GiB 2   1.0GiB -1m56s Ubuntu <ubuntu> ec2-34-221-3-86.us-west-2.compute.amazonaws.com:9000/6b9953a11ddfea27
    64.0GiB 2   1.0GiB -1m56s Ubuntu <ubuntu> ec2-34-221-3-86.us-west-2.compute.amazonaws.com:9000/021a7faffd13ee9b
    64.0GiB 2   1.0GiB -1m56s Ubuntu <ubuntu> ec2-34-221-3-86.us-west-2.compute.amazonaws.com:9000/2053c28757c34e12
    64.0GiB 2   1.0GiB -1m57s Ubuntu <ubuntu> ec2-35-164-200-123.us-west-2.compute.amazonaws.com:9000/bf6a84bb7bea6476
    64.0GiB 2   1.0GiB -1m56s Ubuntu <ubuntu> ec2-54-187-152-112.us-west-2.compute.amazonaws.com:9000/e185edbd51dfea51
    64.0GiB 2   1.0GiB -2m53s Ubuntu <ubuntu> ec2-54-187-152-112.us-west-2.compute.amazonaws.com:9000/c3ee6bc574677397
    64.0GiB 2   1.0GiB -2m0s  Ubuntu <ubuntu> ec2-54-187-152-112.us-west-2.compute.amazonaws.com:9000/dc841df2f64a0e0f
    64.0GiB 2   1.0GiB -1m56s Ubuntu <ubuntu> ec2-18-237-123-129.us-west-2.compute.amazonaws.com:9000/697d12a015741b2c
    64.0GiB 2   1.0GiB -1m56s Ubuntu <ubuntu> ec2-18-237-123-129.us-west-2.compute.amazonaws.com:9000/81966224adec9aff
    64.0GiB 2   1.0GiB -1m56s Ubuntu <ubuntu> ec2-18-237-123-129.us-west-2.compute.amazonaws.com:9000/caa9a587d6c0a310
    64.0GiB 2   1.0GiB -1m56s Ubuntu <ubuntu> ec2-34-221-115-69.us-west-2.compute.amazonaws.com:9000/82b8a143c1791958
    64.0GiB 2   1.0GiB -3m4s  Ubuntu <ubuntu> ec2-34-221-115-69.us-west-2.compute.amazonaws.com:9000/287bd5004844992d
    64.0GiB 2   1.0GiB -3m4s  Ubuntu <ubuntu> ec2-34-221-115-69.us-west-2.compute.amazonaws.com:9000/ec85257fde35fc78
    64.0GiB 2   1.0GiB -1m57s Ubuntu <ubuntu> ec2-52-36-173-2.us-west-2.compute.amazonaws.com:9000/7a08ce0a40183ae8
    64.0GiB 2   1.0GiB -1m57s Ubuntu <ubuntu> ec2-52-24-213-163.us-west-2.compute.amazonaws.com:9000/636e2fbcb3298069
    64.0GiB 2   1.0GiB -3m5s  Ubuntu <ubuntu> ec2-18-237-209-173.us-west-2.compute.amazonaws.com:9000/5bb491fc86ddb540
    64.0GiB 2   1.0GiB -3m6s  Ubuntu <ubuntu> ec2-18-237-209-173.us-west-2.compute.amazonaws.com:9000/ea060255331b4e97
    64.0GiB 2   1.0GiB -3m6s  Ubuntu <ubuntu> ec2-18-237-209-173.us-west-2.compute.amazonaws.com:9000/47948e6686a3d843

Thanks! Olga

mariusae commented 6 years ago

What's probably happening here is that the new reflow runbatch invocation is attaching itself to the jobs from the previous run -- Reflow restores batch state when it can. You should find that new samples get smaller instance types.

Alternatively, you can nuke the current batch state by running reflow runbatch -reset instead. This starts the job anew, and each sample is processed from the beginning (while still reusing data where it can, of course).
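The reattach-versus-reset behavior described above can be illustrated with a toy model. This is a sketch under stated assumptions, not Reflow's implementation: the `BatchState` and `Run` classes, their method names, and the idea that each saved run remembers the memory it was launched with are all hypothetical, chosen only to show why previously launched samples keep their 64GiB allocations while new samples pick up the edited requirement, and what -reset changes.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """Hypothetical record of one sample's run and the memory it was launched with."""
    sample: str
    mem_gib: int


@dataclass
class BatchState:
    """Toy model of persisted batch state (hypothetical; not Reflow's data structure)."""
    runs: dict = field(default_factory=dict)

    def submit(self, sample: str, mem_gib: int) -> Run:
        # Reattach: if a run for this sample was saved earlier, reuse it as-is,
        # keeping whatever memory it was originally launched with.
        if sample in self.runs:
            return self.runs[sample]
        # Otherwise start a fresh run using the workflow's current requirement.
        run = Run(sample, mem_gib)
        self.runs[sample] = run
        return run

    def reset(self) -> None:
        # Loosely analogous to `reflow runbatch -reset`: forget saved runs so
        # every sample starts anew (reuse of cached data is not modeled here).
        self.runs.clear()


state = BatchState()
state.submit("sample1", 64)          # first launch with the 64GiB requirement

# After editing the workflow down to 8GiB and rerunning the batch:
old = state.submit("sample1", 8)     # reattaches to the saved run: still 64GiB
new = state.submit("sample2", 8)     # a sample not seen before: gets 8GiB
print(old.mem_gib, new.mem_gib)      # 64 8

state.reset()
print(state.submit("sample1", 8).mem_gib)  # 8
```

As the last comment in the thread shows, the 64GiB allocations in this case actually came from @requires on Main, so the model only illustrates the reattachment explanation, not the eventual root cause.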

This is all too confusing: the UX around the tooling here needs to improve; it's one of our near-term goals.

olgabot commented 6 years ago

Turns out the problem was that I had @requires(mem := 64*GiB) on Main -_-