grailbio / reflow

A language and runtime for distributed, incremental data processing in the cloud
Apache License 2.0
965 stars 52 forks source link

Race condition on reflow.Flow objects and especially on values.Map #28

Closed hchauvin closed 6 years ago

hchauvin commented 6 years ago

Running on HEAD (with go 1.10), I have a race condition that makes reflow panic:

reflow[18774]: fatal error: concurrent map read and map write
reflow[18774]: goroutine 8529 [running]:
reflow[18774]: runtime.throw(0xf38b72, 0x21)
reflow[18774]:         GOROOT/src/runtime/panic.go:619 +0x81 fp=0xc422145160 sp=0xc422145140 pc=0x42ceb1
reflow[18774]: runtime.mapaccess1(0xe30e00, 0xc42188e000, 0xc422145250, 0x3735435661735303)
reflow[18774]:         GOROOT/src/runtime/hashmap.go:357 +0x216 fp=0xc4221451a8 sp=0xc422145160 pc=0x409906
reflow[18774]: github.com/grailbio/reflow/values.Map.Insert(0xc42188e000, 0x5, 0xe79e47dddf990c05, 0xc37dcf7f0fb19596, 0x71771d3db9d58c4b, 0x445b0ebe3f299f29, 0x0, 0x0, 0x0, 0x0, ...)
reflow[18774]:         values/values.go:79 +0x8f fp=0xc4221453f0 sp=0xc4221451a8 pc=0x82914f
reflow[18774]: github.com/grailbio/reflow/syntax.Force.func4()
reflow[18774]:         syntax/force.go:83 +0x13d fp=0xc4221454f0 sp=0xc4221453f0 pc=0x90861d
reflow[18774]: github.com/grailbio/reflow/syntax.(*resolver).Resolve.func1(0xc421ce6000, 0x120, 0x120, 0xdae380)
reflow[18774]:         syntax/force.go:197 +0xc6 fp=0xc422145528 sp=0xc4221454f0 pc=0x908996
reflow[18774]: github.com/grailbio/reflow.(*Flow).canonicalize.func2(0xc421ce6000, 0x120, 0x120, 0xc421ce6000)
reflow[18774]:         flow.go:839 +0x57 fp=0xc422145568 sp=0xc422145528 pc=0x85ae57
reflow[18774]: github.com/grailbio/reflow.(*Eval).eval(0xc4200a8480, 0xff7c20, 0xc420028a40, 0xc420ba6700, 0x0, 0x0)
reflow[18774]:         eval.go:1067 +0x14e7 fp=0xc422145f70 sp=0xc422145568 pc=0x842187
reflow[18774]: github.com/grailbio/reflow.(*Eval).Do.func4(0xc4200a8480, 0xc420afa160, 0xc420ba6700)
reflow[18774]:         eval.go:459 +0x4f fp=0xc422145fc8 sp=0xc422145f70 pc=0x85a26f
reflow[18774]: runtime.goexit()
reflow[18774]:         bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/linux_amd64_stripped/stdlib~/src/runtime/asm_amd64.s:2361 +0x1 fp=0xc422145fd0 sp=0xc422145fc8 pc=0x45c5f1
reflow[18774]: created by github.com/grailbio/reflow.(*Eval).Do
reflow[18774]:         eval.go:458 +0x4d5
reflow[18774]: goroutine 1 [select]:
reflow[18774]: github.com/grailbio/reflow.(*Eval).wait(0xc4200a8480, 0xff7c20, 0xc420028a40, 0x0, 0x0)
reflow[18774]:         eval.go:689 +0x7fb
reflow[18774]: github.com/grailbio/reflow.(*Eval).Do(0xc4200a8480, 0xff7c20, 0xc42029ba80, 0x0, 0x0)
reflow[18774]:         eval.go:484 +0xb46
reflow[18774]: github.com/grailbio/reflow/runner.(*Runner).Eval(0xc4200bca00, 0xff7ce0, 0xc42069cf90, 0x5f, 0x0, 0x0, 0xc420c6695a)
reflow[18774]:         runner/runner.go:287 +0x39e
reflow[18774]: github.com/grailbio/reflow/runner.(*Runner).Do(0xc4200bca00, 0xff7ce0, 0xc42069cf90, 0x0)
reflow[18774]:         runner/runner.go:204 +0x38f
reflow[18774]: github.com/grailbio/reflow/tool.(*Cmd).run(0xc420474000, 0xff7ce0, 0xc42069cf90, 0xc42001e0b0, 0x1, 0x1)
reflow[18774]:         tool/run.go:305 +0x12a9
reflow[18774]: github.com/grailbio/reflow/tool.(*Cmd).Main(0xc420474000)
reflow[18774]:         tool/main.go:293 +0x9fb
reflow[18774]: main.main()
reflow[18774]:         cmd/reflow/main.go:62 +0x3fc
reflow[18774]: goroutine 5 [syscall, 2 minutes]:
reflow[18774]: os/signal.signal_recv(0x0)
reflow[18774]:         GOROOT/src/runtime/sigqueue.go:139 +0xa6
reflow[18774]: os/signal.loop()
reflow[18774]:         GOROOT/src/os/signal/signal_unix.go:22 +0x22
reflow[18774]: created by os/signal.init.0
reflow[18774]:         GOROOT/src/os/signal/signal_unix.go:28 +0x41
reflow[18774]: goroutine 18 [select]:
reflow[18774]: github.com/grailbio/reflow/internal/status.Reporter.displaySimple(0xc42029c060, 0xff0b60, 0xc42000e020, 0xc42028fea0)
reflow[18774]:         internal/status/stream.go:275 +0x20b
reflow[18774]: github.com/grailbio/reflow/internal/status.Reporter.Go(0xc42029c060, 0xff0b60, 0xc42000e020, 0xc42028fea0)
reflow[18774]:         internal/status/stream.go:83 +0x74
reflow[18774]: created by github.com/grailbio/reflow/tool.(*Cmd).Main
reflow[18774]:         tool/main.go:222 +0x322
reflow[18774]: goroutine 24 [select, 2 minutes, locked to thread]:
reflow[18774]: runtime.gopark(0xf52520, 0x0, 0xf1d0a8, 0x6, 0x18, 0x1)
reflow[18774]:         GOROOT/src/runtime/proc.go:291 +0x11a
reflow[18774]: runtime.selectgo(0xc4204a7750, 0xc42029c120)
reflow[18774]:         GOROOT/src/runtime/select.go:392 +0xe50
reflow[18774]: runtime.ensureSigM.func1()
reflow[18774]:         GOROOT/src/runtime/signal_unix.go:549 +0x1f4
reflow[18774]: runtime.goexit()
reflow[18774]:         bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/linux_amd64_stripped/stdlib~/src/runtime/asm_amd64.s:2361 +0x1
reflow[18774]: goroutine 7 [chan receive, 2 minutes]:
reflow[18774]: github.com/grailbio/reflow/tool.(*Cmd).Main.func2(0xc42027a360, 0xc420280d60, 0xc420474000)

Unfortunately I did not compile reflow for this run with '-race' to have more info on the race condition, but when I do, I find such race conditions in other parts of the code. There is no panic here, as no map is involved. For instance:

WARNING: DATA RACE
Read at 0x00c420944a90 by main goroutine:
  github.com/grailbio/reflow.(*Eval).todo()
      eval.go:918 +0x94
  github.com/grailbio/reflow.(*Eval).todo()
      eval.go:919 +0xe8
  github.com/grailbio/reflow.(*Eval).todo()
      eval.go:919 +0xe8
  github.com/grailbio/reflow.(*Eval).todo()
      eval.go:919 +0xe8
  github.com/grailbio/reflow.(*Eval).todo()
      eval.go:919 +0xe8
  github.com/grailbio/reflow.(*Eval).todo()
      eval.go:919 +0xe8
  github.com/grailbio/reflow.(*Eval).todo()
      eval.go:919 +0xe8
  github.com/grailbio/reflow.(*Eval).todo()
      eval.go:919 +0xe8
  github.com/grailbio/reflow.(*Eval).todo()
      eval.go:919 +0xe8
  github.com/grailbio/reflow.(*Eval).Do()
      eval.go:380 +0x126a
  github.com/grailbio/reflow/runner.(*Runner).Eval()
      runner/runner.go:287 +0x58e
  github.com/grailbio/reflow/runner.(*Runner).Do()
      runner/runner.go:204 +0x63f
  github.com/grailbio/reflow/tool.(*Cmd).run()
      tool/run.go:310 +0x2512
  github.com/grailbio/reflow/tool.(*Cmd).Main()
      tool/main.go:293 +0x10ee
  main.main()
      cmd/reflow/main.go:62 +0x5de

Previous write at 0x00c420944a90 by goroutine 671:
  github.com/grailbio/reflow.(*Flow).Fork()
      flow.go:366 +0xcc
  github.com/grailbio/reflow.(*Eval).Mutate()
      eval.go:1494 +0x776
  github.com/grailbio/reflow.(*Eval).eval()
      eval.go:1068 +0x22a8
  github.com/grailbio/reflow.(*Eval).Do.func4()
      eval.go:459 +0x6a

Goroutine 671 (running) created at:
  github.com/grailbio/reflow.(*Eval).Do()
      eval.go:458 +0x7fe

In either case, it might be because there is no lock on the reflow.Flow structure, but I might be mistaken.

mariusae commented 6 years ago

Hi @hchauvin, thanks for your report! The data race in Flow is by design and is safe (happy to expand on that, but the gist is that the updates to forks are done such that happens-before relationships are established outside of the use of data synchronization, and the writes that are racy are all references.)

However, the first panic is real! I've never seen this. I've looked at the code paths, and I might have some ideas. Can you share the context that you're running this in? Is it a single run? A batch? Is this reproducible or did you see it only once?

Thanks.

hchauvin commented 6 years ago

Hi,

Thank you for such a quick reply!

It is a single run, but quite a large one (if by batch you mean your batch mode, because this is a definitely a batch-like situation). I tried to reproduce it locally with a "curtailed optimization" mode where I did not optimize fully, but I couldn't get it, probably because there is some rate limiting. The only way I can get it is if I do the whole run remotely on 368 cores on an empty cache. There I get it every single time. Here is the run I just did:

ec2cluster: 8 instances: c5.18xlarge:5,r4.xlarge:1,t2.large:1,t2.medium:1 (<=$17.7/hr), total{mem:709.1GiB cpu:368 disk:2.0TiB intel_avx:368 intel_avx2:364 intel_avx512:360}, waiti
2fe6c367: elapsed: 9m30s, running:3, completed: 732/773
  XXX:            exec done 65.9KiB                                                                                           7m50s
  XXX:            exec done 125.0KiB                                                                                          7m19s
  XXX:            exec done 129.0KiB                                                                                          7m12s
  XXX:            exec done 126.7KiB                                                                                          7m12s
  XXX:  exec ..82de2c015bb10f63b87b8cc41247aecd337304 --output_file={{out}}.--input..op_name=WTC57.--dry_run=FALSE  6s
transfers: done: 4529 1.9GiB, transferring: 0 0B, waiting: 0 0B
  ec2-XXX...->ec2-XXX...:  done: 1 126.7KiB, transferring: 0 0B, waiting: 0 0B  7s
  ec2-XXX...->s3:XXX:    done: 1 126.7KiB, transferring: 0 0B, waiting: 0 0B  7s
  ec2-XXX...->ec2-XXX...:  done: 1 129.0KiB, transferring: 0 0B, waiting: 0 0B  6s
  ec2-XXX...->s3:XXX:    done: 1 129.0KiB, transferring: 0 0B, waiting: 0 0B  6s
  ec2-XXX.eu..->ec2-XXX...:  done: 1 65.9KiB, transferring: 0 0B, waiting: 0 0B   1s
  ec2-XXX.eu..->s3:XXX:    done: 1 65.9KiB, transferring: 0 0B, waiting: 0 0B   1s
  ec2-XXX.eu..->ec2-XXX...:  done: 1 125.0KiB, transferring: 0 0B, waiting: 0 0B  0s
fatal error: concurrent map read and map write

goroutine 52570 [running]:
runtime.throw(0xf38c2b, 0x21)
    GOROOT/src/runtime/panic.go:619 +0x81 fp=0xc423707160 sp=0xc423707140 pc=0x42ceb1
runtime.mapaccess1(0xe30ec0, 0xc420f66ab0, 0xc423707370, 0xc4239e3b30)
    GOROOT/src/runtime/hashmap.go:357 +0x216 fp=0xc4237071a8 sp=0xc423707160 pc=0x409906
github.com/grailbio/reflow/values.Map.Insert(0xc420f66ab0, 0x5, 0x6ccea30c1a89f402, 0x577dcc3c9f33ab05, 0xc2de70a08e9e1f64, 0x326543abb206dd76, 0x0, 0x0, 0x0, 0x0, ...)
    values/values.go:81 +0x196 fp=0xc4237073f0 sp=0xc4237071a8 pc=0x829256
github.com/grailbio/reflow/syntax.Force.func4()
    syntax/force.go:83 +0x13d fp=0xc4237074f0 sp=0xc4237073f0 pc=0x90861d
github.com/grailbio/reflow/syntax.(*resolver).Resolve.func1(0xc422556000, 0x120, 0x120, 0xdae440)
    syntax/force.go:197 +0xc6 fp=0xc423707528 sp=0xc4237074f0 pc=0x908996
github.com/grailbio/reflow.(*Flow).canonicalize.func2(0xc422556000, 0x120, 0x120, 0xc422556000)
    flow.go:839 +0x57 fp=0xc423707568 sp=0xc423707528 pc=0x85ae57
github.com/grailbio/reflow.(*Eval).eval(0xc4202b7200, 0xff7d60, 0xc420bf6780, 0xc420ea3c00, 0x0, 0x0)
    eval.go:1067 +0x14e7 fp=0xc423707f70 sp=0xc423707568 pc=0x842187
github.com/grailbio/reflow.(*Eval).Do.func4(0xc4202b7200, 0xc42074e250, 0xc420ea3c00)
    eval.go:459 +0x4f fp=0xc423707fc8 sp=0xc423707f70 pc=0x85a26f
runtime.goexit()
    bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/linux_amd64_stripped/stdlib~/src/runtime/asm_amd64.s:2361 +0x1 fp=0xc423707fd0 sp=0xc423707fc8 pc=0x45c5f1
created by github.com/grailbio/reflow.(*Eval).Do
    eval.go:458 +0x4d5

goroutine 1 [runnable]:
github.com/grailbio/reflow.(*FlowVisitor).Walk(0xc420d60848, 0xc423df8af8)
    flow.go:875 +0xe4
github.com/grailbio/reflow.(*Eval).wait(0xc4202b7200, 0xff7d60, 0xc420bf6780, 0x0, 0x0)
    eval.go:656 +0x233
github.com/grailbio/reflow.(*Eval).Do(0xc4202b7200, 0xff7d60, 0xc420bf66c0, 0x0, 0x0)
    eval.go:484 +0xb46
github.com/grailbio/reflow/runner.(*Runner).Eval(0xc4201b0000, 0xff7e20, 0xc4201bd5c0, 0x60, 0x0, 0x0, 0xc420e8b67b)
    runner/runner.go:287 +0x39e
github.com/grailbio/reflow/runner.(*Runner).Do(0xc4201b0000, 0xff7e20, 0xc4201bd5c0, 0x0)
    runner/runner.go:204 +0x38f
github.com/grailbio/reflow/tool.(*Cmd).run(0xc42047a000, 0xff7e20, 0xc4201bd5c0, 0xc42001e120, 0x2, 0x2)
    tool/run.go:310 +0x12c8
github.com/grailbio/reflow/tool.(*Cmd).Main(0xc42047a000)
    tool/main.go:293 +0x9fb
main.main()
    cmd/reflow/main.go:62 +0x3fc

goroutine 5 [syscall, 10 minutes]:
os/signal.signal_recv(0x0)
    GOROOT/src/runtime/sigqueue.go:139 +0xa6
os/signal.loop()
    GOROOT/src/os/signal/signal_unix.go:22 +0x22
created by os/signal.init.0
    GOROOT/src/os/signal/signal_unix.go:28 +0x41

goroutine 18 [select]:
github.com/grailbio/reflow/internal/status.Reporter.displayTerm(0xc4202a2060, 0xff0ca0, 0xc42000e020, 0xc420065be0, 0xc420295ea0)
    internal/status/stream.go:118 +0x44b
github.com/grailbio/reflow/internal/status.Reporter.Go(0xc4202a2060, 0xff0ca0, 0xc42000e020, 0xc420295ea0)
    internal/status/stream.go:81 +0xaf
created by github.com/grailbio/reflow/tool.(*Cmd).Main
    tool/main.go:222 +0x322

goroutine 21 [select, 10 minutes, locked to thread]:
runtime.gopark(0xf52668, 0x0, 0xf1d188, 0x6, 0x18, 0x1)
    GOROOT/src/runtime/proc.go:291 +0x11a
runtime.selectgo(0xc42004bf50, 0xc4202a2180)
    GOROOT/src/runtime/select.go:392 +0xe50
runtime.ensureSigM.func1()
    GOROOT/src/runtime/signal_unix.go:549 +0x1f4
runtime.goexit()
    bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/linux_amd64_stripped/stdlib~/src/runtime/asm_amd64.s:2361 +0x1

goroutine 53 [chan receive, 10 minutes]:
github.com/grailbio/reflow/tool.(*Cmd).Main.func2(0xc420280420, 0xc420287020, 0xc42047a000)
    tool/main.go:283 +0x4c
created by github.com/grailbio/reflow/tool.(*Cmd).Main
    tool/main.go:282 +0x983

goroutine 68 [select]:
github.com/grailbio/reflow/ec2cluster.(*Cluster).maintain(0xc420e9a1e0)
    ec2cluster/ec2cluster.go:474 +0x189
created by github.com/grailbio/reflow/ec2cluster.(*Cluster).Init
    ec2cluster/ec2cluster.go:183 +0x54c

goroutine 187 [IO wait]:
internal/poll.runtime_pollWait(0x7f3dbc702f00, 0x72, 0xc420fbc858)
    GOROOT/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc420eb6f98, 0x72, 0xffffffffffffff00, 0xff1f20, 0x1542c90)
    GOROOT/src/internal/poll/fd_poll_runtime.go:85 +0x9b
internal/poll.(*pollDesc).waitRead(0xc420eb6f98, 0xc422404000, 0x8000, 0x8000)
    GOROOT/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc420eb6f80, 0xc422404000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
    GOROOT/src/internal/poll/fd_unix.go:157 +0x17d
net.(*netFD).Read(0xc420eb6f80, 0xc422404000, 0x8000, 0x8000, 0xc420fbca38, 0xb23792, 0xe80720)
    GOROOT/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc420f60220, 0xc422404000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
    GOROOT/src/net/net.go:176 +0x6a
crypto/tls.(*block).readFromUntil(0xc420282f90, 0x7f3db80bb000, 0xc420f60220, 0x5, 0xc420f60220, 0x17)
    GOROOT/src/crypto/tls/conn.go:493 +0x96
crypto/tls.(*Conn).readRecord(0xc4208b1500, 0xf52717, 0xc4208b1620, 0x4)
    GOROOT/src/crypto/tls/conn.go:595 +0xe0
crypto/tls.(*Conn).Read(0xc4208b1500, 0xc420fb7000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    GOROOT/src/crypto/tls/conn.go:1156 +0x100
bufio.(*Reader).Read(0xc42099bc80, 0xc420689458, 0x9, 0x9, 0x2, 0xc420fbcc78, 0x405d75)
    GOROOT/src/bufio/bufio.go:216 +0x238
io.ReadAtLeast(0xfeff60, 0xc42099bc80, 0xc420689458, 0x9, 0x9, 0x9, 0xc42087c9e0, 0xc420fbcd38, 0x405aa7)
    GOROOT/src/io/io.go:309 +0x86
io.ReadFull(0xfeff60, 0xc42099bc80, 0xc420689458, 0x9, 0x9, 0x4701e3, 0xc420fbcd18, 0xc420fbcd48)
    GOROOT/src/io/io.go:327 +0x58
golang.org/x/net/http2.readFrameHeader(0xc420689458, 0x9, 0x9, 0xfeff60, 0xc42099bc80, 0x0, 0x0, 0xc424288601, 0xc420fbce00)
    external/org_golang_x_net/http2/frame.go:237 +0x7b
golang.org/x/net/http2.(*Framer).ReadFrame(0xc420689420, 0xc42a7c6510, 0x0, 0x0, 0x0)
    external/org_golang_x_net/http2/frame.go:492 +0xa4
golang.org/x/net/http2.(*clientConnReadLoop).run(0xc420fbcfb8, 0xf51a90, 0xc4204b37b8)
    external/org_golang_x_net/http2/transport.go:1477 +0x8e
golang.org/x/net/http2.(*ClientConn).readLoop(0xc420ead500)
    external/org_golang_x_net/http2/transport.go:1405 +0x68
created by golang.org/x/net/http2.(*Transport).newClientConn
    external/org_golang_x_net/http2/transport.go:592 +0x65c

goroutine 69 [select]:
github.com/grailbio/reflow/ec2cluster.(*Cluster).loop(0xc420e9a1e0)
    ec2cluster/ec2cluster.go:416 +0x15f7
created by github.com/grailbio/reflow/ec2cluster.(*Cluster).Init
    ec2cluster/ec2cluster.go:184 +0x571

goroutine 31857 [IO wait]:
internal/poll.runtime_pollWait(0x7f3db857eea8, 0x72, 0xc421840858)
    GOROOT/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc4236ac718, 0x72, 0xffffffffffffff00, 0xff1f20, 0x1542c90)
    GOROOT/src/internal/poll/fd_poll_runtime.go:85 +0x9b
internal/poll.(*pollDesc).waitRead(0xc4236ac718, 0xc421694000, 0x2000, 0x2000)
    GOROOT/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc4236ac700, 0xc421694000, 0x2000, 0x2000, 0x0, 0x0, 0x0)
    GOROOT/src/internal/poll/fd_unix.go:157 +0x17d
net.(*netFD).Read(0xc4236ac700, 0xc421694000, 0x2000, 0x2000, 0xc421840a38, 0xb23792, 0xe80720)
    GOROOT/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc420ef82a0, 0xc421694000, 0x2000, 0x2000, 0x0, 0x0, 0x0)
    GOROOT/src/net/net.go:176 +0x6a
crypto/tls.(*block).readFromUntil(0xc4231f8c30, 0x7f3db80bb000, 0xc420ef82a0, 0x5, 0xc420ef82a0, 0x100000000f1a8de)
    GOROOT/src/crypto/tls/conn.go:493 +0x96
crypto/tls.(*Conn).readRecord(0xc423d9bc00, 0xf52717, 0xc423d9bd20, 0x4)
    GOROOT/src/crypto/tls/conn.go:595 +0xe0
crypto/tls.(*Conn).Read(0xc423d9bc00, 0xc421d8d000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    GOROOT/src/crypto/tls/conn.go:1156 +0x100
bufio.(*Reader).Read(0xc421fe74a0, 0xc420de8ff8, 0x9, 0x9, 0x2, 0xc421840c78, 0x405d75)
    GOROOT/src/bufio/bufio.go:216 +0x238
io.ReadAtLeast(0xfeff60, 0xc421fe74a0, 0xc420de8ff8, 0x9, 0x9, 0x9, 0x413c87, 0xc421840d08, 0xc421840d08)
    GOROOT/src/io/io.go:309 +0x86
io.ReadFull(0xfeff60, 0xc421fe74a0, 0xc420de8ff8, 0x9, 0x9, 0x4701e3, 0xc421840d18, 0xc421840d48)
    GOROOT/src/io/io.go:327 +0x58
golang.org/x/net/http2.readFrameHeader(0xc420de8ff8, 0x9, 0x9, 0xfeff60, 0xc421fe74a0, 0x0, 0x100000000, 0xc421610500, 0xc421840e00)
    external/org_golang_x_net/http2/frame.go:237 +0x7b
golang.org/x/net/http2.(*Framer).ReadFrame(0xc420de8fc0, 0xc42357e210, 0x0, 0x0, 0x0)
    external/org_golang_x_net/http2/frame.go:492 +0xa4
golang.org/x/net/http2.(*clientConnReadLoop).run(0xc421840fb8, 0xf51a90, 0xc4204b37b8)
    external/org_golang_x_net/http2/transport.go:1477 +0x8e
golang.org/x/net/http2.(*ClientConn).readLoop(0xc423bf6e00)
    external/org_golang_x_net/http2/transport.go:1405 +0x68
created by golang.org/x/net/http2.(*Transport).newClientConn
    external/org_golang_x_net/http2/transport.go:592 +0x65c

goroutine 131 [IO wait]:
internal/poll.runtime_pollWait(0x7f3dbc702d60, 0x72, 0xc420fc3858)
    GOROOT/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc420b5f818, 0x72, 0xffffffffffffff00, 0xff1f20, 0x1542c90)
    GOROOT/src/internal/poll/fd_poll_runtime.go:85 +0x9b
internal/poll.(*pollDesc).waitRead(0xc420b5f818, 0xc420b50000, 0x8000, 0x8000)
    GOROOT/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc420b5f800, 0xc420b50000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
    GOROOT/src/internal/poll/fd_unix.go:157 +0x17d
net.(*netFD).Read(0xc420b5f800, 0xc420b50000, 0x8000, 0x8000, 0x8, 0x8, 0x7ff3)
    GOROOT/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc420a805f0, 0xc420b50000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
    GOROOT/src/net/net.go:176 +0x6a
crypto/tls.(*block).readFromUntil(0xc420dae870, 0x7f3db80bb000, 0xc420a805f0, 0x5, 0xc420a805f0, 0x0)
    GOROOT/src/crypto/tls/conn.go:493 +0x96
crypto/tls.(*Conn).readRecord(0xc4208b0000, 0xf52717, 0xc4208b0120, 0xc423e9a5c0)
    GOROOT/src/crypto/tls/conn.go:595 +0xe0
crypto/tls.(*Conn).Read(0xc4208b0000, 0xc420fdb000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    GOROOT/src/crypto/tls/conn.go:1156 +0x100
bufio.(*Reader).Read(0xc421090ea0, 0xc4206890d8, 0x9, 0x9, 0xc420fc3c50, 0x470183, 0xc420fc3c60)
    GOROOT/src/bufio/bufio.go:216 +0x238
io.ReadAtLeast(0xfeff60, 0xc421090ea0, 0xc4206890d8, 0x9, 0x9, 0x9, 0xc420932101, 0xc420fc3d00, 0xb1e57b)
    GOROOT/src/io/io.go:309 +0x86
io.ReadFull(0xfeff60, 0xc421090ea0, 0xc4206890d8, 0x9, 0x9, 0xc420064060, 0xc420fc3d30, 0xb1e48d)
    GOROOT/src/io/io.go:327 +0x58
golang.org/x/net/http2.readFrameHeader(0xc4206890d8, 0x9, 0x9, 0xfeff60, 0xc421090ea0, 0x0, 0x0, 0x0, 0x0)
    external/org_golang_x_net/http2/frame.go:237 +0x7b
golang.org/x/net/http2.(*Framer).ReadFrame(0xc4206890a0, 0xc422c6ddd0, 0x0, 0x0, 0x0)
    external/org_golang_x_net/http2/frame.go:492 +0xa4
golang.org/x/net/http2.(*clientConnReadLoop).run(0xc420fc3fb8, 0xf51a90, 0xc420fa1fb8)
    external/org_golang_x_net/http2/transport.go:1477 +0x8e
golang.org/x/net/http2.(*ClientConn).readLoop(0xc42107b880)
    external/org_golang_x_net/http2/transport.go:1405 +0x68
created by golang.org/x/net/http2.(*Transport).newClientConn
    external/org_golang_x_net/http2/transport.go:592 +0x65c

goroutine 211 [select]:
github.com/grailbio/reflow/pool.Keepalive(0xff7d60, 0xc420bf66c0, 0xc420cb6cf0, 0xfff2c0, 0xc4208d6800, 0x0, 0x0)
    pool/pool.go:148 +0x2c7
github.com/grailbio/reflow/runner.(*Runner).Eval.func1(0xc4209fdf20, 0xc4201b0000, 0xc42000f018, 0xc420cbb470)
    runner/runner.go:266 +0x64
created by github.com/grailbio/reflow/runner.(*Runner).Eval
    runner/runner.go:265 +0x16c

goroutine 212 [select]:
github.com/grailbio/reflow/runner.(*Stealer).Go(0xc420a237e0, 0xff7d60, 0xc420bf66c0, 0xc4202b7200)
    runner/stealer.go:41 +0x14e
created by github.com/grailbio/reflow/runner.(*Runner).Eval
    runner/runner.go:286 +0x374

goroutine 51963 [select]:
net/http.(*persistConn).writeLoop(0xc42af1e6c0)
    GOROOT/src/net/http/transport.go:1822 +0x14b
created by net/http.(*Transport).dialConn
    GOROOT/src/net/http/transport.go:1238 +0x97f

goroutine 46016 [select, 1 minutes]:
github.com/grailbio/reflow/pool.Keepalive(0xff7d60, 0xc420d9ab40, 0xc420cb6cf0, 0xfff2c0, 0xc423338a20, 0xc421bb3620, 0xc421bb3530)
    pool/pool.go:148 +0x2c7
github.com/grailbio/reflow/runner.(*Stealer).Go.func1(0xff7d60, 0xc420d9ab40, 0xc420a237e0, 0xfff2c0, 0xc423338a20, 0xc4209fc7d0)
    runner/stealer.go:60 +0x78
created by github.com/grailbio/reflow/runner.(*Stealer).Go
    runner/stealer.go:59 +0x460

goroutine 10127 [select]:
github.com/grailbio/reflow/pool.Keepalive(0xff7d60, 0xc422058a00, 0xc420cb6cf0, 0xfff2c0, 0xc4207c5560, 0xc42292cf00, 0x0)
    pool/pool.go:148 +0x2c7
github.com/grailbio/reflow/runner.(*Stealer).Go.func1(0xff7d60, 0xc422058a00, 0xc420a237e0, 0xfff2c0, 0xc4207c5560, 0xc420c7b870)
    runner/stealer.go:60 +0x78
created by github.com/grailbio/reflow/runner.(*Stealer).Go
    runner/stealer.go:59 +0x460

goroutine 52455 [select]:
net/http.(*persistConn).roundTrip(0xc42af1e6c0, 0xc424202360, 0x0, 0x0, 0x0)
    GOROOT/src/net/http/transport.go:2033 +0x5a7
net/http.(*Transport).RoundTrip(0x1534400, 0xc420a4e200, 0x1534400, 0x0, 0x0)
    GOROOT/src/net/http/transport.go:422 +0x8f2
net/http.send(0xc420a4e200, 0xff09a0, 0x1534400, 0x0, 0x0, 0x0, 0xc420a802a0, 0xc420772230, 0xc423709750, 0x1)
    GOROOT/src/net/http/client.go:252 +0x185
net/http.(*Client).send(0x1586f80, 0xc420a4e200, 0x0, 0x0, 0x0, 0xc420a802a0, 0x0, 0x1, 0x280)
    GOROOT/src/net/http/client.go:176 +0xfa
net/http.(*Client).Do(0x1586f80, 0xc420a4e200, 0xc423709930, 0x10, 0xc423709928)
    GOROOT/src/net/http/client.go:615 +0x28d
github.com/aws/aws-sdk-go/aws/corehandlers.sendFollowRedirects(0xc4212e8800, 0xf51318, 0xc421ed5800, 0xc4212e8800)
    external/com_github_aws_aws_sdk_go/aws/corehandlers/handlers.go:120 +0x3b
github.com/aws/aws-sdk-go/aws/corehandlers.glob..func3(0xc4212e8800)
    external/com_github_aws_aws_sdk_go/aws/corehandlers/handlers.go:112 +0x87
github.com/aws/aws-sdk-go/aws/request.(*HandlerList).Run(0xc4212e8998, 0xc4212e8800)
    external/com_github_aws_aws_sdk_go/aws/request/handlers.go:195 +0x9d
github.com/aws/aws-sdk-go/aws/request.(*Request).Send(0xc4212e8800, 0x0, 0x0)
    external/com_github_aws_aws_sdk_go/aws/request/request.go:490 +0x17f
github.com/aws/aws-sdk-go/service/s3.(*S3).HeadObject(0xc420f60078, 0xc421032150, 0x3, 0xc420d8a4b0, 0x4f)
    external/com_github_aws_aws_sdk_go/service/s3/api.go:3345 +0x4d
github.com/grailbio/reflow/repository/s3.(*Repository).Stat(0xc4201bd470, 0xff7de0, 0xc420f04600, 0x5, 0x7e92c80b750d5970, 0xe474d458975503bc, 0xd8be9b47929a3c92, 0x6285615487cbab51, 0x0, 0x0, ...)
    repository/s3/s3.go:65 +0x235
github.com/grailbio/reflow/repository.(*Manager).NeedTransfer.func1(0xc400000008, 0xf527c8)
    repository/manager.go:210 +0x113
golang.org/x/sync/errgroup.(*Group).Go.func1(0xc421690700, 0xc421096000)
    external/org_golang_x_sync/errgroup/errgroup.go:58 +0x57
created by golang.org/x/sync/errgroup.(*Group).Go
    external/org_golang_x_sync/errgroup/errgroup.go:55 +0x66

goroutine 46053 [select]:
net/http.(*persistConn).writeLoop(0xc4214605a0)
    GOROOT/src/net/http/transport.go:1822 +0x14b
created by net/http.(*Transport).dialConn
    GOROOT/src/net/http/transport.go:1238 +0x97f

goroutine 52565 [semacquire]:
sync.runtime_Semacquire(0xc421690714)
    GOROOT/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc421690708)
    GOROOT/src/sync/waitgroup.go:129 +0x72
golang.org/x/sync/errgroup.(*Group).Wait(0xc421690700, 0xc421096000, 0xc4216903c0)
    external/org_golang_x_sync/errgroup/errgroup.go:41 +0x31
github.com/grailbio/reflow/repository.(*Manager).NeedTransfer(0xc420b5f200, 0xff8260, 0xc421a50100, 0xffbe60, 0xc4201bd470, 0xc420d8a3c0, 0x1, 0x1, 0x0, 0x0, ...)
    repository/manager.go:220 +0x3f9
github.com/grailbio/reflow/repository.(*Manager).Transfer(0xc420b5f200, 0xff8260, 0xc421a50100, 0xffbe60, 0xc4201bd470, 0xffbda0, 0xc421a50080, 0xc420d8a3c0, 0x1, 0x1, ...)
    repository/manager.go:188 +0x8a
github.com/grailbio/reflow.(*Eval).CacheWrite(0xc4202b7200, 0xff8260, 0xc421a50100, 0xc42257d500, 0xffbda0, 0xc421a50080, 0xc423d94240, 0x20002)
    eval.go:1138 +0x227
github.com/grailbio/reflow/runner.(*worker).do.func2(0xc42b21a2a0, 0xff7d60, 0xc423df4340, 0xff32a0, 0xc422e4a460, 0xc42257d500)
    runner/worker.go:240 +0x10d
created by github.com/grailbio/reflow/runner.(*worker).do
    runner/worker.go:239 +0x8bd

goroutine 50993 [select]:
net/http.(*persistConn).writeLoop(0xc42262ed80)
    GOROOT/src/net/http/transport.go:1822 +0x14b
created by net/http.(*Transport).dialConn
    GOROOT/src/net/http/transport.go:1238 +0x97f

goroutine 52491 [select]:
golang.org/x/net/http2.(*ClientConn).roundTrip(0xc420ead500, 0xc420a4f500, 0x0, 0x0, 0x0, 0x0)
    external/org_golang_x_net/http2/transport.go:891 +0x844
golang.org/x/net/http2.(*Transport).RoundTripOpt(0xc42028b900, 0xc420a4f500, 0x0, 0xc423b6e8d8, 0x40c780, 0xf1b8e2)
    external/org_golang_x_net/http2/transport.go:378 +0x150
golang.org/x/net/http2.(*Transport).RoundTrip(0xc42028b900, 0xc420a4f500, 0xc423b6e910, 0x0, 0xc420bf6780)
    external/org_golang_x_net/http2/transport.go:340 +0x3a
golang.org/x/net/http2.noDialH2RoundTripper.RoundTrip(0xc42028b900, 0xc420a4f500, 0xc4208921e0, 0x5, 0xc42086caa8)
    external/org_golang_x_net/http2/configure_transport.go:75 +0x39
net/http.(*Transport).RoundTrip(0xc4201d20f0, 0xc420a4f500, 0xc4201d20f0, 0x0, 0x0)
    GOROOT/src/net/http/transport.go:380 +0xc36
net/http.send(0xc420a4f500, 0xff09a0, 0xc4201d20f0, 0x0, 0x0, 0x0, 0xc42b12f728, 0xc4210b4780, 0xc423b6ed00, 0x1)
    GOROOT/src/net/http/client.go:252 +0x185
net/http.(*Client).send(0xc420cb7e60, 0xc420a4f500, 0x0, 0x0, 0x0, 0xc42b12f728, 0x0, 0x1, 0x78)
    GOROOT/src/net/http/client.go:176 +0xfa
net/http.(*Client).Do(0xc420cb7e60, 0xc420a4f500, 0xc420bf6780, 0xc420a4f500, 0x78)
    GOROOT/src/net/http/client.go:615 +0x28d
golang.org/x/net/context/ctxhttp.Do(0xff7d60, 0xc420bf6780, 0xc420cb7e60, 0xc420a4f200, 0x0, 0x0, 0xc420a4f200)
    external/org_golang_x_net/context/ctxhttp/ctxhttp.go:30 +0x6e
github.com/grailbio/reflow/rest.(*ClientCall).Do(0xc421409450, 0xff7d60, 0xc420bf6780, 0x0, 0x0, 0xc423b6f088, 0x2, 0x2)
    rest/client.go:120 +0x19a
github.com/grailbio/reflow/pool/client.(*clientExec).Wait(0xc42099b4a0, 0xff7d60, 0xc420bf6780, 0x0, 0x0)
    pool/client/client.go:315 +0x181
github.com/grailbio/reflow.(*Eval).exec(0xc4202b7200, 0xff7d60, 0xc420bf6780, 0xc420fd7180, 0xbea44ec4d6df18f9, 0x950cce693a)
    eval.go:1383 +0x3ec
github.com/grailbio/reflow.(*Eval).eval(0xc4202b7200, 0xff7d60, 0xc420bf6780, 0xc420fd7180, 0x0, 0x0)
    eval.go:996 +0x956
github.com/grailbio/reflow.(*Eval).Do.func4(0xc4202b7200, 0xc42074e250, 0xc420fd7180)
    eval.go:459 +0x4f
created by github.com/grailbio/reflow.(*Eval).Do
    eval.go:458 +0x4d5

goroutine 45858 [IO wait, 1 minutes]:
internal/poll.runtime_pollWait(0x7f3dbc72ab90, 0x72, 0xc423841858)
    GOROOT/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc422548398, 0x72, 0xffffffffffffff00, 0xff1f20, 0x1542c90)
    GOROOT/src/internal/poll/fd_poll_runtime.go:85 +0x9b
internal/poll.(*pollDesc).waitRead(0xc422548398, 0xc420907000, 0x400, 0x400)
    GOROOT/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc422548380, 0xc420907000, 0x400, 0x400, 0x0, 0x0, 0x0)
    GOROOT/src/internal/poll/fd_unix.go:157 +0x17d
net.(*netFD).Read(0xc422548380, 0xc420907000, 0x400, 0x400, 0x8, 0x8, 0x3f3)
    GOROOT/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc420ef8250, 0xc420907000, 0x400, 0x400, 0x0, 0x0, 0x0)
    GOROOT/src/net/net.go:176 +0x6a
crypto/tls.(*block).readFromUntil(0xc421575020, 0x7f3db80bb000, 0xc420ef8250, 0x5, 0xc420ef8250, 0x0)
    GOROOT/src/crypto/tls/conn.go:493 +0x96
crypto/tls.(*Conn).readRecord(0xc42b15e000, 0xf52717, 0xc42b15e120, 0xc42b02de00)
    GOROOT/src/crypto/tls/conn.go:595 +0xe0
crypto/tls.(*Conn).Read(0xc42b15e000, 0xc422e46000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    GOROOT/src/crypto/tls/conn.go:1156 +0x100
bufio.(*Reader).Read(0xc421fe6c00, 0xc420918c78, 0x9, 0x9, 0xc423841c50, 0x470183, 0xc423841c60)
    GOROOT/src/bufio/bufio.go:216 +0x238
io.ReadAtLeast(0xfeff60, 0xc421fe6c00, 0xc420918c78, 0x9, 0x9, 0x9, 0xc42acdcb01, 0xc423841d00, 0xb1e57b)
    GOROOT/src/io/io.go:309 +0x86
io.ReadFull(0xfeff60, 0xc421fe6c00, 0xc420918c78, 0x9, 0x9, 0xc420064060, 0xc423841d30, 0xb1e48d)
    GOROOT/src/io/io.go:327 +0x58
golang.org/x/net/http2.readFrameHeader(0xc420918c78, 0x9, 0x9, 0xfeff60, 0xc421fe6c00, 0x0, 0x0, 0x0, 0x0)
    external/org_golang_x_net/http2/frame.go:237 +0x7b
golang.org/x/net/http2.(*Framer).ReadFrame(0xc420918c40, 0xc42291a810, 0x0, 0x0, 0x0)
    external/org_golang_x_net/http2/frame.go:492 +0xa4
golang.org/x/net/http2.(*clientConnReadLoop).run(0xc423841fb8, 0xf51a90, 0xc4200517b8)
    external/org_golang_x_net/http2/transport.go:1477 +0x8e
golang.org/x/net/http2.(*ClientConn).readLoop(0xc420dc41c0)
    external/org_golang_x_net/http2/transport.go:1405 +0x68
created by golang.org/x/net/http2.(*Transport).newClientConn
    external/org_golang_x_net/http2/transport.go:592 +0x65c

goroutine 38017 [select]:
github.com/grailbio/reflow/pool.Keepalive(0xff7d60, 0xc423df4a80, 0xc420cb6cf0, 0xfff2c0, 0xc42228d220, 0xc422bcaba0, 0xb13bf2)
    pool/pool.go:148 +0x2c7
github.com/grailbio/reflow/runner.(*Stealer).Go.func1(0xff7d60, 0xc423df4a80, 0xc420a237e0, 0xfff2c0, 0xc42228d220, 0xc4230e3b30)
    runner/stealer.go:60 +0x78
created by github.com/grailbio/reflow/runner.(*Stealer).Go
    runner/stealer.go:59 +0x460

goroutine 28019 [select]:
github.com/grailbio/reflow/runner.(*worker).Go(0xc42ac9d1a0, 0xff7e20, 0xc42ac9d260)
    runner/worker.go:66 +0x2dd
github.com/grailbio/reflow/runner.(*Stealer).Go.func2(0xff7d60, 0xc42418bc80, 0xc42ac9d1a0, 0xc420a237e0, 0xc422459e90, 0xfff2c0, 0xc42b0c3b80)
    runner/stealer.go:69 +0xa7
created by github.com/grailbio/reflow/runner.(*Stealer).Go
    runner/stealer.go:66 +0x4ca

goroutine 46017 [select, 2 minutes]:
github.com/grailbio/reflow/runner.(*worker).Go(0xc422415380, 0xff7e20, 0xc423764630)
    runner/worker.go:66 +0x2dd
github.com/grailbio/reflow/runner.(*Stealer).Go.func2(0xff7d60, 0xc420d9ab40, 0xc422415380, 0xc420a237e0, 0xc4209fc7d0, 0xfff2c0, 0xc423338a20)
    runner/stealer.go:69 +0xa7
created by github.com/grailbio/reflow/runner.(*Stealer).Go
    runner/stealer.go:66 +0x4ca

goroutine 46052 [IO wait]:
internal/poll.runtime_pollWait(0x7f3dbc733340, 0x72, 0xc420fc0760)
    GOROOT/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc421050718, 0x72, 0xffffffffffffff00, 0xff1f20, 0x1542c90)
    GOROOT/src/internal/poll/fd_poll_runtime.go:85 +0x9b
internal/poll.(*pollDesc).waitRead(0xc421050718, 0xc422904400, 0x400, 0x400)
    GOROOT/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc421050700, 0xc422904400, 0x400, 0x400, 0x0, 0x0, 0x0)
    GOROOT/src/internal/poll/fd_unix.go:157 +0x17d
net.(*netFD).Read(0xc421050700, 0xc422904400, 0x400, 0x400, 0xc420fc0898, 0x411f26, 0x7f3dbc70dc98)
    GOROOT/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc42b12f6b8, 0xc422904400, 0x400, 0x400, 0x0, 0x0, 0x0)
    GOROOT/src/net/net.go:176 +0x6a
crypto/tls.(*block).readFromUntil(0xc422414540, 0x7f3db80bb000, 0xc42b12f6b8, 0x5, 0xc42b12f6b8, 0x5)
    GOROOT/src/crypto/tls/conn.go:493 +0x96
crypto/tls.(*Conn).readRecord(0xc42220aa80, 0xf52717, 0xc42220aba0, 0x458500)
    GOROOT/src/crypto/tls/conn.go:595 +0xe0
crypto/tls.(*Conn).Read(0xc42220aa80, 0xc421c19000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    GOROOT/src/crypto/tls/conn.go:1156 +0x100
net/http.(*persistConn).Read(0xc4214605a0, 0xc421c19000, 0x1000, 0x1000, 0xc420fc0b98, 0x405585, 0xc4215eeea0)
    GOROOT/src/net/http/transport.go:1453 +0x136
bufio.(*Reader).fill(0xc42afaf4a0)
    GOROOT/src/bufio/bufio.go:100 +0x11e
bufio.(*Reader).Peek(0xc42afaf4a0, 0x1, 0x0, 0x0, 0x0, 0xc421596540, 0x0)
    GOROOT/src/bufio/bufio.go:132 +0x3a
net/http.(*persistConn).readLoop(0xc4214605a0)
    GOROOT/src/net/http/transport.go:1601 +0x185
created by net/http.(*Transport).dialConn
    GOROOT/src/net/http/transport.go:1237 +0x95a

goroutine 51962 [IO wait]:
internal/poll.runtime_pollWait(0x7f3dbc6c7a38, 0x72, 0xc42391d760)
    GOROOT/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc4239f0298, 0x72, 0xffffffffffffff00, 0xff1f20, 0x1542c90)
    GOROOT/src/internal/poll/fd_poll_runtime.go:85 +0x9b
internal/poll.(*pollDesc).waitRead(0xc4239f0298, 0xc423bc2400, 0x400, 0x400)
    GOROOT/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc4239f0280, 0xc423bc2400, 0x400, 0x400, 0x0, 0x0, 0x0)
    GOROOT/src/internal/poll/fd_unix.go:157 +0x17d
net.(*netFD).Read(0xc4239f0280, 0xc423bc2400, 0x400, 0x400, 0x60b7b0, 0xc4220a3320, 0xc42b1550ce)
    GOROOT/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc423ab0020, 0xc423bc2400, 0x400, 0x400, 0x0, 0x0, 0x0)
    GOROOT/src/net/net.go:176 +0x6a
crypto/tls.(*block).readFromUntil(0xc423a81e30, 0x7f3db80bb000, 0xc423ab0020, 0x5, 0xc423ab0020, 0x1)
    GOROOT/src/crypto/tls/conn.go:493 +0x96
crypto/tls.(*Conn).readRecord(0xc42b00a380, 0xf52717, 0xc42b00a4a0, 0xc42391de58)
    GOROOT/src/crypto/tls/conn.go:595 +0xe0
crypto/tls.(*Conn).Read(0xc42b00a380, 0xc42b155000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    GOROOT/src/crypto/tls/conn.go:1156 +0x100
net/http.(*persistConn).Read(0xc42af1e6c0, 0xc42b155000, 0x1000, 0x1000, 0x1534400, 0xc42af1e6c0, 0x0)
    GOROOT/src/net/http/transport.go:1453 +0x136
bufio.(*Reader).fill(0xc4220a3320)
    GOROOT/src/bufio/bufio.go:100 +0x11e
bufio.(*Reader).Peek(0xc4220a3320, 0x1, 0x0, 0x0, 0x0, 0xc421596360, 0x0)
    GOROOT/src/bufio/bufio.go:132 +0x3a
net/http.(*persistConn).readLoop(0xc42af1e6c0)
    GOROOT/src/net/http/transport.go:1601 +0x185
created by net/http.(*Transport).dialConn
    GOROOT/src/net/http/transport.go:1237 +0x95a

goroutine 7059 [select]:
github.com/grailbio/reflow/pool.Keepalive(0xff7d60, 0xc422682d80, 0xc420cb6cf0, 0xfff2c0, 0xc420b59220, 0xc4204b2f00, 0x0)
    pool/pool.go:148 +0x2c7
github.com/grailbio/reflow/runner.(*Stealer).Go.func1(0xff7d60, 0xc422682d80, 0xc420a237e0, 0xfff2c0, 0xc420b59220, 0xc42360bff0)
    runner/stealer.go:60 +0x78
created by github.com/grailbio/reflow/runner.(*Stealer).Go
    runner/stealer.go:59 +0x460

goroutine 27757 [IO wait]:
internal/poll.runtime_pollWait(0x7f3db80fd668, 0x72, 0xc423845858)
    GOROOT/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc4215be098, 0x72, 0xffffffffffffff00, 0xff1f20, 0x1542c90)
    GOROOT/src/internal/poll/fd_poll_runtime.go:85 +0x9b
internal/poll.(*pollDesc).waitRead(0xc4215be098, 0xc423b84000, 0x2000, 0x2000)
    GOROOT/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc4215be080, 0xc423b84000, 0x2000, 0x2000, 0x0, 0x0, 0x0)
    GOROOT/src/internal/poll/fd_unix.go:157 +0x17d
net.(*netFD).Read(0xc4215be080, 0xc423b84000, 0x2000, 0x2000, 0x8, 0x8, 0x1ff3)
    GOROOT/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc420f60100, 0xc423b84000, 0x2000, 0x2000, 0x0, 0x0, 0x0)
    GOROOT/src/net/net.go:176 +0x6a
crypto/tls.(*block).readFromUntil(0xc42ad3e480, 0x7f3db80bb000, 0xc420f60100, 0x5, 0xc420f60100, 0x0)
    GOROOT/src/crypto/tls/conn.go:493 +0x96
crypto/tls.(*Conn).readRecord(0xc422374e00, 0xf52717, 0xc422374f20, 0xc423e9a100)
    GOROOT/src/crypto/tls/conn.go:595 +0xe0
crypto/tls.(*Conn).Read(0xc422374e00, 0xc421aa7000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    GOROOT/src/crypto/tls/conn.go:1156 +0x100
bufio.(*Reader).Read(0xc421738780, 0xc420b363b8, 0x9, 0x9, 0xc423845c50, 0x470183, 0xc423845c60)
    GOROOT/src/bufio/bufio.go:216 +0x238
io.ReadAtLeast(0xfeff60, 0xc421738780, 0xc420b363b8, 0x9, 0x9, 0x9, 0xc420287101, 0xc423845d00, 0xb1e57b)
    GOROOT/src/io/io.go:309 +0x86
io.ReadFull(0xfeff60, 0xc421738780, 0xc420b363b8, 0x9, 0x9, 0xc420064060, 0xc423845d30, 0xb1e48d)
    GOROOT/src/io/io.go:327 +0x58
golang.org/x/net/http2.readFrameHeader(0xc420b363b8, 0x9, 0x9, 0xfeff60, 0xc421738780, 0x0, 0x0, 0x0, 0x0)
    external/org_golang_x_net/http2/frame.go:237 +0x7b
golang.org/x/net/http2.(*Framer).ReadFrame(0xc420b36380, 0xc420f662d0, 0x0, 0x0, 0x0)
    external/org_golang_x_net/http2/frame.go:492 +0xa4
golang.org/x/net/http2.(*clientConnReadLoop).run(0xc423845fb8, 0xf51a90, 0xc420c6b7b8)
    external/org_golang_x_net/http2/transport.go:1477 +0x8e
golang.org/x/net/http2.(*ClientConn).readLoop(0xc42107aa80)
    external/org_golang_x_net/http2/transport.go:1405 +0x68
created by golang.org/x/net/http2.(*Transport).newClientConn
    external/org_golang_x_net/http2/transport.go:592 +0x65c

goroutine 31957 [select]:
github.com/grailbio/reflow/runner.(*worker).Go(0xc423360e10, 0xff7e20, 0xc421dba000)
    runner/worker.go:66 +0x2dd
github.com/grailbio/reflow/runner.(*Stealer).Go.func2(0xff7d60, 0xc421b18280, 0xc423360e10, 0xc420a237e0, 0xc424080490, 0xfff2c0, 0xc42b02c480)
    runner/stealer.go:69 +0xa7
created by github.com/grailbio/reflow/runner.(*Stealer).Go
    runner/stealer.go:66 +0x4ca

goroutine 52572 [runnable]:
github.com/grailbio/reflow.(*Eval).Do.func4(0xc4202b7200, 0xc42074e250, 0xc420ecee00)
    eval.go:458
created by github.com/grailbio/reflow.(*Eval).Do
    eval.go:458 +0x4d5

goroutine 38052 [IO wait]:
internal/poll.runtime_pollWait(0x7f3dbc6a4760, 0x72, 0xc423844858)
    GOROOT/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc42afa6418, 0x72, 0xffffffffffffff00, 0xff1f20, 0x1542c90)
    GOROOT/src/internal/poll/fd_poll_runtime.go:85 +0x9b
internal/poll.(*pollDesc).waitRead(0xc42afa6418, 0xc420f5c000, 0x2000, 0x2000)
    GOROOT/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc42afa6400, 0xc420f5c000, 0x2000, 0x2000, 0x0, 0x0, 0x0)
    GOROOT/src/internal/poll/fd_unix.go:157 +0x17d
net.(*netFD).Read(0xc42afa6400, 0xc420f5c000, 0x2000, 0x2000, 0xc423844a38, 0xb23792, 0xe80720)
    GOROOT/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc420a80bf8, 0xc420f5c000, 0x2000, 0x2000, 0x0, 0x0, 0x0)
    GOROOT/src/net/net.go:176 +0x6a
crypto/tls.(*block).readFromUntil(0xc423461f80, 0x7f3db80bb000, 0xc420a80bf8, 0x5, 0xc420a80bf8, 0x100000000f1a8de)
    GOROOT/src/crypto/tls/conn.go:493 +0x96
crypto/tls.(*Conn).readRecord(0xc420b69880, 0xf52717, 0xc420b699a0, 0x4)
    GOROOT/src/crypto/tls/conn.go:595 +0xe0
crypto/tls.(*Conn).Read(0xc420b69880, 0xc4204d4000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    GOROOT/src/crypto/tls/conn.go:1156 +0x100
bufio.(*Reader).Read(0xc42209b980, 0xc420c82738, 0x9, 0x9, 0x2, 0xc423844c78, 0x405d75)
    GOROOT/src/bufio/bufio.go:216 +0x238
io.ReadAtLeast(0xfeff60, 0xc42209b980, 0xc420c82738, 0x9, 0x9, 0x9, 0x413c87, 0xc423844d08, 0xc423844d08)
    GOROOT/src/io/io.go:309 +0x86
io.ReadFull(0xfeff60, 0xc42209b980, 0xc420c82738, 0x9, 0x9, 0x4701e3, 0xc423844d18, 0xc423844d48)
    GOROOT/src/io/io.go:327 +0x58
golang.org/x/net/http2.readFrameHeader(0xc420c82738, 0x9, 0x9, 0xfeff60, 0xc42209b980, 0x0, 0x100000000, 0xc422142280, 0xc423844e00)
    external/org_golang_x_net/http2/frame.go:237 +0x7b
golang.org/x/net/http2.(*Framer).ReadFrame(0xc420c82700, 0xc42237f200, 0x0, 0x0, 0x0)
    external/org_golang_x_net/http2/frame.go:492 +0xa4
golang.org/x/net/http2.(*clientConnReadLoop).run(0xc423844fb8, 0xf51a90, 0xc4220637b8)
    external/org_golang_x_net/http2/transport.go:1477 +0x8e
golang.org/x/net/http2.(*ClientConn).readLoop(0xc420dc4700)
    external/org_golang_x_net/http2/transport.go:1405 +0x68
created by golang.org/x/net/http2.(*Transport).newClientConn
    external/org_golang_x_net/http2/transport.go:592 +0x65c

goroutine 30009 [select]:
github.com/grailbio/reflow/runner.(*worker).Go(0xc420665230, 0xff7e20, 0xc422e2a000)
    runner/worker.go:66 +0x2dd
github.com/grailbio/reflow/runner.(*Stealer).Go.func2(0xff7d60, 0xc422059180, 0xc420665230, 0xc420a237e0, 0xc4236f0520, 0xfff2c0, 0xc424177f40)
    runner/stealer.go:69 +0xa7
created by github.com/grailbio/reflow/runner.(*Stealer).Go
    runner/stealer.go:66 +0x4ca

goroutine 30008 [select]:
github.com/grailbio/reflow/pool.Keepalive(0xff7d60, 0xc422059180, 0xc420cb6cf0, 0xfff2c0, 0xc424177f40, 0x86088f, 0x1)
    pool/pool.go:148 +0x2c7
github.com/grailbio/reflow/runner.(*Stealer).Go.func1(0xff7d60, 0xc422059180, 0xc420a237e0, 0xfff2c0, 0xc424177f40, 0xc4236f0520)
    runner/stealer.go:60 +0x78
created by github.com/grailbio/reflow/runner.(*Stealer).Go
    runner/stealer.go:59 +0x460

goroutine 34442 [select]:
github.com/grailbio/reflow/runner.(*worker).Go(0xc42b21a2a0, 0xff7e20, 0xc4212945a0)
    runner/worker.go:66 +0x2dd
github.com/grailbio/reflow/runner.(*Stealer).Go.func2(0xff7d60, 0xc4220580c0, 0xc42b21a2a0, 0xc420a237e0, 0xc420986dd0, 0xfff2c0, 0xc422a0c280)
    runner/stealer.go:69 +0xa7
created by github.com/grailbio/reflow/runner.(*Stealer).Go
    runner/stealer.go:66 +0x4ca

goroutine 28018 [select]:
github.com/grailbio/reflow/pool.Keepalive(0xff7d60, 0xc42418bc80, 0xc420cb6cf0, 0xfff2c0, 0xc42b0c3b80, 0xc420c687b0, 0xc420c68730)
    pool/pool.go:148 +0x2c7
github.com/grailbio/reflow/runner.(*Stealer).Go.func1(0xff7d60, 0xc42418bc80, 0xc420a237e0, 0xfff2c0, 0xc42b0c3b80, 0xc422459e90)
    runner/stealer.go:60 +0x78
created by github.com/grailbio/reflow/runner.(*Stealer).Go
    runner/stealer.go:59 +0x460

goroutine 7060 [select, 2 minutes]:
github.com/grailbio/reflow/runner.(*worker).Go(0xc423658d50, 0xff7e20, 0xc423658d80)
    runner/worker.go:66 +0x2dd
github.com/grailbio/reflow/runner.(*Stealer).Go.func2(0xff7d60, 0xc422682d80, 0xc423658d50, 0xc420a237e0, 0xc42360bff0, 0xfff2c0, 0xc420b59220)
    runner/stealer.go:69 +0xa7
created by github.com/grailbio/reflow/runner.(*Stealer).Go
    runner/stealer.go:66 +0x4ca

goroutine 34441 [select]:
github.com/grailbio/reflow/pool.Keepalive(0xff7d60, 0xc4220580c0, 0xc420cb6cf0, 0xfff2c0, 0xc422a0c280, 0x8bdc0cce503e8289, 0x2cc043410d29043b)
    pool/pool.go:148 +0x2c7
github.com/grailbio/reflow/runner.(*Stealer).Go.func1(0xff7d60, 0xc4220580c0, 0xc420a237e0, 0xfff2c0, 0xc422a0c280, 0xc420986dd0)
    runner/stealer.go:60 +0x78
created by github.com/grailbio/reflow/runner.(*Stealer).Go
    runner/stealer.go:59 +0x460

goroutine 10128 [select, 2 minutes]:
github.com/grailbio/reflow/runner.(*worker).Go(0xc422455ce0, 0xff7e20, 0xc422455d10)
    runner/worker.go:66 +0x2dd
github.com/grailbio/reflow/runner.(*Stealer).Go.func2(0xff7d60, 0xc422058a00, 0xc422455ce0, 0xc420a237e0, 0xc420c7b870, 0xfff2c0, 0xc4207c5560)
    runner/stealer.go:69 +0xa7
created by github.com/grailbio/reflow/runner.(*Stealer).Go
    runner/stealer.go:66 +0x4ca

goroutine 52569 [runnable]:
github.com/grailbio/reflow/values.Map.Insert(0xc420f66ab0, 0x5, 0xa48e782fdfd7ff0b, 0x2d1572f43c26a589, 0x9787c9358c8abbfd, 0xb6c3d22e4a0324f1, 0x0, 0x0, 0x0, 0x0, ...)
    values/values.go:85 +0x468
github.com/grailbio/reflow/syntax.Force.func4()
    syntax/force.go:83 +0x13d
github.com/grailbio/reflow/syntax.(*resolver).Resolve.func1(0xc420455300, 0x120, 0x120, 0xdae440)
    syntax/force.go:197 +0xc6
github.com/grailbio/reflow.(*Flow).canonicalize.func2(0xc420455300, 0x120, 0x120, 0xc420455300)
    flow.go:839 +0x57
github.com/grailbio/reflow.(*Eval).eval(0xc4202b7200, 0xff7d60, 0xc420bf6780, 0xc420945c00, 0x0, 0x0)
    eval.go:1067 +0x14e7
github.com/grailbio/reflow.(*Eval).Do.func4(0xc4202b7200, 0xc42074e250, 0xc420945c00)
    eval.go:459 +0x4f
created by github.com/grailbio/reflow.(*Eval).Do
    eval.go:458 +0x4d5

goroutine 31956 [select, 1 minutes]:
github.com/grailbio/reflow/pool.Keepalive(0xff7d60, 0xc421b18280, 0xc420cb6cf0, 0xfff2c0, 0xc42b02c480, 0xc4204b6f78, 0x42b9c9)
    pool/pool.go:148 +0x2c7
github.com/grailbio/reflow/runner.(*Stealer).Go.func1(0xff7d60, 0xc421b18280, 0xc420a237e0, 0xfff2c0, 0xc42b02c480, 0xc424080490)
    runner/stealer.go:60 +0x78
created by github.com/grailbio/reflow/runner.(*Stealer).Go
    runner/stealer.go:59 +0x460

goroutine 50992 [IO wait]:
internal/poll.runtime_pollWait(0x7f3dbc6c9e58, 0x72, 0xc4217df760)
    GOROOT/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc420fa3a18, 0x72, 0xffffffffffffff00, 0xff1f20, 0x1542c90)
    GOROOT/src/internal/poll/fd_poll_runtime.go:85 +0x9b
internal/poll.(*pollDesc).waitRead(0xc420fa3a18, 0xc420f3cc00, 0x400, 0x400)
    GOROOT/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc420fa3a00, 0xc420f3cc00, 0x400, 0x400, 0x0, 0x0, 0x0)
    GOROOT/src/internal/poll/fd_unix.go:157 +0x17d
net.(*netFD).Read(0xc420fa3a00, 0xc420f3cc00, 0x400, 0x400, 0x2, 0xc4209ebe10, 0x1)
    GOROOT/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc423540258, 0xc420f3cc00, 0x400, 0x400, 0x0, 0x0, 0x0)
    GOROOT/src/net/net.go:176 +0x6a
crypto/tls.(*block).readFromUntil(0xc4217b6690, 0x7f3db80bb000, 0xc423540258, 0x5, 0xc423540258, 0x2)
    GOROOT/src/crypto/tls/conn.go:493 +0x96
crypto/tls.(*Conn).readRecord(0xc42306a000, 0xf52717, 0xc42306a120, 0x458500)
    GOROOT/src/crypto/tls/conn.go:595 +0xe0
crypto/tls.(*Conn).Read(0xc42306a000, 0xc42ae6a000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    GOROOT/src/crypto/tls/conn.go:1156 +0x100
net/http.(*persistConn).Read(0xc42262ed80, 0xc42ae6a000, 0x1000, 0x1000, 0xc4217dfb98, 0x405585, 0xc4207705a0)
    GOROOT/src/net/http/transport.go:1453 +0x136
bufio.(*Reader).fill(0xc42277ca80)
    GOROOT/src/bufio/bufio.go:100 +0x11e
bufio.(*Reader).Peek(0xc42277ca80, 0x1, 0x0, 0x0, 0x0, 0xc42ab80060, 0x0)
    GOROOT/src/bufio/bufio.go:132 +0x3a
net/http.(*persistConn).readLoop(0xc42262ed80)
    GOROOT/src/net/http/transport.go:1601 +0x185
created by net/http.(*Transport).dialConn
    GOROOT/src/net/http/transport.go:1237 +0x95a

goroutine 34426 [IO wait]:
internal/poll.runtime_pollWait(0x7f3dbc6a4900, 0x72, 0xc4217e0858)
    GOROOT/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc4239f0d18, 0x72, 0xffffffffffffff00, 0xff1f20, 0x1542c90)
    GOROOT/src/internal/poll/fd_poll_runtime.go:85 +0x9b
internal/poll.(*pollDesc).waitRead(0xc4239f0d18, 0xc4223fa000, 0x2000, 0x2000)
    GOROOT/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc4239f0d00, 0xc4223fa000, 0x2000, 0x2000, 0x0, 0x0, 0x0)
    GOROOT/src/internal/poll/fd_unix.go:157 +0x17d
net.(*netFD).Read(0xc4239f0d00, 0xc4223fa000, 0x2000, 0x2000, 0x20, 0x1, 0x7f3dbc62a640)
    GOROOT/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc423ab0660, 0xc4223fa000, 0x2000, 0x2000, 0x0, 0x0, 0x0)
    GOROOT/src/net/net.go:176 +0x6a
crypto/tls.(*block).readFromUntil(0xc4222340f0, 0x7f3db80bb000, 0xc423ab0660, 0x5, 0xc423ab0660, 0x0)
    GOROOT/src/crypto/tls/conn.go:493 +0x96
crypto/tls.(*Conn).readRecord(0xc423d96700, 0xf52717, 0xc423d96820, 0xc420cdb320)
    GOROOT/src/crypto/tls/conn.go:595 +0xe0
crypto/tls.(*Conn).Read(0xc423d96700, 0xc4237ec000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    GOROOT/src/crypto/tls/conn.go:1156 +0x100
bufio.(*Reader).Read(0xc420df2780, 0xc42194a498, 0x9, 0x9, 0xc4217e0c50, 0x470183, 0xc4217e0c60)
    GOROOT/src/bufio/bufio.go:216 +0x238
io.ReadAtLeast(0xfeff60, 0xc420df2780, 0xc42194a498, 0x9, 0x9, 0x9, 0xc420b4ca01, 0xc4217e0d00, 0xb1e57b)
    GOROOT/src/io/io.go:309 +0x86
io.ReadFull(0xfeff60, 0xc420df2780, 0xc42194a498, 0x9, 0x9, 0xc420064060, 0xc4217e0d30, 0xb1e48d)
    GOROOT/src/io/io.go:327 +0x58
golang.org/x/net/http2.readFrameHeader(0xc42194a498, 0x9, 0x9, 0xfeff60, 0xc420df2780, 0x0, 0x0, 0x0, 0x0)
    external/org_golang_x_net/http2/frame.go:237 +0x7b
golang.org/x/net/http2.(*Framer).ReadFrame(0xc42194a460, 0xc42422d5c0, 0x0, 0x0, 0x0)
    external/org_golang_x_net/http2/frame.go:492 +0xa4
golang.org/x/net/http2.(*clientConnReadLoop).run(0xc4217e0fb8, 0xf51a90, 0xc4228d37b8)
    external/org_golang_x_net/http2/transport.go:1477 +0x8e
golang.org/x/net/http2.(*ClientConn).readLoop(0xc42107afc0)
    external/org_golang_x_net/http2/transport.go:1405 +0x68
created by golang.org/x/net/http2.(*Transport).newClientConn
    external/org_golang_x_net/http2/transport.go:592 +0x65c

goroutine 52454 [select]:
context.propagateCancel.func1(0xff8260, 0xc421a50100, 0xff2760, 0xc4216903c0)
    GOROOT/src/context/context.go:260 +0xff
created by context.propagateCancel
    GOROOT/src/context/context.go:259 +0x1a7

goroutine 38082 [select]:
github.com/grailbio/reflow/runner.(*worker).Go(0xc421efdc50, 0xff7e20, 0xc420851650)
    runner/worker.go:66 +0x2dd
github.com/grailbio/reflow/runner.(*Stealer).Go.func2(0xff7d60, 0xc423df4a80, 0xc421efdc50, 0xc420a237e0, 0xc4230e3b30, 0xfff2c0, 0xc42228d220)
    runner/stealer.go:69 +0xa7
created by github.com/grailbio/reflow/runner.(*Stealer).Go
    runner/stealer.go:66 +0x4ca

goroutine 29935 [IO wait]:
internal/poll.runtime_pollWait(0x7f3dbc733680, 0x72, 0xc423846858)
    GOROOT/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc423e23218, 0x72, 0xffffffffffffff00, 0xff1f20, 0x1542c90)
    GOROOT/src/internal/poll/fd_poll_runtime.go:85 +0x9b
internal/poll.(*pollDesc).waitRead(0xc423e23218, 0xc422568000, 0x2000, 0x2000)
    GOROOT/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc423e23200, 0xc422568000, 0x2000, 0x2000, 0x0, 0x0, 0x0)
    GOROOT/src/internal/poll/fd_unix.go:157 +0x17d
net.(*netFD).Read(0xc423e23200, 0xc422568000, 0x2000, 0x2000, 0xc423846a38, 0xb23792, 0xe80720)
    GOROOT/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc420ef8570, 0xc422568000, 0x2000, 0x2000, 0x0, 0x0, 0x0)
    GOROOT/src/net/net.go:176 +0x6a
crypto/tls.(*block).readFromUntil(0xc423759f80, 0x7f3db80bb000, 0xc420ef8570, 0x5, 0xc420ef8570, 0x100000000f1a8de)
    GOROOT/src/crypto/tls/conn.go:493 +0x96
crypto/tls.(*Conn).readRecord(0xc422375500, 0xf52717, 0xc422375620, 0x4)
    GOROOT/src/crypto/tls/conn.go:595 +0xe0
crypto/tls.(*Conn).Read(0xc422375500, 0xc420baa000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    GOROOT/src/crypto/tls/conn.go:1156 +0x100
bufio.(*Reader).Read(0xc4225243c0, 0xc420de82d8, 0x9, 0x9, 0x2, 0xc423846c78, 0x405d75)
    GOROOT/src/bufio/bufio.go:216 +0x238
io.ReadAtLeast(0xfeff60, 0xc4225243c0, 0xc420de82d8, 0x9, 0x9, 0x9, 0x413c87, 0xc423846d08, 0xc423846d08)
    GOROOT/src/io/io.go:309 +0x86
io.ReadFull(0xfeff60, 0xc4225243c0, 0xc420de82d8, 0x9, 0x9, 0x4701e3, 0xc423846d18, 0xc423846d48)
    GOROOT/src/io/io.go:327 +0x58
golang.org/x/net/http2.readFrameHeader(0xc420de82d8, 0x9, 0x9, 0xfeff60, 0xc4225243c0, 0x0, 0x100000000, 0xc422975e00, 0xc423846e00)
    external/org_golang_x_net/http2/frame.go:237 +0x7b
golang.org/x/net/http2.(*Framer).ReadFrame(0xc420de82a0, 0xc42b1fa450, 0x0, 0x0, 0x0)
    external/org_golang_x_net/http2/frame.go:492 +0xa4
golang.org/x/net/http2.(*clientConnReadLoop).run(0xc423846fb8, 0xf51a90, 0xc4220b57b8)
    external/org_golang_x_net/http2/transport.go:1477 +0x8e
golang.org/x/net/http2.(*ClientConn).readLoop(0xc420dc4000)
    external/org_golang_x_net/http2/transport.go:1405 +0x68
created by golang.org/x/net/http2.(*Transport).newClientConn
    external/org_golang_x_net/http2/transport.go:592 +0x65c

goroutine 52571 [runnable]:
github.com/grailbio/reflow/syntax.Force.func4()
    syntax/force.go:81 +0x185
github.com/grailbio/reflow/syntax.(*resolver).Resolve.func1(0xc423a06000, 0x120, 0x120, 0xdae440)
    syntax/force.go:197 +0xc6
github.com/grailbio/reflow.(*Flow).canonicalize.func2(0xc423a06000, 0x120, 0x120, 0xc423a06000)
    flow.go:839 +0x57
github.com/grailbio/reflow.(*Eval).eval(0xc4202b7200, 0xff7d60, 0xc420bf6780, 0xc420fcaa80, 0x0, 0x0)
    eval.go:1067 +0x14e7
github.com/grailbio/reflow.(*Eval).Do.func4(0xc4202b7200, 0xc42074e250, 0xc420fcaa80)
    eval.go:459 +0x4f
created by github.com/grailbio/reflow.(*Eval).Do
    eval.go:458 +0x4d5

I don't mind sharing the context at all, it is for an article after all, but I don't think it is relevant without the whole code. In terms of the mechanism, the problematic part is a map-reduce with around 400 items: I do the same operation around 400 times in parallel and collect that into a list that I turn into a directory for reduction. Basically, the problematic part is in the spirit of (and I've tried to explicit all the actual types used):

val dirs = make("$/dirs")

func job(pop_file file, pop_name string, spec file) file = exec(...)  # 1GiB, 1 CPU
func reduce(pop, comb dir) file = exec(...)  # 5GiB, 1 CPU

val populations [string, file] = ...  // size: 4, around 600 MiB
val populations__dir dir = dirs.Make(populations)
val specs dir = ...  // size: around 100 files, 100 KiB

val Main file = {
  val comb [string, file] = [(pop_name + "/" + spec_id, job(pop_file, pop_name, spec_file)) |
    (pop_name, pop_file) <- populations,
    (spec_id, spec_file) <- map(specs)]

  reduce(populations__dir, dirs.Make(map(comb)))
}

Thank you again for your time.

mariusae commented 6 years ago

Thanks @hchauvin for all the details! I think I've been able to induce a condition that results in this, when the evaluator has a lot of CPUs. Working on it now:

/*
    val dirs = make("$/dirs")

    func job(pop_file file, pop_name string, spec file) file = exec(...)  # 1GiB, 1 CPU
    func reduce(pop, comb dir) file = exec(...)  # 5GiB, 1 CPU

    val populations [string, file] = ...  // size: 4, around 600 MiB
    val populations__dir dir = dirs.Make(populations)
    val specs dir = ...  // size: around 100 files, 100 KiB

    val Main file = {
      val comb [string, file] = [(pop_name + "/" + spec_id, job(pop_file, pop_name, spec_file)) |
        (pop_name, pop_file) <- populations,
        (spec_id, spec_file) <- map(specs)]

      reduce(populations__dir, dirs.Make(map(comb)))
    }
*/

val strings = make("$/strings")

val N = 1000
func makeSpecs(n int) = map([delay((key, strings.FromInt(key))) | key <- range(0, n)])
val specs = makeSpecs(N)

@requires(cpu := 1)
val Main = [makeSpecs(400) | (key, value) <- delay(specs)]
mariusae commented 6 years ago

I believe I understand the issue now, and the fix is easy.

The problem is when there's multiple levels of continuations (OpK), and the outer K resolves to the same inner K, then the inner K can be called multiple times. This is usually not a problem, but in the case of maps, it tries to re-resolve them, which causes concurrent reads and writes.

We could attempt to de-dup the call to the inner K, but this doesn't save much in reality and would introduce some additional complexity. Instead, we'll fix the implementation of the map force continuation to do the right thing.

Fix should be coming shortly!

Thanks so much for your help.

mariusae commented 6 years ago

fyi, the fix is really simple:

diff --git a/go/src/github.com/grailbio/reflow/syntax/force.go b/go/src/github.com/grailbio/reflow/syntax/force.go
index fcf1f609f3..249edb010c 100644
--- a/go/src/github.com/grailbio/reflow/syntax/force.go
+++ b/go/src/github.com/grailbio/reflow/syntax/force.go
@@ -7,6 +7,7 @@ package syntax
 import (
    "log"
    "sort"
+   "sync"

    "github.com/grailbio/base/digest"
    "github.com/grailbio/reflow"
@@ -184,18 +185,26 @@ func (r *resolver) Resolve(proc func()) values.T {
        }
        return r.v
    }
+   var once sync.Once
    writeN(r.dw, int(r.t.Kind))
    return &reflow.Flow{
        Op:         reflow.OpK,
        Deps:       r.deps,
        FlowDigest: r.dw.Digest(),
        K: func(vs []values.T) *reflow.Flow {
-           for i := range vs {
-               *r.vps[i] = vs[i]
-           }
-           if proc != nil {
-               proc()
-           }
+           // Initialize the underlying datastructure only once.
+           // This makes it safe to invoke the K multiple times.
+           // Since the underlying computation is deterministic,
+           // we are free to ignore the vs from subsequent
+           // computations.
+           once.Do(func() {
+               for i := range vs {
+                   *r.vps[i] = vs[i]
+               }
+               if proc != nil {
+                   proc()
+               }
+           })
            return flow(r.v, r.t)
        },
    }

It's currently pending code review at GRAIL; will show up here in short order.

mariusae commented 6 years ago

Fixed by: https://github.com/grailbio/reflow/commit/3a1802c5edc891b2fb364f979392c26f694761d3

hchauvin commented 6 years ago

Amazing! I can confirm it now works with the conditions listed above.

mariusae commented 6 years ago

Thanks for confirming. I'll probably make a maintenance release today.