grailbio / bigslice

A serverless cluster computing system for the Go programming language
https://bigslice.io/
Apache License 2.0

Is there a way to distinguish the master from workers? #38

Open tatatodd opened 4 years ago

tatatodd commented 4 years ago

I have a use-case where I need to run logic on the master before I kick off the bigslice session, which then goes and runs the workers. The specific use-case is that I have a single binary that must capture and dump a state file (in this case Kafka offsets) before the bigslice session, for correctness.

Is there a simple way to distinguish the master from the workers? I've looked through the various bigslice packages, and haven't found anything. Ideally I'd have something like this:

package main

func main() {
    if bigslice.IsMaster() {
        // Do master-only stuff, before running the bigslice session
    }

    fn := bigslice.Func(...)
    exec.Must(...)
}

tatatodd commented 4 years ago

FYI @mariusae @cosnicolaou

tatatodd commented 4 years ago

FYI I've found a hacky workaround, by checking for the BIGMACHINE_MODE envvar, which is set to machine on workers started by bigmachine, and is empty on the master (driver).

This happens to work on my setup, but it's obviously a bit brittle; e.g. there's no reason the bigslice master couldn't also have been spawned by bigmachine. So it'd be nice to have a bigslice-centric robust way to determine whether something is a driver (e.g. perhaps the master registers a name somewhere, and the workers get an envvar with that name in it). But my workaround alleviates my short-term need.

jcharum commented 4 years ago

Checking BIGMACHINE_MODE is how bigmachine determines its behavior internally, so it's guaranteed to work on your setup (at least as currently implemented).

> e.g. there's no reason the bigslice master couldn't also have been spawned by bigmachine

Do you mean by starting a bigslice session within an RPC? If so, you'd probably want/need to fork (with BIGMACHINE_MODE unset), as workers run the same binary as the master. Their behavior diverges at the call to start bigmachine.