ohsu-comp-bio / funnel

Funnel is a toolkit for distributed task execution via a simple, standard API.
https://ohsu-comp-bio.github.io/funnel
MIT License

deployment: guide and health check endpoint #496

Open buchanae opened 6 years ago

buchanae commented 6 years ago

Deployments of Funnel sometimes need to go through a few iterations of testing, including:

1) Curl (or funnel task list) some endpoint, get a 500

2) Check the funnel logs, or decipher the HTTP response body

3) Possibly repeat a few times

It would be nice to make this simple and documented, so that someone deploying funnel has a surefire approach to deployment. I think there are a couple of improvements:

1) Better docs. A more complete guide to deployment. Our docs are sort of spread out. A more in-depth guide per environment (e.g. AWS + ELB + Dynamo + Batch) could be useful

2) A health check endpoint, which can be requested with curl. We can run a few checks and return user friendly logs and hints. Once these checks pass, the user can be confident their server is working.

tom-dyar commented 6 years ago

Thanks for pointing me here from #579. It seems my tables are created, but I am still getting an error in some code that has to do with creating tables. Pardon the formatting; this is copied from the CloudWatch error log:

19:51:03 panic: runtime error: invalid memory address or nil pointer dereference

19:51:03 [signal SIGSEGV: segmentation violation code=0x1 addr=0xb8 pc=0xce7e0c]

19:51:03 goroutine 1 [running]:

19:51:03 github.com/ohsu-comp-bio/funnel/server/dynamodb.(*DynamoDB).tableIsAlive(0xc42008e700, 0x1f37b00, 0xc420290660, 0xc420296090, 0xb, 0xc420288750, 0x16)

19:51:03 /Users/strucka/go/src/github.com/ohsu-comp-bio/funnel/server/dynamodb/util.go:202 +0x1ac

19:51:03 github.com/ohsu-comp-bio/funnel/server/dynamodb.(*DynamoDB).waitForTables(0xc42008e700, 0x0, 0x0)

19:51:03 /Users/strucka/go/src/github.com/ohsu-comp-bio/funnel/server/dynamodb/util.go:213 +0xc4

19:51:03 github.com/ohsu-comp-bio/funnel/server/dynamodb.(*DynamoDB).createTables(0xc42008e700, 0x7ffca21dde43, 0x6)

19:51:03 /Users/strucka/go/src/github.com/ohsu-comp-bio/funnel/server/dynamodb/util.go:189 +0x1c5c

19:51:03 github.com/ohsu-comp-bio/funnel/server/dynamodb.NewDynamoDB(0x7ffca21dde43, 0x6, 0x0, 0x0, 0x7ffca21dde20, 0x9, 0x0, 0x0, 0x0, 0x0, ...)

19:51:03 /Users/strucka/go/src/github.com/ohsu-comp-bio/funnel/server/dynamodb/new.go:43 +0x3c3

19:51:03 github.com/ohsu-comp-bio/funnel/cmd/worker.NewWorker(0x1f37a80, 0xc420050480, 0xc42004e460, 0x1, 0x1, 0x7ffca21dde05, 0x8, 0x14e97cc, 0x5, 0x14ea0c7, ...)

19:51:03 /Users/strucka/go/src/github.com/ohsu-comp-bio/funnel/cmd/worker/run.go:53 +0xbf4

19:51:03 github.com/ohsu-comp-bio/funnel/cmd/worker.Run(0x1f37a80, 0xc420050480, 0xc42004e460, 0x1, 0x1, 0x7ffca21dde05, 0x8, 0x14e97cc, 0x5, 0x14ea0c7, ...)

19:51:03 /Users/strucka/go/src/github.com/ohsu-comp-bio/funnel/cmd/worker/run.go:22 +0x82

19:51:03 github.com/ohsu-comp-bio/funnel/cmd/worker.newCommandHooks.func2(0xc4203db8c0, 0xc42027e1e0, 0x0, 0xa, 0x0, 0x0)

19:51:03 /Users/strucka/go/src/github.com/ohsu-comp-bio/funnel/cmd/worker/worker.go:71 +0x2ca

19:51:03 github.com/ohsu-comp-bio/funnel/vendor/github.com/spf13/cobra.(*Command).execute(0xc4203db8c0, 0xc42027e640, 0xa, 0xa, 0xc4203db8c0, 0xc42027e640)

19:51:03 /Users/strucka/go/src/github.com/ohsu-comp-bio/funnel/vendor/github.com/spf13/cobra/command.go:746 +0x475

19:51:03 github.com/ohsu-comp-bio/funnel/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0x1fd3460, 0x1fd4d20, 0xc4204fd200, 0x1fd4420)

19:51:03 /Users/strucka/go/src/github.com/ohsu-comp-bio/funnel/vendor/github.com/spf13/cobra/command.go:831 +0x30e

19:51:03 github.com/ohsu-comp-bio/funnel/vendor/github.com/spf13/cobra.(*Command).Execute(0x1fd3460, 0xc4204c9f70, 0x1071f3e)

19:51:03 /Users/strucka/go/src/github.com/ohsu-comp-bio/funnel/vendor/github.com/spf13/cobra/command.go:784 +0x2b

19:51:03 main.main()

19:51:03 /Users/strucka/go/src/github.com/ohsu-comp-bio/funnel/main.go:11 +0x2d

buchanae commented 6 years ago

This is a bug from ignoring an error check, I think: https://github.com/ohsu-comp-bio/funnel/blob/master/database/dynamodb/util.go#L202
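For readers unfamiliar with this failure mode, the sketch below shows how a discarded error can turn into a nil pointer panic in Go, and the usual fix. It does not reproduce Funnel's code: `describeTable`, `tableDescription`, and both `isTableActive` variants are hypothetical stand-ins for a database client call.

```go
package main

import (
	"errors"
	"fmt"
)

// tableDescription stands in for a client's "describe table" response (hypothetical).
type tableDescription struct {
	Status string
}

// describeTable is a hypothetical lookup. Like many client calls, it returns
// a nil result together with a non-nil error when the lookup fails.
func describeTable(name string) (*tableDescription, error) {
	if name == "" {
		return nil, errors.New("table name is empty")
	}
	return &tableDescription{Status: "ACTIVE"}, nil
}

// isTableActiveUnchecked discards the error, so a failed lookup leaves desc
// nil and the field access panics with a nil pointer dereference.
func isTableActiveUnchecked(name string) bool {
	desc, _ := describeTable(name)
	return desc.Status == "ACTIVE"
}

// isTableActive checks the error before touching the result and passes the
// failure up to the caller instead of crashing.
func isTableActive(name string) (bool, error) {
	desc, err := describeTable(name)
	if err != nil {
		return false, fmt.Errorf("describing table %q: %w", name, err)
	}
	return desc.Status == "ACTIVE", nil
}

func main() {
	active, err := isTableActive("")
	fmt.Println(active, err)

	active, err = isTableActive("task")
	fmt.Println(active, err)
}
```

With the checked variant, a misconfigured table name or region surfaces as a readable error in the logs rather than a stack trace.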

adamstruck commented 6 years ago

Should be fixed in #580.

tom-dyar commented 6 years ago

OK, has that been merged in, and then do I need to check out the latest and build from source?

Thanks!

adamstruck commented 6 years ago

Yeah, that will be quickest. Instructions to build from source can be found here: https://ohsu-comp-bio.github.io/funnel/download/

tom-dyar commented 6 years ago

I am still getting the error after I re-made the compute environment, job queue, and job definition with the latest code. Is there something I can do to check whether my DynamoDB tables are set up incorrectly?

Also, I had to modify the job definition because there was an "unknown flag EventWriter" error until I edited out the last part of the command that referenced "--EventWriter".

adamstruck commented 6 years ago

Thanks for pointing out that typo.

The compute environment and job queue should not be impacted by which version of funnel you are running. For your job definition, you should only need to update your image reference (if anything).

You are still getting the same panic error as before?

To debug dynamo I'd suggest the following:

Set up a local environment for your funnel server, configured to use "local" compute and dynamodb as its database, and try running a simple hello world task (funnel run --sh 'echo hello'). The funnel server should be the process that creates the dynamodb tables, so I don't necessarily think #579 is a bug.

tom-dyar commented 6 years ago

I don't know what typo you are referring to, since I had to totally remove the flag and the value. I removed this text from the end of the job definition to get it working: " --EventWriter dynamodb --EventWriter log"

OK, that command ( funnel run --sh 'echo hello' ) runs fine locally, with DynamoDB backend. I also see rows for my failed runs in DynamoDB tables, but only the successful (locally run) task in the "stdout" table.

Thanks so much!!

adamstruck commented 6 years ago

The typo I was referring to was in our code. The generated command should have had --EventWriters rather than --EventWriter.
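To illustrate why the job definition failed, here is a small sketch of how a flag parser rejects a name it does not know. It uses Go's standard `flag` package rather than the cobra library seen in the stack trace above, and it assumes `--EventWriters` can be repeated; `parseWorkerArgs` and `writerList` are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"strings"
)

// writerList collects the values of a repeated flag (hypothetical helper).
type writerList []string

func (w *writerList) String() string { return strings.Join(*w, ",") }

func (w *writerList) Set(v string) error {
	*w = append(*w, v)
	return nil
}

// parseWorkerArgs defines only the plural flag name, so the singular
// spelling is rejected as an undefined flag.
func parseWorkerArgs(args []string) ([]string, error) {
	fs := flag.NewFlagSet("worker", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // suppress the usage text for this demo
	var writers writerList
	fs.Var(&writers, "EventWriters", "event writer to enable (assumed repeatable)")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return writers, nil
}

func main() {
	// The singular flag name from the generated command is rejected.
	_, err := parseWorkerArgs([]string{"--EventWriter", "dynamodb", "--EventWriter", "log"})
	fmt.Println("old command:", err)

	// The plural name parses and collects both values.
	writers, err := parseWorkerArgs([]string{"--EventWriters", "dynamodb", "--EventWriters", "log"})
	fmt.Println("fixed command:", writers, err)
}
```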

If you're interested in using AWS Batch as a backend, you may want to consider creating a custom AMI, since the default one provided by Amazon only has a few GB of storage available. I suggest either creating an AMI with a large fixed disk attached or one that enables dynamic mounting of EBS volumes to VMs. I played around with the latter approach in https://github.com/adamstruck/ebsmount/tree/master/resources/funnel.

Let us know if you have any other issues! We can also be reached on gitter (https://gitter.im/ohsu-comp-bio/funnel).