stitchfix / flotilla-os

Open source Flotilla
Apache License 2.0

flotilla Docker image doesn't work #74

Open raybuhr opened 6 years ago

raybuhr commented 6 years ago

This project looks really cool and useful for me, but I'm having a really hard time getting it to work. For reference, I'm running ubuntu 16.04, go 1.9, docker 17.12.0, docker-compose version 1.19.0.

I started by just launching via docker-compose up -d but the image for flotillaos_flotilla would exit within a few seconds after launching (the ui and postgres images seem to be working fine). I tried rebuilding from scratch and while it doesn't error out, running the image does nothing. I then just started from the golang:latest image and proceeded through the RUN commands in the dockerfile. It fails on the go install github.com/stitchfix/flotilla-os command:

root@e7a3b117e00a:/go# go install github.com/stitchfix/flotilla-os
src/github.com/stitchfix/flotilla-os/clients/registry/registry.go:17:2: code in directory /go/src/github.com/moby/moby/registry expects import "github.com/docker/docker/registry"
root@e7a3b117e00a:/go#

Running go test from that directory results in the same failure message.

I'm not super familiar with govendor, but I tried switching this to dep with some progress. I ran curl https://raw.githubusercontent.com/golang/dep/master/install.sh | sh to install dep. Then from this repo home, I ran dep init to migrate the vendor/vendor.json file to the Gopkg.lock and Gopkg.toml files that dep uses. Finally I ran dep ensure to sync dependencies. This ran successfully and I was able to install flotilla with go get github.com/stitchfix/flotilla-os. However, that still didn't work with the docker-compose setup, as the flotilla api crashes after roughly a minute.

I then tried rebuilding and installing the flotilla app locally from the source code using dep. That seems to work: I'm able to get the api server to run, but hitting http://localhost:3000/api/v1/tasks returns 404. The api server is running, and will spit out messages after creating tasks/runs in the UI, but I don't think I understand how this attaches to my AWS config (I'm used to using ~/.aws/config and ~/.aws/credentials) and I don't see any options to specify an account. Here's the stdout from the running api server, even though I don't think it's helpful:

➜  flotilla-os git:(master) ✗ flotilla-os conf
message="Initializing logs client" client=awslogs
message="Starting worker" name=retry
message="Starting worker" name=submit
message="Starting worker" name=status
message="Got 0 jobs to retry"
message="Got 0 jobs to retry"
message="Got 0 jobs to retry"
message="Got 0 jobs to retry"
message="Got 0 jobs to retry"
message="Got 0 jobs to retry"
message="Got 0 jobs to retry"
message="Got 0 jobs to retry"
message="Got 0 jobs to retry"
message="Got 0 jobs to retry"
message="Got 0 jobs to retry"
message="Got 0 jobs to retry"
message="Got 0 jobs to retry"
message="Got 0 jobs to retry"
message="Got 0 jobs to retry"
message="Got 0 jobs to retry"

alienrobotwizard commented 6 years ago

Hi @raybuhr, I suspect the issue was that AWS credentials were not getting pulled into the environment of the service when it started with docker-compose. You can confirm this by running the following after docker-compose up -d completes:

docker logs -f flotillaos_flotilla_1

I've pushed a fix that accepts AWS keys from the environment. Can you try running:

export AWS_ACCESS_KEY_ID=$(aws --profile default configure get aws_access_key_id)
export AWS_SECRET_ACCESS_KEY=$(aws --profile default configure get aws_secret_access_key)
docker-compose up -d

Navigating to localhost:5000 after the above works will bring you to the ui where you can create and launch tasks.

Note I've updated the readme with some additional information regarding the exact AWS permissions you need: https://github.com/stitchfix/flotilla-os#minimal-assumptions

In terms of running locally, not with docker, you shouldn't need to specify credentials, as flotilla uses the AWS SDK for Go, which looks for credentials in the default places (such as ~/.aws). Also, the endpoint for tasks is non-plural: /api/v1/task as opposed to /api/v1/tasks.
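To make the difference between the local and docker-compose cases concrete, here is a rough sketch of the first two places the SDK's default lookup checks: environment variables, then the shared credentials file. It is a simplification that uses only the standard library and does not call the SDK. credentialSource and fileExists are hypothetical helper names, and the real lookup also honors other variables and falls back to instance or container roles.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// credentialSource is a hypothetical helper that mimics only the first two
// steps of the default credential lookup. It does not call the AWS SDK.
func credentialSource(getenv func(string) string, exists func(string) bool, home string) string {
	// Step 1: static keys exported into the process environment.
	if getenv("AWS_ACCESS_KEY_ID") != "" && getenv("AWS_SECRET_ACCESS_KEY") != "" {
		return "environment"
	}
	// Step 2: the shared credentials file written by `aws configure`.
	if exists(filepath.Join(home, ".aws", "credentials")) {
		return "shared credentials file"
	}
	return "none found"
}

// fileExists reports whether a path can be stat'ed.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		home = "" // no home directory available; only the environment can match
	}
	fmt.Println(credentialSource(os.Getenv, fileExists, home))
}
```

Run on the host, this would typically find the shared credentials file. Inside a container the host's ~/.aws is not visible unless it is mounted, so only the exported environment variables can match.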

In any case, please let me know if this resolves the docker issues for you.

raybuhr commented 6 years ago

Thanks for looking into this. It does look like the problems were with the AWS credentials. I went through the ECS cluster setup again from scratch on a brand new AWS account. At first the docker logs suggested it was still not finding my credentials, even after making them explicit env vars, but after re-running the aws configure command flotilla was able to pick up my new credentials and respond successfully to a curl http://localhost:3000/api/v1/task.

Unfortunately, I am unable to get new tasks created via the web UI to actually save or run. I'm not getting any error messages. The web UI says it saved the task successfully and takes me to the "Runs" tab, but doesn't load anything. The URL of the created task suggests the save did not actually succeed -- http://localhost:5000/#/tasks/undefined/run?cluster=default

Hitting the api/v1/task endpoint afterward shows no task definitions. I tried creating the example hello-world task via the API and got this back:

Warning: an empty POST.
{"error":"EOF"}

I tried creating the hello-world.json from the example and posting that, but I get back:

{"error":"invalid character 'h' looking for beginning of value"}
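Both error strings match what Go's encoding/json reports for an empty request body and for a body whose first character is the letter h. The second case is what would happen if the file name itself were sent as the body rather than the file's contents (with curl, -d hello-world.json sends the literal string, while -d @hello-world.json sends the file). The sketch below reproduces both messages. It assumes the server decodes the body with json.NewDecoder, which is not confirmed here; decodeBody is a hypothetical stand-in, and the JSON object is an arbitrary placeholder rather than a real task definition.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// decodeBody is a hypothetical stand-in for a handler's JSON decoding step.
func decodeBody(body string) error {
	var v map[string]interface{}
	return json.NewDecoder(strings.NewReader(body)).Decode(&v)
}

func main() {
	// Empty POST body: the decoder hits end of input immediately.
	fmt.Println(decodeBody(""))
	// The file name sent as the body instead of the file contents.
	fmt.Println(decodeBody("hello-world.json"))
	// Any well-formed JSON object decodes without error (placeholder payload).
	fmt.Println(decodeBody(`{"name": "example"}`))
}
```

If that is what happened, re-sending the request with the file contents as the body and a JSON content type would be the next thing to try.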