grailbio / reflow

A language and runtime for distributed, incremental data processing in the cloud
Apache License 2.0
965 stars 52 forks source link

Reflow 1.8.0: bootstrap image: 403 Forbidden #139

Closed niemasd closed 3 years ago

niemasd commented 3 years ago

I just installed Reflow 1.8.0 on my Ubuntu environment using the Linux binary provided in my release. When I try to run a Reflow run file, however, I get the following error infinitely repeating:

reflow: scheduler: failed to allocate {mem:1.0GiB cpu:1 disk:1.0GiB}#2 from cluster: bootstrap image: https://grail-public-bin.s3-us-west-2.amazonaws.com/linux/amd64/reflowbootstrap0.4: fatal:
        bootstrap image: HEAD https://grail-public-bin.s3-us-west-2.amazonaws.com/linux/amd64/reflowbootstrap0.4: 403 Forbidden

In the same environment, I also have Reflow 1.6.0, and that one is able to allocate EC2 instances fine using the exact same ~/.reflow/config.yaml file:

$ reflow1.8.0 run test.rf # this gets the repeated error
$ reflow1.6.0 run test.rf # this works fine

Both executables are installed in /usr/local/bin

prb2 commented 3 years ago

It looks like the permissions for the new reflowbootstrap0.4 object are not configured correctly. Please use a previous version of reflow until I can fix the permissions. Will update here once it is fixed.

niemasd commented 3 years ago

Sure, thanks so much!

prb2 commented 3 years ago

It should be fixed now, feel free to reopen the issue if you're still seeing the issue.

niemasd commented 3 years ago

Yes, it seems to be fixed now; thank you!

niemasd commented 3 years ago

Wait, actually, now, it failed again, now with this error:

reflow: ec2cluster: [i-0e11485563696d69f]: instance i-0e11485563696d69f: fatal: bootstrap execimage POST {s3://niema-test-quickstart-cache/objects/sha256:310c1a4e849d53eb3f25cb52558549c84b5fbbe8656d570a64aaa72cc618ca85 reflowlet [-config /etc/reflowconfig serve -ec2cluster]}: s3blob.Get niema-test-quickstart-cache objects/sha256:310c1a4e849d53eb3f25cb52558549c84b5fbbe8656d570a64aaa72cc618ca85: temporary: EC2RoleRequestError: no EC2 instance role found
caused by: EC2MetadataError: failed to make EC2Metadata request
        status code: 404, request id:
caused by: <?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
                 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <title>404 - Not Found</title>
 </head>
 <body>
swami-m commented 3 years ago

failed to make EC2Metadata request

This is coming from AWS SDK. Perhaps https://github.com/aws/amazon-ecs-agent/issues/2329 might be of interest ?

Alternately, try with the following in your reflow config instead.

session: awssession