grailbio / reflow

A language and runtime for distributed, incremental data processing in the cloud
Apache License 2.0

New ami after CoreOS deprecation? #129

Closed dimenwarper closed 3 years ago

dimenwarper commented 3 years ago

I'm suddenly getting launch error: InvalidAMIID.NotFound: errors, and I thought the AMI had been deprecated, so I tried updating to a new CoreOS AMI. To my surprise, stable.release.core-os.net is now down. It looks like CoreOS no longer exists and is now Fedora CoreOS, since Red Hat bought them. Which AMI is the Reflow team using now?

jcharum commented 3 years ago

Flatcar Container Linux, ami-0bb54692374ac10a7, should be a drop-in replacement. This will be updated when we sync the repository again, but that should at least unblock you.
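
For readers applying this workaround, a minimal sketch of swapping the AMI in a Reflow config file follows. It assumes the config is a YAML file (commonly $HOME/.reflow/config.yaml, which is an assumption here) with a single two-space-indented ami: line under ec2cluster, as in the reflow config -marshal output later in this thread. The helper name set_reflow_ami is hypothetical and not part of Reflow.

```shell
# Hypothetical helper (not part of Reflow): rewrite the "ami:" entry of a
# Reflow YAML config in place.
# Assumes exactly one two-space-indented "ami:" line (under ec2cluster).
set_reflow_ami() {
  config_file=$1
  new_ami=$2
  tmp_file=$(mktemp) || return 1
  # Replace everything after "  ami:" with the new AMI ID.
  sed "s/^\(  ami:\).*/\1 ${new_ami}/" "$config_file" > "$tmp_file" || return 1
  # Copy the contents back so the original file's permissions are preserved.
  cat "$tmp_file" > "$config_file" && rm -f "$tmp_file"
}
```

A possible invocation would be set_reflow_ami "$HOME/.reflow/config.yaml" ami-0bb54692374ac10a7, followed by reflow config -marshal to confirm the change was picked up.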

dimenwarper commented 3 years ago

Awesome, thanks for the quick reply! That ami is indeed available, but now my jobs are stuck at

waiting for reflowlet to become available

jcharum commented 3 years ago

I'm not sure. Let me see if anyone more familiar with Reflow around here knows.

swami-m commented 3 years ago

Can you run the command with -log debug and share the logs?

dimenwarper commented 3 years ago

Thanks for the response! There is nothing in the logs, even when I run with -log debug. It's just stuck at waiting for reflowlet to become available, and then it times out while provisioning the instance (?) with offers ec2-54-201-6-159.us-west-2.compute.amazonaws.com:9000: timeout: context deadline exceeded
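
A timeout against port 9000 can have several causes, including the reflowlet never starting on the instance. One way to narrow it down is to check whether the port is reachable from the client at all. The sketch below is a generic TCP probe, not a Reflow feature: the helper name probe_port is hypothetical, and it relies on bash's /dev/tcp pseudo-device and the coreutils timeout command being available.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port opens within
# the given number of seconds (default 5).
# Relies on bash's /dev/tcp pseudo-device and the coreutils `timeout` command.
probe_port() {
  host=$1
  port=$2
  wait_seconds=${3:-5}
  # Inside `bash -c`, $0 is the host and $1 is the port.
  timeout "$wait_seconds" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, probe_port ec2-54-201-6-159.us-west-2.compute.amazonaws.com 9000 || echo "port 9000 unreachable" would report a failure if nothing accepts connections there. A successful probe only shows that something is listening, not that the reflowlet is healthy.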

swami-m commented 3 years ago

OK, can you please answer the following questions?

dimenwarper commented 3 years ago

I'm running the 0.6.3 release; 0.6.7 onward forces me to migrate to CoreOS before running anything due to ValidateConfig. Unfortunately, I can't see either evaluating with configuration or installing reflowlet image when I run with this version.

Output of reflow config -marshal:

aws: awsenv
awsenv:
  credentials:
    accesskeyid: ****
    secretaccesskey: ****
    sessiontoken: ""
    providername: EnvConfigCredentials
  region: us-west-2
awstool: docker,grailbio/awstool:latest
cache: "off"
cluster: ec2cluster
ec2cluster:
  securitygroup: sg-c78f77bb
  region: us-west-2
  disktype: gp2
  diskspace: 500
  diskslices: 0
  ami: ami-0bb54692374ac10a7
  maxinstances: 50
  instancetypes:
  - m4.4xlarge
  - i3.4xlarge
  - m4.16xlarge
  - i3.8xlarge
  - c1.medium
  - c1.xlarge
  - c3.2xlarge
  - c3.4xlarge
...
  sshkey: |
    ssh-rsa *** dimenwarper@gmail.com
  keyname: ""
  cloudconfig: {}
https: httpsca,/home/pablo/.reflow/reflow.pem
httpsca: |
-----BEGIN CERTIFICATE----- 
*****
-----END AUTHORITY CERTIFICATE-----
reflowlet: grailbio/reflowlet:1520533605
repository: s3,reflow-pablo-s3-cache

dimenwarper commented 3 years ago

I'm trying to get a development version up and running to see if it works with the new Flatcar AMI. After running go get and installing via buildreflow (FYI, this only works with Go 1.13.1 and fails with more recent versions of Go), I try my reflow run command and get:

no provider for type taskdb.TaskDB (/home/pablo/go/src/github.com/grailbio/reflow/tool/runner.go:146)

reflow config -marshal gives (omitting keys and stuff):

assoc: dynamodbassoc,table=reflow-pablo-dynamodb
awscreds: awscreds
awstool: awstool,awstool=grailbio/awstool:latest
bootstrap: bootstrapimage,uri=bootstrap
cache: readwrite
cluster: ec2cluster
docker: docker,memlimit=soft
ec2cluster:
  securitygroup: sg-c78f77bb
  region: us-west-2
  disktype: gp2
  diskspace: 500
  diskslices: 0
  ami: ami-0bb54692374ac10a7
https: httpsca,/home/pablo/.reflow/reflow.pem
key: {}
labels: kv
logger: logger,level=info
reflowlet: reflowletconfig
repository: s3,bucket=reflow-pablo-s3-cache
session: awssession
sshkey: key
tls: tls,file=/tmp/ca.reflow
tracer: xray
user: user
versions: {}
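
Since the build reportedly works only with Go 1.13.1, it may help to check the toolchain before running buildreflow. The following is a minimal sketch; go_version_is is a hypothetical helper name, and it only inspects the text printed by go version, which has the form go version go1.13.1 linux/amd64.

```shell
# Hypothetical helper: succeed if a `go version` output line names the wanted
# release, either exactly (e.g. 1.13.1) or as a prefix series (e.g. 1.13).
go_version_is() {
  version_line=$1   # e.g. the output of `go version`
  wanted=$2         # e.g. 1.13 or 1.13.1
  case "$version_line" in
    "go version go${wanted} "*|"go version go${wanted}."*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A possible guard would be go_version_is "$(go version)" 1.13 || echo "buildreflow may fail with this Go version" before building.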

prb2 commented 3 years ago

Hi, I've attached a small diff that you can apply to fix the no provider for type taskdb.TaskDB issue. I was able to start a run with a fresh checkout and this patch.

cd $CHECKOUT
wget https://github.com/grailbio/reflow/files/5490833/taskdb_fix.txt
git apply taskdb_fix.txt
cd cmd/reflow
buildreflow

taskdb_fix.txt
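
When applying a patch like this to a checkout that may have drifted from the one it was written against, a dry run first can avoid a half-applied change. The sketch below uses git apply --check, a standard git option; the wrapper name apply_patch_checked is hypothetical.

```shell
# Hypothetical helper: apply a patch only if `git apply --check` reports that
# it applies cleanly to the current working tree.
apply_patch_checked() {
  patch_file=$1
  if ! git apply --check "$patch_file"; then
    echo "patch does not apply cleanly: $patch_file" >&2
    return 1
  fi
  git apply "$patch_file"
}
```

In the steps above, apply_patch_checked taskdb_fix.txt could stand in for the plain git apply call.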

dimenwarper commented 3 years ago

This works with the new AMI! Thanks so much!

swami-m commented 3 years ago

And just FYI, we realize that it has been a while since we've updated the repository. We'll try to do that soon and perhaps also make a new release.

prb2 commented 3 years ago

The repo has been updated and I've just released reflow1.3.1. The release fixes the problems discussed in this issue so I'm closing it.

dimenwarper commented 3 years ago

Awesome work, thanks!

On Fri, Nov 6, 2020 at 4:16 PM Prudhvi Boyapalli notifications@github.com wrote:

Closed #129 https://github.com/grailbio/reflow/issues/129.
