nextml / NEXT

NEXT is a machine learning system that runs in the cloud and makes it easy to develop, evaluate, and apply active learning in the real-world. Ask better questions. Get better results. Faster. Automated.
http://nextml.org
Apache License 2.0

Update setup.sh #224

Open Ahtidevin opened 4 years ago

Ahtidevin commented 4 years ago

Updated setup.sh since the source from which docker was being installed expired.

stsievert commented 4 years ago

Thanks for the PR @Ahtidevin! Does this work when it's run, or are there still issues?

Ahtidevin commented 4 years ago

It solved a few of the problems, but the minion worker still does not seem to start.

Ahtidevin commented 4 years ago

Hi Scott,

Thanks again for looking into the issue. It would be really great if you could suggest changes to the script, as I am unsure why the minion worker is not starting up. I really appreciate the help.

Thank you, Niveditha



stsievert commented 4 years ago

Thanks for the ping. I've launched the current master branch locally, and it launches just fine. That means the error is EC2-specific.

I'm not the biggest fan of the ec2/ directory. It's definitely outdated. If you wanted, you could launch the NEXT AMI yourself, rsync your files up and re-launch the docker machines (though you'd have to modify ami/next.sh).


If you want to resolve the issue with EC2, let's look at an error message from your error logs:

minionworker_1 | [2020-04-13 03:25:01,493: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**@127.0.0.1:5672//: [Errno 111] Connection refused.

Again, this is EC2 specific, not NEXT.

  1. Look at this SO answer: https://stackoverflow.com/questions/50222808/celery-not-work-cannot-connect-to-amqp-guest127-0-0-15672
     CELERY_BROKER_URL might need to be configured in the minionworker environment variables for docker-compose.yml.
  2. The relevant URL is configured in https://github.com/nextml/NEXT/blob/7f604f9f2ca8d1bcbd72a371d7bb8e77ba001a99/next/constants.py#L66-L71

AMQP is a message passing protocol. At first glance, this diff might solve it:

- BROKER_URL = 'amqp://{user}:{password}@{hostname}:{port}/{vhost}/'.format( 
-     vhost=os.environ.get('RABBIT_ENV_VHOST', '')) 
+ BROKER_URL = 'amqp://{user}:{password}@{hostname}:{port}/{vhost}'.format( 
+     vhost=os.environ.get('RABBIT_ENV_VHOST', '/')) 

But I'm not seeing any modification of any RABBIT_* environment variables in ec2/...
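
To see what the diff above changes in practice, here is a minimal sketch that builds the URL both ways. It assumes the remaining fields fall back to `guest` and `127.0.0.1:5672` as in the log line; `RABBIT_ENV_USER` and `RABBIT_ENV_PASS` are hypothetical names used only for illustration, and only `RABBIT_ENV_VHOST` comes from the diff.

```python
def broker_urls(env, hostname="127.0.0.1", port=5672):
    """Build the broker URL before and after the proposed diff.

    `env` is any mapping of environment variables (for example os.environ).
    RABBIT_ENV_USER and RABBIT_ENV_PASS are hypothetical names for this
    sketch; RABBIT_ENV_VHOST is the variable named in the diff.
    """
    user = env.get("RABBIT_ENV_USER", "guest")
    password = env.get("RABBIT_ENV_PASS", "guest")

    # Before: trailing slash in the template, empty default vhost.
    old = "amqp://{user}:{password}@{hostname}:{port}/{vhost}/".format(
        user=user, password=password, hostname=hostname, port=port,
        vhost=env.get("RABBIT_ENV_VHOST", ""))

    # After: no trailing slash in the template, default vhost "/".
    new = "amqp://{user}:{password}@{hostname}:{port}/{vhost}".format(
        user=user, password=password, hostname=hostname, port=port,
        vhost=env.get("RABBIT_ENV_VHOST", "/"))
    return old, new


if __name__ == "__main__":
    print(broker_urls({}))
    print(broker_urls({"RABBIT_ENV_VHOST": "next"}))
```

Under these assumptions the two forms only differ once `RABBIT_ENV_VHOST` is actually set, so it may also be worth checking whether the hostname and port variables reach the minionworker container.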

Ahtidevin commented 4 years ago

Hi Scott,

I really appreciate your update. I am trying to understand the changes suggested by you. I will update once the ec2 script works successfully.

Thank you, Niveditha



stsievert commented 4 years ago

Sounds good. Keep in mind when I say "EC2-specific" I mean "NEXT launches fine on my local machine. It seems there's a modification to some of the EC2 integration NEXT has in ec2/ going wrong".

Ahtidevin commented 4 years ago

Hi Scott,

Oh, I see. I get it now. I will check to see if I can fix this or help Prof Dewey set this up locally. Thanks for your input.

Thank you, Niveditha



Ahtidevin commented 4 years ago

I have a question that came up while debugging the issue. Where in NEXT are the environment variables being set? The problem now with the ec2-NEXT integration is that the port numbers are not being set. I have attached a screenshot of the environment variables in my local setup.

Screenshot 2020-04-20 00 15 37

Thanks

stsievert commented 4 years ago

I think they're specified in https://github.com/nextml/NEXT/blob/7f604f9f2ca8d1bcbd72a371d7bb8e77ba001a99/ec2/templates/docker-compose.yml#L38-L40

The values that are filled are configured in https://github.com/nextml/NEXT/blob/7f604f9f2ca8d1bcbd72a371d7bb8e77ba001a99/ec2/next_ec2.py#L770-L771

Ahtidevin commented 4 years ago

Thanks for your clarification. I can see some variables being set in these files, but I cannot see where variables like the ones highlighted below are set. There seem to be many more variables set than I can account for in the code.

Screenshot 2020-04-20 12 11 40

stsievert commented 4 years ago

If any of these environment variables need to be changed, I suspect they can be added in next_ec2.py and docker-compose.yml without issue. This comes after looking at how to change RabbitMQ environment variables and seeing the documentation for rabbitmq-env.conf, which says:

In order of preference, the startup scripts get their values from the environment, from rabbitmq-env.conf and finally from the built-in default values. For example, for the RABBITMQ_NODENAME setting, RABBITMQ_NODENAME from the environment is checked first. If it is absent or equal to the empty string, then NODENAME from rabbitmq-env.conf is checked. If it is also absent or set equal to the empty string then the default value from the startup script is used.
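
The lookup order in that quote can be sketched as a small function. This is only an illustration of the documented precedence, not RabbitMQ's actual startup script; `resolve_setting` and the sample values are made up for the example.

```python
def resolve_setting(name, environ, env_conf, defaults):
    """Resolve a setting in the order the quoted documentation describes.

    environ  : process environment, keys prefixed with RABBITMQ_
    env_conf : values parsed from rabbitmq-env.conf, keys without the prefix
    defaults : built-in defaults, keys without the prefix
    """
    # 1. Environment variable; absent and empty string are treated alike.
    value = environ.get("RABBITMQ_" + name, "")
    if value != "":
        return value
    # 2. Unprefixed name from rabbitmq-env.conf, same empty-string rule.
    value = env_conf.get(name, "")
    if value != "":
        return value
    # 3. Fall back to the built-in default.
    return defaults[name]


if __name__ == "__main__":
    defaults = {"NODENAME": "rabbit@default-host"}  # made-up default value
    print(resolve_setting("NODENAME", {}, {}, defaults))
    print(resolve_setting("NODENAME", {"RABBITMQ_NODENAME": "rabbit@ec2"}, {}, defaults))
```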

For example, if I wanted to change MINIONREDIS_1_PORT I'd do the following:

minionredis:
  environment:
    MINIONREDIS_1_PORT: {{MINIONREDIS_1_PORT}}
  image: redis
  ...

Ahtidevin commented 4 years ago

I am unable to see rabbitmq-env.conf or other conf files. I can see only redis.conf. I have one more question. For example, the code has MINIONREDIS_PORT = int(os.environ.get('MINIONREDIS_PORT_6379_TCP_PORT', 6379)), but I cannot see a corresponding place where this variable is set. This is the case for many of the variables highlighted above. Thanks

stsievert commented 4 years ago

I am unable to see the rabbitmq-env.conf or other conf files. I can see only redis.conf. ... but I cannot see a corresponding place where they set this variable.

My point: I think those environment variables can be changed, as long as they're in the environment before NEXT launches. This can be done by modifying the environment variable in docker-compose.yml.
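
As a minimal sketch of why that works: the constant is read from the environment with a fallback, so it only needs to be present before the process starts. The `read_port` helper is hypothetical. The variable name also resembles what Docker's legacy container links inject (`<ALIAS>_PORT_<port>_TCP_PORT`), which may be why no explicit assignment shows up in the repository, but that is an assumption here.

```python
import os


def read_port(name, default, environ=None):
    """Read an integer port from the environment, falling back to `default`.

    Mirrors the int(os.environ.get(NAME, default)) pattern quoted above.
    `environ` can be overridden to try values without touching os.environ.
    """
    environ = os.environ if environ is None else environ
    # Environment values are strings, so convert explicitly.
    return int(environ.get(name, default))


if __name__ == "__main__":
    name = "MINIONREDIS_PORT_6379_TCP_PORT"
    print(read_port(name, 6379, {}))                # variable not set
    print(read_port(name, 6379, {name: "6380"}))    # variable set by the launcher
```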

For example, let's say I want to change MINIONREDIS_1_PORT. I'd add os.environ.get("MINIONREDIS_1_PORT", "") to https://github.com/nextml/NEXT/blob/a9b3edf223841f7a8d00bc7a138d838690673d76/ec2/next_ec2.py#L770-L771

I'd propagate this down to docker-compose.yml by adding an environment key to https://github.com/nextml/NEXT/blob/a9b3edf223841f7a8d00bc7a138d838690673d76/ec2/templates/docker-compose.yml#L19-L21

That is, I'd append this code:

  environment:
    MINIONREDIS_1_PORT: {{MINIONREDIS_1_PORT}}

Note: all environment variables should be strings.
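
A rough sketch of the whole path, from the launcher's environment to the rendered compose file. It assumes `{{NAME}}`-style placeholders and uses a plain regular expression in place of whatever templating next_ec2.py actually uses; `render` and `TEMPLATE` are hypothetical names for this example.

```python
import os
import re

# Matches {{NAME}} placeholders, allowing optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

# Quoting the placeholder keeps the rendered value a string in YAML.
TEMPLATE = """\
minionredis:
  environment:
    MINIONREDIS_1_PORT: "{{MINIONREDIS_1_PORT}}"
  image: redis
"""


def render(template, values):
    """Replace each {{NAME}} placeholder with str(values[NAME])."""
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError("no value supplied for placeholder " + key)
        return str(values[key])
    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    # Take the value from the launching environment, with an empty fallback.
    values = {"MINIONREDIS_1_PORT": os.environ.get("MINIONREDIS_1_PORT", "")}
    print(render(TEMPLATE, values))
```

Failing on a missing placeholder, rather than silently leaving it in the file, makes an unset variable visible at render time instead of at container start.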