nextml / NEXT

NEXT is a machine learning system that runs in the cloud and makes it easy to develop, evaluate, and apply active learning in the real world. Ask better questions. Get better results. Faster. Automated.
http://nextml.org
Apache License 2.0

Database failure when restarting machines #185

Closed: stsievert closed this issue 7 years ago

stsievert commented 7 years ago

@ayonsn017 and I ran into an issue with the database today.

This is likely related to the DNS being in constants.py. We didn't (and shouldn't have to) set any environment variables to avoid this bug.

@liamim can you look into this?

erinzm commented 7 years ago

my first thought would be that PyMongo's not properly establishing a connection for some reason. i'll attempt to reproduce tonight using master/HEAD, and then try using database_cleanup to get more informative error messages.

> This is likely related to the DNS being in constants.py. We didn't (and shouldn't have to) set any environment variables to avoid this bug.

since all the Docker containers are linked and constants.py just points at localhost:27017, what are you referring to?
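One quick way to check whether anything is listening at the address constants.py points to (localhost:27017, MongoDB's default port) is a plain TCP connection attempt. The sketch below uses only the Python standard library. `port_is_open` is a hypothetical helper, not part of NEXT, and a successful result only shows that the port accepts connections; it does not prove that PyMongo can run queries.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 27017, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration. 27017 is MongoDB's default port;
    success only means something is listening there, not that queries work.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Running `port_is_open()` on the machine right after a restart, and again when queries start failing, would help separate "MongoDB isn't up or reachable" from a problem higher in the stack.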

stsievert commented 7 years ago

> i'll attempt to reproduce

We launched this machine with the next_ec2.py script, restarted it, and then encountered this bug while running test_api.py. The test ran fine for a number of responses, then failed after a seemingly random number of them (sometimes on the 3rd response, sometimes on the 80th). I'd like to see a test that follows this same procedure.
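The reproduction procedure described above (launch with next_ec2.py, restart, then keep answering queries until something breaks) could be scripted roughly as follows. This is a minimal sketch: `first_failure` and `submit_response` are hypothetical names, not part of NEXT or test_api.py, and the code that actually posts a single answer would have to be taken from test_api.py.

```python
from typing import Callable, Optional


def first_failure(submit_response: Callable[[int], None], n_responses: int = 100) -> Optional[int]:
    """Submit up to n_responses answers in order and return the 1-based index
    of the first one that raised, or None if every submission succeeded.

    `submit_response` is a hypothetical stand-in for whatever test_api.py does
    to post one answer; it is expected to raise an exception on failure.
    """
    for i in range(1, n_responses + 1):
        try:
            submit_response(i)
        except Exception as exc:
            # Record where the run broke instead of aborting without context.
            print(f"response {i} failed: {exc!r}")
            return i
    return None
```

Because the failures were reported anywhere between the 3rd and the 80th response, a single run is not conclusive; repeating the run several times after each restart, with `n_responses` well above 80, would give a better picture.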

stsievert commented 7 years ago

@liamim I've run a test (launched via next_ec2.py, then restarted) and it failed (query pages wouldn't load, etc.). Can you dig into using NEXT after restarting EC2 machines launched with next_ec2.py?

stsievert commented 7 years ago

Closed with documentation on the wiki, which adds a note to use `next_ec2.py start`.