Redis instances shut down when scheduler restarted

eastlondoner commented 7 years ago

We have built mr-redis from the latest master and are running it on DC/OS (using zookeeper rather than etcd).

The basics work OK, but when the scheduler is restarted, the existing Redis instances shut down and don't come back.

If you call the /STATUS endpoint, it reports that the Redis instances are up, but looking in Mesos they're not running any more.

eastlondoner commented 7 years ago

It looks to me like the failover_timeout logic in mesoslib.go is not quite right.

See here: http://mesos.apache.org/documentation/latest/high-availability-framework-guide/

The recommended settings are much greater than the 60 seconds that is currently set.

I think the way GetFrameworkID uses the failover timeout is not correct: e.g. if my scheduler has been up for longer than the failover timeout and then restarts, it shouldn't lose the old framework ID (and all the running tasks).

eastlondoner commented 7 years ago

See this PR which fixes the behaviour when a scheduler is restarted: https://github.com/mesos/mr-redis/pull/57

dhilipkumars commented 7 years ago

The PR looks good to me.

dhilipkumars commented 7 years ago

First of all thanks a lot for the contribution. Glad to hear that you are using mr-redis. I think mr-redis needs the leader-follower logic to be implemented so that more than one instance of this scheduler can be run at once for high-availability. Would you like to contribute that functionality?
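The leader-follower behaviour suggested above could follow the standard ZooKeeper leader-election recipe: each scheduler instance creates an ephemeral sequential znode under a shared election path, the instance with the lowest sequence number leads, and every other instance watches only its immediate predecessor so that a leader failure wakes a single follower. The Go sketch below shows only that decision step on a list of child znode names; the `candidate-` prefix and the `elect` helper are hypothetical, and the actual ZooKeeper calls (create, list children, set a watch) are omitted.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// electionResult reports whether this instance leads and, if not, which
// znode it should watch (its immediate predecessor).
type electionResult struct {
	IsLeader bool
	Watch    string
}

// seq extracts the zero-padded 10-digit suffix ZooKeeper appends to
// sequential znodes, e.g. "candidate-0000000003" -> "0000000003".
func seq(name string) string {
	return name[strings.LastIndex(name, "-")+1:]
}

// elect decides leadership for the HYPOTHETICAL znode name `mine`.
func elect(children []string, mine string) (electionResult, error) {
	sorted := append([]string(nil), children...)
	// Fixed-width suffixes make string order equal to numeric order.
	sort.Slice(sorted, func(a, b int) bool { return seq(sorted[a]) < seq(sorted[b]) })
	for i, c := range sorted {
		if c == mine {
			if i == 0 {
				return electionResult{IsLeader: true}, nil
			}
			return electionResult{Watch: sorted[i-1]}, nil
		}
	}
	return electionResult{}, fmt.Errorf("own znode %q not found", mine)
}

func main() {
	children := []string{"candidate-0000000005", "candidate-0000000002", "candidate-0000000009"}
	r, err := elect(children, "candidate-0000000009")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("leader=%v watch=%s\n", r.IsLeader, r.Watch) // leader=false watch=candidate-0000000005
}
```

Because the znodes are ephemeral, a crashed leader's node disappears with its session, and the follower watching it re-runs `elect` to see whether it is now first.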

dhilipkumars commented 7 years ago

@eastlondoner How are you running it with DC/OS?
If you have repackaged it, would you be interested in contributing it to Universe as version 01?

eastlondoner commented 7 years ago

Hi @dhilipkumars, we're running it by installing the package from Universe, then going into Marathon and changing the Docker image to point at our Docker image: https://hub.docker.com/r/tractableio/mr-redis/

eastlondoner commented 7 years ago

We also had to change the Docker client API version setting in mr-redis to match the version of Docker running on our agents before we built that Docker image. You can see the code change on my fork. I haven't opened a PR because I think there is a better way of doing it, where it determines the Docker API version from the DOCKER_HOST env variable, but I haven't had time to look into it.
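For reference, Docker's own CLI and Go client read the API version from the DOCKER_API_VERSION environment variable; DOCKER_HOST only carries the daemon address. A minimal Go sketch of that approach, assuming mr-redis could simply take an env-supplied version with a fallback: the `dockerAPIVersion` helper is hypothetical, and the 1.25 fallback is taken from the comment in this thread, not from mr-redis.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// apiVersionPattern accepts versions such as "1.24" or "1.25".
var apiVersionPattern = regexp.MustCompile(`^\d+\.\d+$`)

// dockerAPIVersion (HYPOTHETICAL helper) returns DOCKER_API_VERSION when it
// is set and well-formed, otherwise the fallback. lookup is injected so the
// behaviour can be tested without touching the real environment.
func dockerAPIVersion(lookup func(string) (string, bool), fallback string) string {
	if v, ok := lookup("DOCKER_API_VERSION"); ok && apiVersionPattern.MatchString(v) {
		return v
	}
	return fallback
}

func main() {
	fmt.Println(dockerAPIVersion(os.LookupEnv, "1.25"))
}
```

With this, each agent's image could be pointed at the right API version through the Marathon app's environment instead of a code change and rebuild.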

eastlondoner commented 7 years ago

I guess I could push a new version to Universe, but I wouldn't want to push something that includes code changes that aren't in this (mainline) repo. Furthermore, for the latest DC/OS I think the Docker API version should be 1.25!

eastlondoner commented 7 years ago

n.b. this is the commit I am concerned about: https://github.com/eastlondoner/mr-redis/commit/10bdba0c36e5f007fe8e0f9a5b9600ad46c4fb93

daguero commented 6 years ago

@eastlondoner Hello, I'm trying to access the Docker image https://hub.docker.com/r/tractableio/mr-redis/ but it is not accessible. Could you give me some other option?

Thank you

daguero commented 6 years ago

Hi @dhilipkumars, I have the same problem that is discussed in this issue. I would like to access the Docker image https://hub.docker.com/r/tractableio/mr-redis/ to do some tests.

Thank you

eastlondoner commented 6 years ago

@daguero I don't work at Tractable anymore and I recall I did some hacky things that I didn't want to publish to make it work. However you should be able to build your own docker image that will work if you use my fork: https://github.com/eastlondoner/mr-redis

daguero commented 6 years ago

@eastlondoner OK, thanks for your help, I'll try it.

mesos / mr-redis

Redis instances shut down when scheduler restarted #56