Open eastlondoner opened 7 years ago
It looks to me like the failover_timeout logic is not quite right in mesoslib.go
see here: http://mesos.apache.org/documentation/latest/high-availability-framework-guide/
I think the logic of using the failover timeout in GetFrameworkID is not correct: e.g. if my scheduler has been up for longer than the failover timeout and then restarts, it shouldn't lose the old framework ID (and all the running tasks).
See this PR which fixes the behaviour when a scheduler is restarted: https://github.com/mesos/mr-redis/pull/57
the PR looks good to me.
First of all thanks a lot for the contribution. Glad to hear that you are using mr-redis. I think mr-redis needs the leader-follower logic to be implemented so that more than one instance of this scheduler can be run at once for high-availability. Would you like to contribute that functionality?
@eastlondoner
How are you running it with DC/OS?
If you have re-packaged it, would you be interested in contributing it to universe as version 01?
Hi @dhilipkumars, we're running it by installing the package from universe, then going into Marathon and changing the docker image to point at our docker image: https://hub.docker.com/r/tractableio/mr-redis/
We also had to change the docker client API version setting in mr-redis to match the version of Docker running on our Agents before we built that docker image. You can see the code change on my fork. I've not issued a PR because I think there is a better way of doing it where it determines the docker api version from DOCKER_HOST env variable - but I've not had time to look into it.
I guess I could push a new version to the universe, but I wouldn't want to push something that includes code changes that aren't in this (mainline) repo. Furthermore for the latest DC/OS I think that the docker API should be 1.25!
n.b. this is the commit I am concerned about: https://github.com/eastlondoner/mr-redis/commit/10bdba0c36e5f007fe8e0f9a5b9600ad46c4fb93
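For what it's worth, here is a rough sketch of how the API version could be discovered instead of hard-coded. It is not code from mr-redis or from my fork. It relies on the Docker Engine API's `/version` endpoint, whose JSON response includes an `ApiVersion` field. The helper names `versionURL` and `parseAPIVersion` are made up for illustration, only `tcp://` values of DOCKER_HOST are handled, and the HTTP request itself is left out to keep it short.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// versionURL turns a tcp:// DOCKER_HOST value into the URL of the Engine's
// /version endpoint. Unix sockets and TLS are out of scope for this sketch.
func versionURL(dockerHost string) (string, error) {
	if !strings.HasPrefix(dockerHost, "tcp://") {
		return "", errors.New("only tcp:// DOCKER_HOST values are handled in this sketch")
	}
	return "http://" + strings.TrimPrefix(dockerHost, "tcp://") + "/version", nil
}

// parseAPIVersion extracts the ApiVersion field from a /version response body.
func parseAPIVersion(body []byte) (string, error) {
	var v struct {
		APIVersion string `json:"ApiVersion"`
	}
	if err := json.Unmarshal(body, &v); err != nil {
		return "", err
	}
	if v.APIVersion == "" {
		return "", errors.New("response has no ApiVersion field")
	}
	return v.APIVersion, nil
}

func main() {
	// Sample values only; a real scheduler would read DOCKER_HOST from the
	// environment and fetch the body with an HTTP GET on the URL.
	url, err := versionURL("tcp://127.0.0.1:2375")
	fmt.Println(url, err)
	ver, err := parseAPIVersion([]byte(`{"ApiVersion":"1.25"}`))
	fmt.Println(ver, err)
}
```

The value returned by `parseAPIVersion` could then be passed to whatever currently takes the hard-coded API version.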
@eastlondoner Hello, I'm trying to access the docker image at https://hub.docker.com/r/tractableio/mr-redis/ but it is not accessible. Could you give me some other option?
Thank you
Hi @dhilipkumars, I have the same problem that is discussed in this issue. I would like to access the docker image https://hub.docker.com/r/tractableio/mr-redis/ to do some tests.
Thank you
@daguero I don't work at Tractable anymore, and I recall I did some hacky things to make it work that I didn't want to publish. However, you should be able to build your own docker image that will work if you use my fork: https://github.com/eastlondoner/mr-redis
@eastlondoner OK, thanks for your help, I'll try it.
We have built mr-redis from the latest master and are running it on DC/OS (using zookeeper rather than etcd).
The basics work OK, but when the scheduler is restarted the existing redis instances shut down and don't come back.
If you call the /STATUS endpoint it says that the redis instances are up, but looking in mesos they're not running any more.