launch brokers redundantly causes conflicts

mikehinchey commented 7 years ago

When I start a broker, the scheduler tries to launch it on every offer that matches. At most one launch will succeed, because the others conflict on the broker id. Is it intentional to accept all of the offers and launch that many tasks?
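For illustration, here is a minimal sketch of the behavior one would expect instead: each pending broker is paired with at most one offer from a batch, and the remaining offers are declined. This is not the framework's code. `Offer`, `Broker`, `matches`, and `assign_offers` are hypothetical names, and the resource check is a simplified assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    # Hypothetical, simplified stand-in for a Mesos resource offer.
    offer_id: str
    cpus: float
    mem_mb: int


@dataclass(frozen=True)
class Broker:
    # Hypothetical stand-in for a broker waiting to be launched.
    broker_id: int
    cpus: float
    mem_mb: int


def matches(offer, broker):
    """Assumed matching rule: the offer has enough CPU and memory."""
    return offer.cpus >= broker.cpus and offer.mem_mb >= broker.mem_mb


def assign_offers(offers, pending_brokers):
    """Pair each broker with at most one offer; decline everything else.

    Returns (launches, declined), where launches maps broker_id to offer_id
    and declined lists the ids of the offers that were not used.
    """
    launches = {}
    declined = []
    waiting = list(pending_brokers)
    for offer in offers:
        # Pick the first broker that is still waiting and fits this offer.
        chosen = next((b for b in waiting if matches(offer, b)), None)
        if chosen is None:
            declined.append(offer.offer_id)
        else:
            launches[chosen.broker_id] = offer.offer_id
            # Removing the broker stops later offers in the same batch
            # from launching it a second time.
            waiting.remove(chosen)
    return launches, declined


offers = [Offer("o1", 2.0, 4096), Offer("o2", 2.0, 4096), Offer("o3", 4.0, 8192)]
launches, declined = assign_offers(offers, [Broker(9, 1.0, 2048)])
```

With three matching offers and one pending broker, only the first offer is used and the other two are declined, so the broker id is registered once.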

The error is:

A broker is already registered on the path /brokers/ids/9. This probably indicates that you either have configured a brokerid that is already in use, or else you have shutdown this broker and restarted it faster than the zookeeper timeout so it appears to be re-registering.

I'm trying to debug and understand why sometimes none of the launches succeed. Could the relaunches be happening faster than the zookeeper timeout?

In the logs, I see that statusUpdate (in Scheduler.scala) receives statuses such as TASK_KILLING and TASK_LOST and then kills the task. Would it be best to check for those states and skip killing the task? (I've started coding this and would appreciate advice.)
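A minimal sketch of the check being proposed, written in Python rather than the scheduler's Scala and not taken from Scheduler.scala: a kill is skipped when the task is already being killed or has reached a terminal state. The state names are Mesos task states. `needs_kill` and its `expected` flag are hypothetical, and the assumption that the scheduler only kills tasks it does not expect is mine.

```python
# Mesos task states after which the task no longer exists.
TERMINAL_STATES = frozenset({
    "TASK_FINISHED",
    "TASK_FAILED",
    "TASK_KILLED",
    "TASK_LOST",
    "TASK_ERROR",
})


def needs_kill(state, expected):
    """Decide whether a status update should trigger a kill request.

    `expected` is a hypothetical flag: True if the scheduler believes this
    task should be running. The real scheduler's criteria may differ.
    """
    if state in TERMINAL_STATES:
        # The task is already gone, so there is nothing left to kill.
        return False
    if state == "TASK_KILLING":
        # A kill is already in progress; sending another one is redundant.
        return False
    # Assumption: only tasks the scheduler does not expect are killed.
    return not expected


decisions = {
    state: needs_kill(state, expected=False)
    for state in ("TASK_RUNNING", "TASK_KILLING", "TASK_LOST")
}
```

Under these assumptions, an unexpected task in TASK_RUNNING is killed, while updates carrying TASK_KILLING or TASK_LOST are left alone.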

Thanks.

steveniemitz commented 7 years ago

Thanks for the bug report. I've definitely never seen this before, and it's not intentional.

What version are you running? Can you attach the logs from the scheduler with debug enabled?

steveniemitz commented 7 years ago

I think I see where it's happening. I guess I've just never had mesos send more than one offer at once to the framework. I'll get a fix in soon.

steveniemitz commented 7 years ago

PR #272 should fix this. Good catch! I'll merge the PR once I've tested it in our test cluster tomorrow.

mesos / kafka

launch brokers redundantly causes conflicts #271