mesos / mesos-go

Go language bindings for Apache Mesos
Apache License 2.0

"next branch" synchronize to handle event #251

Closed hi-wayne closed 7 years ago

hi-wayne commented 8 years ago

Per my understanding, Mesos cannot guarantee the time ordering of event delivery to a framework. On the next branch, https://github.com/mesos/mesos-go/blob/next/extras/scheduler/controller/controller.go#L103 handles events synchronously. Will this synchronous handling lead to a decrease in throughput? Would it be better to handle each event in a goroutine?

jdef commented 8 years ago

AFAIK mesos tries (but doesn't guarantee, due to libprocess reasons) to deliver status updates, for a given task, in the order they were processed by the master. There's at least one Mesos JIRA tracking/discussing that issue.

I deliberately chose to NOT use multiple goroutines here for dispatching event handling. This allows a framework scheduler to implement a simpler set of handlers and to worry less about the state of the system changing under it. While it's possible that throughput could be improved with multiple goroutines I strongly suspect that other parts of the cluster would become a bottleneck first. If some framework-specific event handler has to perform any kind of potentially blocking work then it probably makes sense to run that work in a goroutine (perhaps via a job queue) to avoid slowing down the controller/event-loop machinery.
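
A minimal sketch of that hand-off pattern follows; the Event type, queue size, and worker loop are simplified stand-ins for illustration only, not the mesos-go API:

package main

import (
    "fmt"
    "time"
)

// Event is a stand-in for a scheduler event delivered by the controller.
type Event struct{ ID string }

func main() {
    jobs := make(chan Event, 128) // buffered job queue drained by a worker

    // Worker goroutine: does the potentially blocking work outside the event loop.
    go func() {
        for ev := range jobs {
            time.Sleep(500 * time.Millisecond) // simulate slow work (launching tasks, API calls, ...)
            fmt.Println("processed", ev.ID)
        }
    }()

    // Handler: returns quickly so the controller can keep reading the event stream.
    handle := func(ev Event) error {
        select {
        case jobs <- ev: // hand the event off without blocking
        default:
            fmt.Println("queue full, applying overflow policy for", ev.ID)
        }
        return nil
    }

    for i := 0; i < 5; i++ {
        _ = handle(Event{ID: fmt.Sprint(i)})
    }
    time.Sleep(3 * time.Second) // demo only: give the worker time to drain
}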

hi-wayne commented 8 years ago

I did a test. In HandleEvent, after receiving an offer, I launched a lot of tasks (Docker containers that exit on their own after running for <10s) in goroutines, and blocked HandleEvent with time.Sleep for a relatively long time (such as 100 seconds). The tasks that exited during those 100s released their offers, but some offer events were missing when HandleEvent was called on the framework again. I might run into trouble when using bounded offer hoarding in my scheduler design, and I should probably set --offer_timeout on the master. Is the long sleep in HandleEvent causing the missing events, or could something else lead to them being lost?

jdef commented 8 years ago

If you could share some code (or pseudo-code) that would really help me understand more of the problem you're running into. If you're sleeping in an event handler then you're basically blocking further communication from mesos during that time; I strongly suggest that you consider an alternative implementation strategy.

athlum commented 8 years ago

I worked on this case and I will try to make it clear.

I wrote a simple handler for Event_OFFERS

scheduler.Event_OFFERS: func(e *scheduler.Event) error {
    go driver.ResourceOffers(e.GetOffers().GetOffers()) // Just decline it.
    if isfirst { // isfirst starts as 'true' so the handler sleeps 30s only for the first event.
        time.Sleep(time.Second * 30)
        isfirst = false
    }
    return nil
},

If two or more offer events arrive in a short time, some offers will probably be dropped and never declined; they stay on the Outstanding Offers list of the Mesos master, waiting to be rescinded. We then tried changing the buffer size, with a small change to NewFrameReader:

func NewFrameReader(r io.Reader) framing.Reader {
    // br, ok := r.(*bufio.Reader)
    // if !ok {
    //  br = bufio.NewReader(r)
    // }
    br := bufio.NewReaderSize(r, 1024*1024*1024) // use a 1 GiB buffer instead of bufio's default size
    return &reader{r: br}
}

The bigger the buffer, the fewer offers we missed.

I found this issue when the scheduler was under high load: many tasks finished and many tasks were launched within milliseconds. Offer events were missed even though I handled them with goroutines. With every rescind cycle, this issue shrinks the set of offers the scheduler controls further and further.

I'm not sure why the Mesos master still sends events when my buffer is full, or whether there is something I've missed that just makes it look that way.

hi-wayne commented 8 years ago

@jdef It might be an extreme example (sleep 30), but I ran a test using the method described above. After increasing the buffer size at https://github.com/mesos/mesos-go/blob/next/recordio/reader.go#L22, the loss of event messages did improve, though the results are flaky. Will it block the write operation on the server side when event processing slows down (e.g., a sleep(n) mock...)?

jdef commented 8 years ago

your implementation of go driver.ResourceOffers(e.GetOffers().GetOffers()) does not comply with the contract of the scheduler callbacks: in your impl it's possible that multiple driver callback funcs may be invoked concurrently. this should not happen. in fact the impl in /master not only serializes execution of callbacks but also attempts to preserve the order in which the events arrived from the master (the executor driver impl is lazier w/ respect to ordering but still guarantees serial callback execution).

it seems like your test case attempts to prove that when a scheduler doesn't respond for some period of time that some offers may never be received? the current /next implementation implements back-pressure: the stream of events sent by the master is only processed as fast as the handler impl can take them. so if the handler blocks somewhere (for say 30s) then the event stream might be in some transient state. for example, the event stream buffer could contain offers that are already expired.

there could be some other blocking condition, like a network partition. i'm working on a heartbeat monitor for this. in either case the scheduler must be ready to deal with network / event-stream failures. this is a distributed system after all and failures are pretty much guaranteed at some point.

the key here is to NOT block in the event stream handlers. process events as quickly as possible in the handlers and if there's any blocking work to do then it should be queued and processed outside of the handler loop.
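
To make that concrete, here is a sketch of the Event_OFFERS case restructured along those lines; OfferID, declineOffer, and the queue size are hypothetical stand-ins chosen for illustration, not the real scheduler API:

package main

import (
    "fmt"
    "time"
)

// OfferID and declineOffer stand in for the real offer type and decline call.
type OfferID string

func declineOffer(id OfferID) {
    time.Sleep(100 * time.Millisecond) // simulate a call to the master
    fmt.Println("declined", id)
}

func main() {
    pending := make(chan OfferID, 256) // size is a policy choice; big enough to absorb bursts

    // One goroutine owns the (possibly slow) decline/launch calls.
    go func() {
        for id := range pending {
            declineOffer(id)
        }
    }()

    // Handler-equivalent: copy IDs out of the event and return; never sleep here.
    handleOffers := func(ids []OfferID) error {
        for _, id := range ids {
            pending <- id // queue for asynchronous processing
        }
        return nil
    }

    _ = handleOffers([]OfferID{"o1", "o2", "o3"})
    time.Sleep(time.Second) // demo only: let the worker drain the queue
}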

does that make sense, or am I missing something obvious?

jdef commented 7 years ago

considering this resolved. please re-open if needed