moby / buildkit

concurrent, cache-efficient, and Dockerfile-agnostic builder toolkit
https://github.com/moby/moby/issues/34227
Apache License 2.0
8.14k stars 1.15k forks source link

Potential deadlock or goleak in util/progress/progress.go#157 #4693

Open xuxiaofan1203 opened 7 months ago

xuxiaofan1203 commented 7 months ago

Like the test in packege flightcontrol below:

func TestNoCancel(t *testing.T) {
    t.Parallel()
    g := &Group[string]{}
    eg, ctx := errgroup.WithContext(context.Background())
    var r1, r2 string
    var counter int64
    f := testFunc(100*time.Millisecond, "bar", &counter)
    eg.Go(func() error {
        ret1, err := g.Do(ctx, "foo", f)
        if err != nil {
            return err
        }
        r1 = ret1
        return nil
    })
    eg.Go(func() error {
        ret2, err := g.Do(ctx, "foo", f)
        if err != nil {
            return err
        }
        r2 = ret2
        return nil
    })
    err := eg.Wait()
    assert.NoError(t, err)
    assert.Equal(t, "bar", r1)
    assert.Equal(t, "bar", r2)
    assert.Equal(t, counter, int64(1))
}

When executes this code segment

eg.Go(func() error {
        ret1, err := g.Do(ctx, "foo", f)
        if err != nil {
            return err
        }
        r1 = ret1
        return nil
    })

if err=nil, the cancel in Go doesn't execute.Then g.Do() executes, and the call chains are g.do() -> newcall() -> go c.progressState.run(pr) -> pr.Read() -> Wait() in Read(): due to cancel doesn't execute, and pr.cond.Broadcast() doesn't execute as well. As a result, there isn't a signal() or broadcast() can awaken the Wait()

Broadcast() in function Read and in function pipe are be blocked at <-ctx.Done()
go func() {
        select {
        case <-done:
        case <-ctx.Done():
            pr.mu.Lock()
            pr.cond.Broadcast()
            pr.mu.Unlock()
        }
}()
go func() {
        <-ctx.Done()
        pr.mu.Lock()
        pr.cond.Broadcast()
        pr.mu.Unlock()
}()
tonistiigi commented 7 months ago

Hard to understand what case you are really pointing to. Can you provide an example(you can add sleeps if you want to cause a specific race) or propose a fix.

imeoer commented 5 months ago

A similar problem occurs when the push.Push method is called concurrently, not sure if it is related to us using the same context.Context between multiple goroutines.