vbauerster / mpb

multi progress bar for Go cli applications
The Unlicense

Function getting blocked (hung) on canceling the operation with multiple progress bars. #73

Closed ajayr2512 closed 4 years ago

ajayr2512 commented 4 years ago

This issue is also referenced by https://github.com/containers/image/issues/983. The stack trace reveals the blocking path: https://github.com/vbauerster/mpb/blob/v5.0.4/progress.go#L103 => https://github.com/vbauerster/mpb/blob/v5.0.4/progress.go#L135 => https://github.com/vbauerster/mpb/blob/v5.0.4/bar.go#L363 => https://github.com/vbauerster/mpb/blob/v5.0.4/bar.go#L161.

The following snippet highlights the usage pattern; the requirement is to be able to cancel the operation mid-way.

func newProgressPool(ctx context.Context) (*mpb.Progress, func()) {
    ctx, cancel := context.WithCancel(ctx)
    pool := mpb.NewWithContext(ctx, mpb.WithWidth(40))
    return pool, func() {
        cancel()
        pool.Wait()
    }
}

func worker(p *mpb.Progress, copyGroup *sync.WaitGroup) {
    defer copyGroup.Done()
    total := 100
    name := "Single Bar:"
    // adding a single bar, which will inherit container's width
    bar := p.AddBar(int64(total),
        // override DefaultBarStyle, which is "[=>-]<+"
        mpb.BarStyle("╢▌▌░╟"),
        mpb.PrependDecorators(
            // display our name with one space on the right
            decor.Name(name, decor.WC{W: len(name) + 1, C: decor.DidentRight}),
            // replace ETA decorator with "done" message, OnComplete event
            decor.OnComplete(
                decor.AverageETA(decor.ET_STYLE_GO, decor.WC{W: 4}), "done",
            ),
        ),
        mpb.AppendDecorators(decor.Percentage()),
    )
    // simulating some work
    max := 100 * time.Millisecond
    for i := 0; i < total; i++ {
        time.Sleep(time.Duration(rand.Intn(10)+1) * max / 10)
        bar.Increment()
    }
}
func ProgBar() {
    // initialize progress container, with custom width
    ctx, cancelFunc := context.WithCancel(context.Background())
    progressPool, progressCleanup := newProgressPool(ctx)
    copyGroup := sync.WaitGroup{}
    defer func() {
        copyGroup.Wait()
        progressCleanup()
    }()
    for i := 0; i < 5; i++ {
        copyGroup.Add(1)
        go worker(progressPool, &copyGroup)
    }
    cancelFunc() // -> This causes a hang.
}
vbauerster commented 4 years ago

Thanks for the detailed report. The block happened because the code initializes bars asynchronously while cancelling the pool at the same time. I've pushed a commit which should fix this. That said, you're doing something questionable in the code: if it were a car, it would be like pressing the gas and brake pedals at the same time. The progress pool will shut down in this case, but the workers will not.

Also, you can pass copyGroup to the progress pool, so a single wait on the pool waits for both:

func newProgressPool(ctx context.Context, wg *sync.WaitGroup) (*mpb.Progress, func()) {
    ctx, cancel := context.WithCancel(ctx)
    pool := mpb.NewWithContext(ctx, mpb.WithWaitGroup(wg))
    return pool, func() {
        cancel()
        pool.Wait()
    }
}

func ProgBar() {
    // initialize progress container, handing it the workers' WaitGroup
    copyGroup := sync.WaitGroup{}
    ctx, cancelFunc := context.WithCancel(context.Background())
    progressPool, progressCleanup := newProgressPool(ctx, &copyGroup)
    defer progressCleanup()
    for i := 0; i < 5; i++ {
        copyGroup.Add(1)
        go worker(progressPool, &copyGroup)
    }
    cancelFunc()
}
ajayr2512 commented 4 years ago

Thanks! Note that nothing stops the workers here; this is just a minimal sample to reproduce the issue. The workers would also need to listen on ctx.Done().

mtrmac commented 4 years ago

@ajayr2512 Thanks for reporting this with a minimal use case!

@vbauerster

Though you're doing something nasty in the code. I mean if it was a car it's like you're pressing both gas and break pedals at the same time. Progress pool will shutdown in this case, but not workers.

What really happens is that the cancellation is caused by a timeout, some time after the pool is created. The workers do receive the same cancellable context, and are also terminated by that — and the code also blocks on copyGroup to ensure the workers terminate and don’t access the pool after the pool itself is destroyed.

It’s just that the cancellation terminates the per-bar goroutines at the same time as it cancels the workers, so depending on timing the workers could keep calling the bar APIs before noticing the cancellation. Removing the actual worker cancellation from the code above merely makes it certain that they lose the race and reproduce the hang.