valyala / fasthttp

Fast HTTP package for Go. Tuned for high performance. Zero memory allocations in hot paths. Up to 10x faster than net/http
MIT License
21.94k stars 1.76k forks source link

Fix possible race condition on request ctx done #1662

Closed peczenyj closed 4 months ago

peczenyj commented 1 year ago

Hello

Few days ago I find this tweet about fasthttp and go-fiber:

https://twitter.com/davidnix_/status/1720454052973044188?s=48

I try to contact davidnix_ to find more information about this, since I use fasthttp on my projects and race condtion is something that I'd like to avoid in a production environment.

He did not provide me any runnable example, I think he delete or lost the original code, but I ask him to add an issue anyway - since I try to reproduce this issue without success.

Based on the screenshots, I think I find the root cause: we assign the done channel to nil and if we are using it in some other goroutine such as using Done() method, this may trigger the race condition, but I can't write a unit test that fails with or without the race condition detector.

If @DavidNix himself wants to add some contribution I think it will be helpful

Regards

DavidNix commented 1 year ago

The data race isn't with closing the channel. IIRC, it's with a struct field that holds a reference to a custom context.Context. I'll try to reproduce it in a few days.

IMO storing a context in a struct field is a design smell and discouraged.

From https://pkg.go.dev/context, they explicitly warn against it:

Do not store Contexts inside a struct type; instead, pass a Context explicitly to each function that needs it.

peczenyj commented 1 year ago

The data race isn't with closing the channel.

No but after closing the channel, it was set to nil to prevent closing it again I think.

This is one possible source of race conditions.

if you look at the screenshot, there is a write on server.go line 1916 and read on server.go line 2728

by adding an once function, we can stop the original writing by never set this to nil.

however, this is not the end of story, and this is more deeply, linked to design itself, etc. Perhaps the adaptation of *fasthttp.RequestCtx to be a context.Context had some unexpected side effects, etc.

peczenyj commented 1 year ago

Maybe this Fix #1663

MaxBreida commented 11 months ago

Hey, we also receive sometimes a panic in our services with the following stacktrace, could this be related to this issue?

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x560 pc=0xa5d08d]

goroutine 5095 [running]:
github.com/valyala/fasthttp.(*RequestCtx).Done(...)
    /home/vsts/go/pkg/mod/github.com/valyala/fasthttp@v1.50.0/server.go:2728
context.(*cancelCtx).propagateCancel.func2()
    /opt/hostedtoolcache/go/1.21.4/x64/src/context/context.go:506 +0x44
created by context.(*cancelCtx).propagateCancel in goroutine 3836
    /opt/hostedtoolcache/go/1.21.4/x64/src/context/context.go:504 +0x395
nickajacks1 commented 10 months ago

I understand that the channel may be set to nil to prevent a panic, but is it really valid to call Shutdown twice in the first place? Couldn't it instead be stated that users shall not call Shutdown more than once? Then the race could be avoided.