rjeczalik / notify

File system event notification library on steroids.
MIT License

FSEvents: lock the CF run loop goroutine to its thread #140

Closed: fjl closed this pull request 6 years ago

fjl commented 6 years ago

CoreFoundation run loops are tied to a particular thread. The goroutine that sets up the run loop may get rescheduled onto a different thread just after adding the source. In that case, the run loop that is started wouldn't have any sources and would thus exit immediately.

Fix this by locking the goroutine to its thread. The thread is wasted on the run loop anyway, so we might as well claim it earlier.

Should help with #139
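
For illustration, a minimal sketch of that pattern, using the init goroutine from watcher_fsevents_cgo.go (the actual change in this PR may differ in detail):

    // Pin the goroutine to its OS thread before any CoreFoundation calls,
    // so that CFRunLoopGetCurrent, CFRunLoopAddSource and CFRunLoopRun all
    // run on the same thread. The thread is never unlocked, since it is
    // dedicated to the run loop anyway.
    go func() {
        runtime.LockOSThread()
        runloop = C.CFRunLoopGetCurrent()
        C.CFRunLoopAddSource(runloop, source, C.kCFRunLoopDefaultMode)
        C.CFRunLoopRun()
        panic("runloop has just unexpectedly stopped")
    }()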

fjl commented 6 years ago

Random question that popped up while reading the code: Why does gosource sleep for one second before signaling that the run loop is ready?

https://github.com/rjeczalik/notify/blob/e1801ee34d54c4bf45126ef51fe9750b4e42bee8/watcher_fsevents_cgo.go#L75-L78

IMHO that's not required.
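
For comparison, signaling readiness from the callback itself would be enough; this is only a sketch with made-up names (ready, readyOnce), not the library's actual code:

    var (
        ready     = make(chan struct{})
        readyOnce sync.Once
    )

    //export gosource
    func gosource(s unsafe.Pointer) {
        // Being invoked at all means CFRunLoopRun is already servicing
        // sources, so readiness can be signaled immediately, no sleep needed.
        readyOnce.Do(func() { close(ready) })
    }

Whatever currently waits out the one-second sleep could then block on <-ready instead.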

rjeczalik commented 6 years ago

IMHO that's not required.

Agreed, it's embarrassing, but the sleep is a leftover from some debugging session.

fjl commented 6 years ago

I'm still trying to figure out how to reproduce #139. It happens on our Travis build, but I can't make it panic locally.

rjeczalik commented 6 years ago

It happens on our Travis build, but I can't make it panic locally.

Does an equivalent C program also crash? It could be a builder issue (FSEvents disabled via some .fsevents setting, just guessing).

fjl commented 6 years ago

I can reproduce by applying this diff:

@@ -26,6 +26,7 @@ import "C"
 import (
    "errors"
    "os"
+   "runtime"
    "sync"
    "sync/atomic"
    "time"
@@ -65,6 +66,7 @@ func init() {
    go func() {
        runloop = C.CFRunLoopGetCurrent()
        C.CFRunLoopAddSource(runloop, source, C.kCFRunLoopDefaultMode)
+       runtime.Gosched()
        C.CFRunLoopRun()
        panic("runloop has just unexpectedly stopped")
    }()

Then:

go get golang.org/x/tools/cmd/stress   # install the stress tool
go test -c                             # build the notify.test binary
stress -p 8 ./notify.test -test.run=-  # run 8 copies in parallel; "-" matches no tests, so only package init runs

karalabe commented 6 years ago

With this PR, I can't repro the failure on Travis any more. That being said, I previously saw many failures, whereas the last run before this PR produced only one, so it may also depend on the load on Travis.

From my perspective, we can consider the issue fixed and come back to it if it reappears :)

rjeczalik commented 6 years ago

With this PR, I can't repro the failure on Travis any more.

Oh, I understood from https://github.com/rjeczalik/notify/pull/140#issuecomment-354991816 that the original issue can't be reproduced locally with or without this patch.

From my perspective, we can consider the issue fixed and come back to it if it reappears :)

👍