open-feature / flagd

A feature flag daemon with a Unix philosophy
https://openfeature.dev
Apache License 2.0
564 stars 67 forks source link

feat: add 'watcher' interface to file sync #1365

Closed djosephsen closed 3 months ago

djosephsen commented 4 months ago

Implement fsnotify and os.Stat based watchers

fixes: #1344

This PR

Intent of this PR is to begin a conversation about fixing #1344. The approach taken is to replace the current use of fsontify.Watcher with a local Watcher interface type that describes the fsnotify.Watcher interface.

My original take was to use fsnotify.Watcher directly as an implementation of local Watcher, but fsnotify's Watcher directly exposes its Error and Event channels, making it impossible to describe with an interface, so I had to create a small wrapper for fsnotify.Watcher to satisfy the new Watcher interface (this is fsnotify_watcher.go). From there, we implement the Watcher interface again, this time using os.Stat and fs.FileInfo (this is fileinfo_watcher.go).

Then we change the filepath sync code to use an interface to Watcher, rather than fsnotify.Watcher directly. The new fileinfo watcher plugs right in, and nothing really needs to change in the sync.

If yall are favorable to this approach, I'll finish wiring up configs, and flesh out the tests. I didn't want to go much further without some buy-in or feedback.

Related Issues

Fixes #1344

Notes

See bullet-points above

How to test

go test -v ./...

netlify[bot] commented 4 months ago

Deploy Preview for polite-licorice-3db33c ready!

Name Link
Latest commit df3f9bc36d0e59acf15c87f9980596b7e47a1efe
Latest deploy log https://app.netlify.com/sites/polite-licorice-3db33c/deploys/66c51ae83276770008e19405
Deploy Preview https://deploy-preview-1365--polite-licorice-3db33c.netlify.app
Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

beeme1mr commented 4 months ago

Hey @djosephsen, this seems like a good approach to me. @toddbaert @thisthat @Kavindu-Dodan do you have any concerns?

thisthat commented 4 months ago

Thanks @djosephsen for the awesome contribution, I am in favor of having this 👍 Regarding the Windows issue, I think we should add a parallel job for the integration tests and run them on a Windows runner (See https://github.com/open-feature/flagd/blob/main/.github/workflows/build.yaml#L115)

toddbaert commented 4 months ago

I agree with this approach! Thanks @djosephsen .

I think the only thing I'm not sure about is how this configuration should be selected... Should we make it a "global" flag so that all file-watchers use a particular implementation? Should it allow it to be configured per-file-sync object?

I think I favor the former... I think if someone wants to use a certain implementation, they will want to use it for all watchers, so a single global config makes sense. I also wonder if there's some way we can default to use this mode outside of K8s. Is there an "approved" means of knowing if you're running in K8s? Some env-var? The watcher as implemented works great there, since K8s behaves in very predictable ways wrt config-maps, but outside of that I think the new implementation will be more reliable (albeit a bit less performant). cc @thisthat @heckelmann on that last question.

djosephsen commented 4 months ago

I agree with this approach! Thanks @djosephsen .

I think the only thing I'm not sure about is how this configuration should be selected... Should we make it a "global" flag so that all file-watchers use a particular implementation? Should it allow it to be configured per-file-sync object?

SGTM. Honestly making it a per-watcher switch hadn't even occurred to me.

I think I favor the former... I think if someone wants to use a certain implementation, they will want to use it for all watchers, so a single global config makes sense. I also wonder if there's some way we can default to use this mode outside of K8s. Is there an "approved" means of knowing if you're running in K8s? Some env-var? The watcher as implemented works great there, since K8s behaves in very predictable ways wrt config-maps, but outside of that I think the new implementation will be more reliable (albeit a bit less performant). cc @thisthat @heckelmann on that last question.

I don't know how officially approved this is, but I've used KUBERNETES_SERVICE_HOST in the past to detect whether the process is running in k8s. I suppose the bullet-proof way to do it would be to use the client lib...

djosephsen commented 3 months ago

Just heads-up before this merges I should really flesh out fileinfo_watcher_test.go.. I feel like this is under-tested

beeme1mr commented 3 months ago

Just heads-up before this merges I should really flesh out fileinfo_watcher_test.go.. I feel like this is under-tested

Sure, no problem. Please let us know when you're ready so we can rereview. Thanks!

djosephsen commented 3 months ago

ok yall.. tests are fleshed out. Sorry that took a minute (writing tests is not my favorite thing). I have a question about how we want to handle the Error and Event channels given by the watchers. Currently we are using an unbuffered fsnotify .Watcher , but there is an upstream buffered option.

Do we want to default to using unbuffered channels in the fileinfo watcher? My personal feeling is that buffered channels with sane defaults (size=32?) are pretty cheap, generally safer and less likely to lock the runtime in unexpected ways, unless you're genuinely holding something wrong. For example, In cases like the tests, where you're just kind of flexing the code-paths, unbuffered channels can create headaches.

ATM I have silly values for both buffer sizes, and will change them to whatever yall think, at which point I think we're ready to merge pending your review.

djosephsen commented 3 months ago

hmm, I just noticed I forgot the DCO again. Rebase is giving merge conflicts, and when I resolve them, giving me a commit message that looks like I might be overwriting someone else's history.. Do I need to close this and re-submit?

(sorry my bad. I'll git config --global commit.gpgsign true)

beeme1mr commented 3 months ago

Hey @djosephsen, you should be able to run the following command:

In your local branch, run: git rebase HEAD~10 --signoff

Force push your changes to overwrite the branch: git push --force-with-lease origin main

We'll squash merge ovce the pr is ready.

beeme1mr commented 3 months ago

Dco is a bit of a pain. We're looking into switching to a cla to simplify this process.

djosephsen commented 3 months ago

Hey @djosephsen, you should be able to run the following command:

In your local branch, run: git rebase HEAD~10 --signoff

Force push your changes to overwrite the branch: git push --force-with-lease origin main

We'll squash merge ovce the pr is ready.

The first time I tried it went sideways, but the second time I was able to use some switch to skip an upstream merge, and that did it.

toddbaert commented 3 months ago

@djosephsen if you want, you can approve or :+1: :-1: here and you'll get an org invite. Then your CI tests will run automatically. There's no obligations involved in joining, but it's the first step on the contributor ladder.

Kavindu-Dodan commented 3 months ago

@djosephsen there're few lint issue due to unwrapped errors [1]. And you can run lint validations locally with make lint, so you know next run will be greet :)

[1] - https://github.com/open-feature/flagd/actions/runs/10406956145/job/28834819193?pr=1365#step:5:871

djosephsen commented 3 months ago

@djosephsen there're few lint issue due to unwrapped errors [1]. And you can run lint validations locally with make lint, so you know next run will be greet :)

[1] - https://github.com/open-feature/flagd/actions/runs/10406956145/job/28834819193?pr=1365#step:5:871

Hey hey... I think it might be idiomatically incorrect to wrap those errors. fsnotify_watcher is an interface wrapper, so by definition it doesn't add context to the upstream error.

Totally happy to wrap them if ya want. But I think the correct move is probably to //nolint:wrapcheck those methods. Take a look and lemme know what yall think.

Kavindu-Dodan commented 3 months ago

@djosephsen there're few lint issue due to unwrapped errors [1]. And you can run lint validations locally with make lint, so you know next run will be greet :) [1] - https://github.com/open-feature/flagd/actions/runs/10406956145/job/28834819193?pr=1365#step:5:871

Hey hey... I think it might be idiomatically incorrect to wrap those errors. fsnotify_watcher is an interface wrapper, so by definition it doesn't add context to the upstream error.

Totally happy to wrap them if ya want. But I think the correct move is probably to //nolint:wrapcheck those methods. Take a look and lemme know what yall think.

Yeah it doesn't add context to the upstream, but by wrapping we get more context for downstream (ex:- log analysis). I think this section of the checker explains why wrapping is recommended [1].

[1] - https://github.com/tomarrell/wrapcheck?tab=readme-ov-file#why

djosephsen commented 3 months ago

@djosephsen there're few lint issue due to unwrapped errors [1]. And you can run lint validations locally with make lint, so you know next run will be greet :) [1] - https://github.com/open-feature/flagd/actions/runs/10406956145/job/28834819193?pr=1365#step:5:871

Hey hey... I think it might be idiomatically incorrect to wrap those errors. fsnotify_watcher is an interface wrapper, so by definition it doesn't add context to the upstream error. Totally happy to wrap them if ya want. But I think the correct move is probably to //nolint:wrapcheck those methods. Take a look and lemme know what yall think.

Yeah it doesn't add context to the upstream, but by wrapping we get more context for downstream (ex:- log analysis). I think this section of the checker explains why wrapping is recommended [1].

[1] - https://github.com/tomarrell/wrapcheck?tab=readme-ov-file#why

Okeedokee, pr inbound

beeme1mr commented 3 months ago

I'm seeing a panic now when trying to run this locally.

Error:

2024-08-19T12:35:15.559-0400    info    file/filepath_sync.go:62        Starting filepath sync notifier {"component": "sync", "sync": "fileinfo"}
panic: assignment to entry in nil map

goroutine 1 [running]:
github.com/open-feature/flagd/core/pkg/sync/file.(*fileInfoWatcher).Add(0xc00021ca80, {0xc0005b0360, 0x2a})
        /home/beemer/code/openfeature/flagd/core/pkg/sync/file/fileinfo_watcher.go:69 +0xfe
github.com/open-feature/flagd/core/pkg/sync/file.(*Sync).Init(0xc0005e1f20, {0x2123020, 0xc0003aa550})
        /home/beemer/code/openfeature/flagd/core/pkg/sync/file/filepath_sync.go:78 +0x22a
github.com/open-feature/flagd/flagd/pkg/runtime.(*Runtime).Start(0xc0005b40f0)
        /home/beemer/code/openfeature/flagd/flagd/pkg/runtime/runtime.go:75 +0x2b5
github.com/open-feature/flagd/flagd/cmd.init.func1(0xc0005a2200?, {0x1df5f60?, 0x4?, 0x1df5f64?})
        /home/beemer/code/openfeature/flagd/flagd/cmd/start.go:146 +0x8e5
github.com/spf13/cobra.(*Command).execute(0x30cb160, {0xc0003a7bf0, 0x3, 0x3})
        /home/beemer/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:989 +0xab1
github.com/spf13/cobra.(*Command).ExecuteC(0x30cae80)
        /home/beemer/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1117 +0x3ff
github.com/spf13/cobra.(*Command).Execute(...)
        /home/beemer/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1041
github.com/open-feature/flagd/flagd/cmd.Execute({0x1df5788?, 0xc0000ac050?}, {0x1df5e24?, 0x21008a0?}, {0x1df9193?, 0x4014e0?})
        /home/beemer/code/openfeature/flagd/flagd/cmd/root.go:37 +0x91
main.main()
        /home/beemer/code/openfeature/flagd/flagd/main.go:30 +0xd4
exit status 2
make: *** [Makefile:55: run-flagd] Error 1
djosephsen commented 3 months ago

I saw that too.. I'll get it tracked down

djosephsen commented 3 months ago

How we lookin? I can run the binary locally right now, but trying to run the integration tests, I don't have any of the files referenced by those commands.. like test-harness/flags/testing-flags.json and etc... Looks like the integration tests, as run by the CI are working though.

beeme1mr commented 3 months ago

Hey @djosephsen, I'll manually test it real quick. If all looks good, I think we should merge. FYI @toddbaert

beeme1mr commented 3 months ago

It works great locally. Thank you very much @djosephsen. 🥳