maxlerebourg / crowdsec-bouncer-traefik-plugin

Traefik plugin for Crowdsec - WAF and IP protection
Apache License 2.0

[FEATURE] Allow users to specify plugin behavior on cache refreshing failure #152

Closed darkweaver87 closed 6 months ago

darkweaver87 commented 7 months ago

Is your feature request related to a problem? Please describe. 🐛 We implemented a PoC with your wonderful plugin and would like to put it into production, but one issue remains. We see it in stream mode, and switching to another mode doesn't change anything.

A free Crowdsec deployment relies on agents sending their decisions to a local API (LAPI). By design, this LAPI can't be scaled out, because agents could then try to send their data to an LAPI instance they are not registered with.

Consequently, we can technically "lose" the LAPI for some amount of time, and it can be unavailable during a cache refresh. When that happens, Traefik returns a 403.

I tend to agree that blocking when there is a doubt is good security practice, but for some services it's not really ideal. In my case, I need users to keep access to the service during such a failure.

Describe the solution you'd like

Thus, I was thinking about either:

I will be happy to contribute, just let me know your thoughts on this :-)

Additional context

mathieuHa commented 7 months ago

Hi,

Thanks for the interest in the plugin, we're discussing the issue you encountered with @maxlerebourg.

bouncer.go, line 280:

    // Right here, if we cannot join the stream, we forbid the request to go on.
    if bouncer.crowdsecMode == configuration.StreamMode || bouncer.crowdsecMode == configuration.AloneMode {
        if isCrowdsecStreamHealthy {
            handleNextServeHTTP(bouncer, remoteIP, rw, req)
        } else {
            bouncer.log.Debug(fmt.Sprintf("ServeHTTP isCrowdsecStreamHealthy:false ip:%s", remoteIP))
            handleBanServeHTTP(bouncer, rw)
        }
    }

I'm thinking about an internal counter that lets the stream be unhealthy for X consecutive refreshes before requests start getting a 403.
The grace period would then be the updateInterval multiplied by that counter.

With some default values, for example:
streamUnhealthyMaxTime=3
UpdateIntervalSeconds=60

So instead of blocking after 1 minute when the LAPI is unreachable, the plugin would block after 3 minutes.
A successful sync with the LAPI would reset that counter.
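
To make the counter idea concrete, here is a minimal sketch of how it could look. It is not the plugin's actual code: `streamHealth`, `newStreamHealth`, `recordRefresh` and `healthy` are hypothetical names, and the only assumption is that `recordRefresh` gets called once per cache refresh with the outcome of that refresh.

```go
package main

import "sync"

// streamHealth is a hypothetical helper (not part of the plugin) that counts
// consecutive failed cache refreshes against a configured maximum.
type streamHealth struct {
	mu          sync.Mutex
	maxFailures int // plays the role of streamUnhealthyMaxTime in the example above
	failures    int // consecutive failed refreshes since the last successful sync
}

// newStreamHealth returns a tracker that tolerates maxFailures-1 consecutive
// failures. Values below 1 are clamped to 1, which means "block on the first
// failed refresh".
func newStreamHealth(maxFailures int) *streamHealth {
	if maxFailures < 1 {
		maxFailures = 1
	}
	return &streamHealth{maxFailures: maxFailures}
}

// recordRefresh is meant to be called once per refresh attempt.
// A successful sync resets the counter; a failure increments it,
// capped at maxFailures so it cannot grow without bound.
func (h *streamHealth) recordRefresh(ok bool) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if ok {
		h.failures = 0
		return
	}
	if h.failures < h.maxFailures {
		h.failures++
	}
}

// healthy reports whether requests should still be let through,
// i.e. whether the grace period has not been used up yet.
func (h *streamHealth) healthy() bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	return h.failures < h.maxFailures
}
```

In this sketch, the `isCrowdsecStreamHealthy` check in ServeHTTP would be replaced by something like a call to `healthy()`. With a maximum of 3 and a 60-second update interval, the first two failed refreshes still let traffic through and the third one starts the 403s.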

darkweaver87 commented 6 months ago

Hello,

Thank you for your feedback :-) Looks good to me :-)

Thanks :+1:

Rémi

mathieuHa commented 6 months ago

Hi,

We're almost done implementing it. I tested the basic behavior yesterday:

We should merge and release a beta version very soon.