mattermost / mattermost-plugin-gitlab

GitLab plugin for Mattermost
Apache License 2.0
141 stars 85 forks source link

Support config variable to disable project access check for incoming webhook event #463

Open mickmister opened 9 months ago

mickmister commented 9 months ago

At the moment, we ensure a given user has read access to a project when:

There are some issues going on with revoked tokens, causing existing subscriptions to not work in the second case. https://github.com/mattermost/mattermost-plugin-gitlab/issues/411#issuecomment-1927033155

The task here is to implement a plugin config variable "Enable Webhook Project Access Check", which will default to true. When this value is false, we will skip the project access check when we receive a webhook event

mickmister commented 9 months ago

@esarafianou I'm curious what you think about this. There is an ongoing issue we're diagnosing related to revoked tokens, and a symptom https://github.com/mattermost/mattermost-plugin-gitlab/issues/411#issuecomment-1927033155 customers are experiencing from the issue is that webhook subscriptions stop working, since we check that the user that created the subscription has access to the project, every time we receive a new webhook event.

I'm proposing that we have an optional config variable to disable this check. We would keep the check for when the subscription is created, but not for when the webhook events come in. We would make it clear what the security implications are. I think this would help unblock customers that are experiencing the outage of webhook events.

esarafianou commented 9 months ago

@mickmister I'd expect that we first understand what changed.

Is it some gitlab related settings that changed and have the access token expire after X hours/days? Is it something that can be changed in the Gitlab side?

Is there a refresh token mechanism where we can take a new access token if the user's access to a repo is not revoked?

I don't think we should go ahead with the proposal of his issue without understanding the real underlying issue.

esarafianou commented 9 months ago

Regarding the linked comment, I don't understand how a single user being disconnected results in the whole instance stopping to receiving webhooks/update. Specific service accounts need to be used in this case. I don't know what GitLab offers but there should be something like a service account.

mickmister commented 9 months ago

@esarafianou For the token issue, it is a long-standing issue that cannot be explained in a short comment here. There is more context here https://github.com/mattermost/mattermost-plugin-gitlab/pull/308

Is it some gitlab related settings that changed and have the access token expire after X hours/days?

It is a behavioral change in a GitLab release. It's more complex than that, and as we know it, it is sensitive to race conditions of refreshing tokens simultaneously. I don't recall exact release but it was around 1.5 years ago. There are other threads of developers complaining to GitLab about this same change. We have resolved this for most cases with https://github.com/mattermost/mattermost-plugin-gitlab/pull/308 and https://github.com/mattermost/mattermost-plugin-gitlab/pull/408, but it seems it's still happening to some users.

Is it something that can be changed in the Gitlab side?

I don't think so

I don't understand how a single user being disconnected results in the whole instance stopping to receiving webhooks/update

Only subscriptions created by that user are affected. Our plugin attempts to verify if that user still has access to the project, and can't verify if the user's token doesn't work properly. We may be able to do this with a service account (check if the user still has access), though I'm not sure if that significant change (introducing service account logic into the plugin) is worth it for this specific piece of this feature. @hanzei @fmartingr What are your thoughts on this?

fmartingr commented 9 months ago

Is it some gitlab related settings that changed and have the access token expire after X hours/days?

I think Gitlab tokens expire after 12 months by default.

Gitlab - Access token expiration

The ability to opt out of expiring access tokens was deprecated in GitLab 14.3 and removed in 15.0. All existing integrations must be updated to support access token refresh.


I'm proposing that we have an optional config variable to disable this check. We would keep the check for when the subscription is created, but not for when the webhook events come in. We would make it clear what the security implications are. I think this would help unblock customers that are experiencing the outage of webhook events.

I would like to understand how this works. If the user creates a subscription for a project, then loses the access to that given project, do we still receive subscription events? Even if we can't check if the user has access to the project, it should be safe to assume that Gitlab would revoke subscription if a given user loses access to that project.

Only subscriptions created by that user are affected. Our plugin attempts to verify if that user still has access to the project, and can't verify if the user's token doesn't work properly. We may be able to do this with a service account (check if the user still has access), though I'm not sure if that significant change (introducing service account logic into the plugin) is worth it for this specific piece of this feature. @hanzei @fmartingr What are your thoughts on this?

I didn't do any research into that when building the calendar plugins, though it was an option and I think it gives more granularity to the setup handled by the admins. If that's the way to go, there should be a discussion about the impact/cost of allowing both authentication methods in plugins, to be backwards compatible. It should be safe to assume that the two methods are incompatible with each other so if the user switches from one to another all current data would be lost and requires authentication from users again.

esarafianou commented 9 months ago

@esarafianou For the token issue, it is a long-standing issue that cannot be explained in a short comment here. There is more context here https://github.com/mattermost/mattermost-plugin-gitlab/pull/308

@mickmister  The PR you linked to as having more context has 150+ comments. Is there not a summary of the current issue? Articulating well the problem will also help us figure out the best solution. A problem burried in between the context of 150 comments won't help us that much.

as we know it, it is sensitive to race conditions of refreshing tokens simultaneously.

Shouldn't we then address the concurrent token refreshes here? It seems that Gitlab has introduced refresh token rotation (rotating the refresh token after the first use and sending it back along with the new access token) which is the current best practice.

Only subscriptions created by that user are affected. Our plugin attempts to verify if that user still has access to the project, and can't verify if the user's token doesn't work properly. We may be able to do this with a service account (check if the user still has access), though I'm not sure if that significant change (introducing service account logic into the plugin) is worth it for this specific piece of this feature

That's not what I meant. My question/comment was not related to any implementation fix, only that a whole instance stopping to receiving webhooks/update because a single user (not a service account) was deactivated is an unstable usage of our plugin.

mickmister commented 9 months ago

A problem burried in between the context of 150 comments won't help us that much.

@esarafianou Understood, I agree. Apologies for the lack of explanation here. I think the issue itself is indeed not too complicated. Here is the main error message we get from the GitLab API when a token is invalid:

"error":"unable to get the new refreshed token: oauth2: cannot fetch token: 400 Bad Request\\nResponse: {\\"error\\":\\"invalid\_grant\\",\\"error\_description\\":\\"The provided authorization grant is invalid, expired, revoked, does not match the redirection URI used in the authorization request, or was issued to another client.\\"}

The provided authorization grant is invalid, expired, revoked, does not match the redirection URI used in the authorization request, or was issued to another client. We cannot know for sure, but the error case that seems to be occurring is the last one listed: "The provided authorization grant ... was issued to another client", which implies we may have an issue with concurrent token refreshes.


Shouldn't we then address the concurrent token refreshes here? It seems that Gitlab has introduced refresh token rotation (rotating the refresh token after the first use and sending it back along with the new access token) which is the current best practice.

This summarizes the behavior well. Before this issue was introduced, we were not doing anything to avoid this concurrent refresh issue. The concurrent token refresh issue is handled in the PR https://github.com/mattermost/mattermost-plugin-gitlab/pull/308, via a HA-aware mutex provided by the pluginapi/cluster package, which is being used in the GitLab plugin in this function here:

https://github.com/mattermost/mattermost-plugin-gitlab/blob/387d84d85c714a063cafc7afa51444d2e3bc3922/server/plugin.go#L820

My question/comment was not related to any implementation fix, only that a whole instance stopping to receiving webhooks/update because a single user (not a service account) was deactivated is an unstable usage of our plugin.

I want to make sure I have your comment understood. By "unstable usage of our plugin" do you mean creating every subscription as a single user is an unstable usage? Is the service account the proposed solution, or are you saying there is no solution on our end?


We have not been able to reproduce this issue outside of what a small number of external reports are still occurring. There have been efforts to produce a debug build with verbose logging to help shed light on in what conditions this issue is occurring for them, though this has not been given to the admins yet. I'm also trying to get more people to connect their accounts on hub to get more realistic coverage on this https://hub.mattermost.com/private-core/pl/sjwr1ig9j3f13chrjdbnrtmrzw. I was mainly thinking of changing the webhook subscription ownership issue as a means to mitigate the symptom of loss of event updates from GitLab.

mickmister commented 9 months ago

If the user creates a subscription for a project, then loses the access to that given project, do we still receive subscription events. Even if we can't check if the user has access to the project, it should be safe to assume that Gitlab would revoke subscription if a given user loses access to that project.

@fmartingr The webhooks are made by GitLab admins in the GitLab UI, and are not specific to connected accounts on Mattermost. GitLab is continuing to send events, since a MM account getting disconnected is not anything related to the webhook.

fmartingr commented 9 months ago

If the user creates a subscription for a project, then loses the access to that given project, do we still receive subscription events. Even if we can't check if the user has access to the project, it should be safe to assume that Gitlab would revoke subscription if a given user loses access to that project.

@fmartingr The webhooks are made by GitLab admins in the GitLab UI, and are not specific to connected accounts on Mattermost. GitLab is continuing to send events, since a MM account getting disconnected is not anything related to the webhook.

Apologies, I though that subscriptions were created by the users from the plugin. If that's controlled from the Gitlab side, why do we check channel permissions?

mickmister commented 9 months ago

@fmartingr I'm not sure what you mean by "channel permissions"

The webhook is set up by a GitLab admin, so the webhook is then essentially a firehose of all projects it is configured for. On our end when a subscription is made, it is tied to a given project, so our plugin checks that a given user creating a MM-side subscription has access to the project that they are subscribing to. It also checks their access whenever a webhook event comes in, which this issue was suggesting we make optional.

fmartingr commented 9 months ago

@mickmister Understood now, I was a bit lost on how the data was handled in this interactions. If that's the case and the user creates a subscription but the data comes from this firehose webhook, I'm against removing the check. If I'm still mistaken in how all this works together I would need to review the code before making any more comments 😅

Kshitij-Katiyar commented 3 months ago

Hey @wiggin77, Should we proceed with implementing Mickmister's suggestion? Introducing a plugin configuration variable, EnableWebhookProjectAccessCheck, will necessitate adding a new slash command to update the value of this configuration variable.

Just to confirm, the check we plan to bypass based on the configuration variable is the one linked here, correct?

wiggin77 commented 3 months ago

Should we proceed with implementing Mickmister's suggestion?

@Kshitij-Katiyar No, I don't think a case has been made to turn off this check to fix a problem we don't thoroughly understand.