tcplugins / tcWebHooks

WebHooks plugin for Teamcity. Supports many build states and payload formats.
https://netwolfuk.wordpress.com/category/teamcity/tcplugins/tcwebhooks/
154 stars 30 forks source link

Configured webhooks break when kotlin code syncs #246

Open jacobs-xero opened 1 month ago

jacobs-xero commented 1 month ago

Expected Behavior

Configured webhooks should be configured correctly and editable.

Current Behavior

Configured webhooks produce an error when trying to edit

An unexpected error occured. Please see your browser's javascript console.
Responding with error, status code: 404 (Not Found).
Details: jetbrains.buildServer.server.rest.errors.NotFoundException: Could not find a webhook with that id
Could not find the entity requested. Check the reference is correct and the user has permissions to access the entity.

Configured webhook has lost all it's settings

Screenshot 2024-07-18 at 11 39 26 AM

Under Versioned Settings, Loading project settings from VCS can re-create the webhooks with the correct settings. However leaves these "ghost" webhooks to persist until a service restart.

Extent of the breaking varies If a webhook is defined in code and that definition is not altered, it stays in place and can be edited, but it loses some configuration and needs to be fixed If a webhook is not defined in code (eg for subprojects, because we have no way of defining that in code), then the webhook just becomes uneditable

Steps to Reproduce (for bugs)

  1. Make a kotlin change, unrelated to webhooks
  2. Wait for Versioned Settings to synchronise and apply changes
  3. Webhooks break

Your Environment

netwolfuk commented 1 month ago

Thanks for the great bug report @jacobs-xero

I'll setup a kotlin VCS project and try to reproduce it. Can I just clarify a couple of things:

  1. Do you have the tcWebHooks REST API plugin installed? If so, which one (legacy or new).
  2. How did you access the edit dialog above? From the /webhooks/index.html?projectId=blah or from editing the configuration on the project?

Thanks in advance.

netwolfuk commented 1 month ago

If a webhook is not defined in code (eg for subprojects, because we have no way of defining that in code), then the webhook just becomes uneditable

Can I please get a bit more info on what you see when it becomes "uneditable"? I'm hoping you might get some errors in the browser console (developer tools).

Can you also confirm my assumption here?

  1. A project has a webhook configured, and has sub-projects enabled.
  2. A sub-project has VCS kotlin enabled and inherits webhooks from the parent project. These inherited webhooks are uneditable.
netwolfuk commented 1 month ago

Hi @jacobs-xero. I have not been able to reproduce this issue. Are you able to answer the questions I had posted above?

jacobs-xero commented 1 month ago

@netwolfuk sorry for the delay in getting back. I'll find the answers for the questions above and get back to you asap. Thanks

netwolfuk commented 1 month ago

Thanks @jacobs-xero There is a similar conversation happening on issue #248 Can you answer these questions too please?

  1. Do you have TeamCity setup with two nodes? Or just a single node?
  2. Are you allowing edits in TeamCity to be committed back to VCS? eg, Allow editing project settings via UI is turned on.
  3. In the plugin-settings.xml file, do your webhooks have id values set? eg, <webhook id="id_552375210" url=...
jacobs-xero commented 1 month ago
  1. A project has a webhook configured, and has sub-projects enabled. Yes, correct.

  2. A sub-project has VCS kotlin enabled and inherits webhooks from the parent project. These inherited webhooks are uneditable. It's the parent project that has VCS kotlin enabled (versioned settings) and sub projects inherit the webhooks from parent project. The parent project webhooks are uneditable.

  3. Do you have TeamCity setup with two nodes? Or just a single node? We only have a single node

  4. Are you allowing edits in TeamCity to be committed back to VCS? eg, Allow editing project settings via UI is turned on. Allow editing project settings via UI is turned on.

  5. In the plugin-settings.xml file, do your webhooks have id values set? eg, <webhook id="id_552375210" url=... The webhooks do have an id within the plugin-settings.xml file.

netwolfuk commented 1 month ago

Thank you for the quick feedback. I'll add that to the array of facts I have been collecting. I am about to post an update to the other ticket (#248 ) as to what I have been able to reproduce.

jacobs-xero commented 1 month ago

Thanks @netwolfuk , is the advice to use id in the webhook definition? I believe our webhooks contain this already, does it matter if it follows the id_552375210 convention or would something like you mentioned in the other ticket FooBarPRDevOps_Webhook_1 fair better? Thanks

netwolfuk commented 1 month ago

HI @jacobs-xero. Having an ID is required. It does not matter which format you choose. The most important thing is that they don't clash across the whole of the teamcity instance.

If I understand correctly, your VCS is being written to by TeamCity. This means that when the webhook is loaded it is...

  1. Checked that an ID is present. If not, one is allocated.
  2. Checked that the ID does not clash. If so, a new ID is generated.
  3. In your case, it should then be written back to the VCS and you won't have this issue again.

For ticket #248 the same is happening, but because they don't let TeamCity update back into VCS, the issue occurs on every update.

So, for them the benefit is that using an id like FooBarPRDevOps_Webhook_1 allows them to manage uniqueness better, since the only place that can be managed is in their XML file before they commit it.

At this stage, these two tickets look very similar, but I think are actually different bugs. I am hoping to get some time on the weekend to do a deep dive on the issue you are seeing.

I suspect in both cases it is exacerbated by the nested projects that inherit from a VCS managed project, but I have yet to prove that.

netwolfuk commented 1 month ago

Thinking out loud, if...

  1. The file has changed and I need to load the webhooks
  2. AND
  3. I make sure the all the webhooks in the cache have a matching webhook from the config file
  4. AND
  5. I remove any from the cache that are not in the config file

Then the cache will stay in sync with the config.

The IDs will be constantly changing which does have two at least two downsides...

  1. The IDs on the webhooks page will become out of date, and the page will need to be refreshed if it was loaded before the update occurred.
  2. The history will refer to an old webhook id, and clicking the link on the history page will not be able to find the corresponding config.

I think I will also look into creating a task that checks that webhooks have IDs in the file. I can put up one of those notification bars along the top of the window to indicate there is an issue and that one should fix the IDs in the config file.

Also, I've started investigating whether I can support webhook config inside the settings.kts file. If that's not too hard, then I can apply logic at the time that file is loaded and mandate that they need IDs and that they don't clash.

netwolfuk commented 1 month ago

I've managed to make some really good progress on this. I have found the following:

Updates to webhooks that are driven by the plugin work well The cache is maintained correctly because the changes are applied to the settings, and then the cache is updated.

Updates driven by VCS changes without IDs cause ghost webhooks This is because the existing webhooks are not updated because they are not found due to the ID being regenerated.

Also, I was not comparing the result of the existing webhooks against the updated webhooks. The only predictable way to compare them is by ID, but newly loaded webhooks will be assigned an ID if they don't have one defined. This will not match the existing one, so the old webhook is not updated. A new one is created.

I have added logic to compare the IDs after the cache is refreshed, and remove any webhooks from the current project if they don't match one of the new IDs. A line like the following is printed in the log.

[2024-08-05 09:16:35,341] INFO - jetbrains.buildServer.SERVER - WebHookSettingsManagerImpl :: Found the following webhooks in the cache that no-longer map to webhooks in project 'VcsProjectTest'. WebHooks with ids '[id_377052579, id_283758663]' will be removed from the cache.

This does have the downside that IDs will churn. This causes two annoyances.

  1. The history page will link webhook events with webhooks that might no longer be relevant.
  2. If the edit webhooks page is loaded, and then the webhooks change, the edit will fail because the webhook cannot be found.

Ideally, webhooks should have unique IDs. This will work around this issue.

TeamCity's buildTypeUnregistered event is poorly named In 2.0.1, I added a listener for the buildTypeUnregistered event. This allows me to cleanup webhooks that might be interested in a build that has been deleted.

However, I have found that this event is fired when the VCS configuration is reloaded. TeamCity appears to re-read the VCS settings, delete all builds on that project, and then recreate them with the same internal and external IDs. At which point the buildTypeUnregistered fires for each build.

When I see that event, I diligently clean up any webhooks that explicitly reference that build, and remove that build (btxxxx) id from their configuration.

This presents as webhooks having 0 builds registered. Sometimes webhooks still have builds showing. This is because a project edit (either via VCS or from the UI) fires up to 10 events in quick succession. I have a queue backed delay and de-bounce algorithm to only present one of each type of event.

I therefore only see one buildTypeUnregistered and one projectRestored, but they will fire at almost the same time. At which point a race condition exists as to whether the cache is refreshed while the build is marked as deleted or has been "undeleted".

I have removed the buildTypeUnregistered event listener, and builds now reliably show for all webhooks. I will do some further testing to make sure that not listening for this event has no problematic side-effects.

I'll do more testing on various configurations. I hope to have a very early beta release of 2.0.2 available for testing in 24 hours.

netwolfuk commented 1 month ago

Builds on this branch are available to download on the jetbrains TeamCity instance. You don't need an account to download. Just choose guest mode.

Would it be possible to install this on your server and test it? I am pretty confident this will fix the issues you are seeing. If you do see further issues, I would appreciate any information you could share.

Thanks again for all your help thus far.

jacobs-xero commented 1 month ago

Hi @netwolfuk , thanks for the updated plugin to test. It looks like this 2.0.2 version has fixed the issue of ghost webhooks being created on VCS load/sync.

I did notice this behaviour. When the VCS would load, 1 webhook (always appeared to be the last on the list, but not confirmed if this is a pattern yet) would be missing Trigger on Events.

Screenshot 2024-08-09 at 2 03 05 PM

However, when I would click edit, the WebHook Config was as expected.

Screenshot 2024-08-09 at 2 03 26 PM

When I would hit refresh on the Webhooks & Templates page, the Build Events would appear correct and as expected.

Screenshot 2024-08-09 at 2 03 55 PM

I'll keep having a play around to see if I find any other strange behaviours.

Thanks

netwolfuk commented 1 month ago

Ooh. That's an interesting bug. I don't think I've done anything that would have changed that.

Thanks for reporting it and I'll do some testing over the weekend.

Thanks again for all your great replies. I very much appreciate it.

netwolfuk commented 2 weeks ago

Hi @jacobs-xero On the weekend I have built up two new servers with the intention of running them as a primary/secondary node setup.

Do you happen to be running a two server TeamCity setup? This is what we're seeing on #248 and it seems to be interfering.

Also, I noticed that editing project that is backed by VCS is unreliable. I tried to fix some build steps and my changes appeared to be saved, but were never persisted, and were never written back to git. Have you seen this behaviour, or do you mainly edit the project from the kotlin files and sync from VCS to TeamCity rather than the other way around?

jacobs-xero commented 2 weeks ago

Hi @netwolfuk , we only have the single server TeamCity setup.

We mainly edit the project via Kotlin files and sync from VCS to TeamCity. That's when we started seeing similar inconsistency with what was in the code vs UI (ghost webhooks and unchecked projects etc)

netwolfuk commented 2 weeks ago

Hi @jacobs-xero Did you happen to notice if the webhooks fired on the events that were missing.

What I mean is... If the above build ran, did the webhook trigger for build started?

Looking at the code, these are resolved just at the page is rendered, and just as the webhook runs, so there is a chance that the webhook will have picked up the correct setting when it fired.

You could have a look in the audit log. That would tell you when VCS settings reloaded, and then have a look at the build overview page (not a specific build) on the WebHooks Tab. It should show the events it tried to trigger for (see the bottom of this screenshot).

image

The table at the bottom of the above screenshot is rendered from the last 10,000 webhook events, which are then filtered to show just the ones from that build configuration. So if there have been lots of webhooks fired on other builds, then they might not show up in that table.

jacobs-xero commented 1 week ago

Hi @netwolfuk , I checked in with the team, and they've said that when the Build Events are missing, the webhooks stop firing.