tcplugins / tcWebHooks

WebHooks plugin for Teamcity. Supports many build states and payload formats.
https://netwolfuk.wordpress.com/category/teamcity/tcplugins/tcwebhooks/
157 stars 30 forks source link

WebHook Logging screen not working #161

Closed tanyabrytkova closed 4 years ago

tanyabrytkova commented 4 years ago

Hi! I have a user defined webhook JSON template which has no default build event template, but build successful and build failed events configured.

WebHook with this custom template is enabled for a build configuration for build successful and build failed events, and it works correctly (sends the needed requests on the needed events).

But I see no logs or proofs of this in WebHook Logging screen neither in Admin Tab, nor in WebHooks Tab of the configuration.

If you need any details on my user defined webhook or any other information, please let me know.

Steps to Reproduce

Expected Behavior

Events counter on /webhooks/history.html page increases after webhook request is sent. Recent WebHook Events table shows events on /webhooks/history.html page and in WebHooks Tab of the needed build configuration after they appear.

Current Behavior

Events counter doesn't change Recent WebHook Events table doesn't show events

Your Environment

TeamCity Enterprise 2020.1.2 (build 78726) tcWebHooks 1.1.372.399

netwolfuk commented 4 years ago

Hi @tanyabrytkova

Thanks for the detailed bug report. I must admit I've not seen this behaviour. I have about 50 webhooks configured, so it's possible some are not being logged correctly, and I haven't noticed.

I'll try with a new TeamCity instance and single webhook on a user defined template to see if I can reproduce this behaviour.

netwolfuk commented 4 years ago

By the way. That history is only stored in memory. It will be lost when TeamCity has restarted.

tanyabrytkova commented 4 years ago

@netwolfuk, thanks a lot for your quick response.

By the way. That history is only stored in memory. It will be lost when TeamCity has restarted.

Yep, I get it, but we didn't restart Teamcity

netwolfuk commented 4 years ago

Hi @tanyabrytkova

I have not been able to reproduce this yet. My logging shows no errors, but the numbers have incremented.

image

And I can see the events on the WebHooks tab of the project.

image

Assuming you have permission to share some config files, can I please ask you to export the template, and the webhook config? If you're happy to attach them to this issue, that would be great, otherwise please email me. My email is my GITHUB username at gmail

Your template will be able to be downloaded at:

http://teamcity/app/rest/webhooks/templates/id:YOUR_TEMPLATE_ID/rawConfig Please replace "YOUR_TEMPLATE_ID" with your template ID (it's in the top left of the edit template page).

Also, your webhook configuration is available from the Diagnostics Tab in the Administration page or on the server on the disk if that's easier.

It will look something like this...

image

The URL in that configuration will contain your Slack token, so please do remove that before sending it to me.

I'll import your template and webhook config, and try to reproduce.

Also, can I ask you to have a look in the log when sending a webhook? Just in case there are any errors occuring. tcWebHooks logs into the teamcity-server.log file.

tanyabrytkova commented 4 years ago

Hi @netwolfuk,

Thanks for the detailed instructions.

Please find attached my custom WebHookTemplate and WebHookConfig. webhook_config_and_template.zip

Unfortunately, the webhook based on the custom template (Alertino) sends requests to our local service, which is not available to you, but I also tried using built-in Slack JSON template (as you can see from the config), and the situation was the same.

Here are some screenshots to prove what I'm saying. admin_tab webhooks_tab

Today this 10005 counter did't change at all. Teamcity is configured to restart once a day (at night), so I expect to see the counter changes and at least something in the logs table.

And I checked logs from the teamcity-server.log file, but there was nothing connected to the webhooks.

netwolfuk commented 4 years ago

Hi @tanyabrytkova

Your TeamCity instance seems quite large. I can see 5 digit build type Ids in the config file. Is there a chance their really was 10005 events between when TC restarted, and then you see this page? The history only stores 10,000 entries. Perhaps it has already done 2,500 builds (with 4 events per build)?

Looking at your files I have a few things to try:

  1. Check thread safety on the history store. Multiple builds could be sending events at the same time.
  2. Check sending many events very rapidly to the history store.
  3. Check how events are generated/written when enabled-for-all="false" and enabled-for-subprojects="false"
  4. Check if events are being put in the wrong type under load. I would have expected skipped events to be the majority.
  5. Validate behaviour of templates with just success and failed build events configured.

I didn't get much time on this on the weekend. Hopefully this week I will get some more time.

tanyabrytkova commented 4 years ago

Hi @netwolfuk,

Thanks a lot. The reason is really this huge number of webhook events that we store (our Teamcity entity is pretty large).

Today I see a bit less events in the counter and some of the events are finally displayed in the table. 1596443341380

I will try to find all the usages and check whether all of them are really needed, and get back to you soon.

BTW, do you have plans to increase the number of the stored events? Or at least display events after they exceeded 10,000?

netwolfuk commented 4 years ago

Thanks for the update @tanyabrytkova

The reason is really this huge number of webhook events that we store (our Teamcity entity is pretty large)

That's exciting! It would be fun to work on such a large installation!

Today I see a bit less events in the counter and some of the events are finally displayed in the table.

This is great! So it's certainly a bug in my code. The list is supposed to contain the most recent 10,000 events, and old ones should be automatically dropped off the end so that the store does not get too big, as that would use up memory in TeamCity. So in your case, the total should read 10,000 events, as you obviously have this many being processed.

I will try to find all the usages and check whether all of them are really needed, and get back to you soon.

ok, in version 1.2, there will be a centralised manager for keeping configuration of all webhooks, and so listing all webhooks is available in the UI. However the easiest way for you is probably a find on the data dir (assuming unix). find ./config/projects/ -name plugin-settings.xml | xargs grep -c '</webhook>'

For me it looks like this...

netwolfuk@ubuntu-teamcity-01:~/.BuildServer$ find ./config/projects/ -name plugin-settings.xml | xargs grep -c '</webhook>'
./config/projects/_Root/pluginData/plugin-settings.xml:14
./config/projects/Utils_MoreUtils/pluginData/plugin-settings.xml:0
./config/projects/Android/pluginData/plugin-settings.xml:1
./config/projects/Spinnaker/pluginData/plugin-settings.xml:1
./config/projects/TestTopProject/pluginData/plugin-settings.xml:13
./config/projects/TestWebHooks/pluginData/plugin-settings.xml:1
./config/projects/TcPlugins/pluginData/plugin-settings.xml:8
./config/projects/UtilsProject/pluginData/plugin-settings.xml:1
./config/projects/TestTopProject_TestMiddleProject/pluginData/plugin-settings.xml:0
./config/projects/TestTopProject_SubProject/pluginData/plugin-settings.xml:16

BTW, do you have plans to increase the number of the stored events?

Yes, I am thinking I will allow users like yourself to set a value in internal.properties. Something like webhook.history.maxEventCount=500000 so that you can set a larger vaule if required.

Also, the ability to not store skipped events, as most of yours will be skipped. A setting like... webhook.history.logSkippedEvents=false

Or at least display events after they exceeded 10,000?

Yes! I have a bug there. It should be displaying the most recent 10,000 events. For some reason, it seems to be only keeping "ok" events, and is not showing them on the logging page correctly once the 10k is reached. I will look into this ASAP.

netwolfuk commented 4 years ago
  1. Check thread safety on the history store. Multiple builds could be sending events at the same time.

Yes. I have found bug here with a simple unit test. Not synchronsing the add method is returning random numbers of items. Eg, in your case you're seeing 10004, rather than 10000.

netwolfuk commented 4 years ago

Hi @tanyabrytkova

Would you see value in maintaining a larger counter but a smaller log?

At the moment, the counters are determined by finding and counting the number of items of each type in the store. It would be more efficient to have four integers and increment these separately. I don't imagine anyone is really going to look through all 10k entries! :-) Or 500k if the value was larger.

netwolfuk commented 4 years ago

I have been able to reproduce this bug, with the 10005 OK items and 10005 total items. The behaviour of LinkedHashMap became unpredictable when multiple threads were trying to write whilst the Map was trying to remove expired entries.

I have a fix which has the following:

I have validated that:

A new release of 1.1 will be coming within the next day or two. I will merge these changes into 1.2 (master) so that they are carried forward.

netwolfuk commented 4 years ago

Yes. It's looking good. I'll do a release shortly.

image

netwolfuk commented 4 years ago

Version 1.1.374.403 released.

tanyabrytkova commented 4 years ago

Hi @netwolfuk , Sorry for the delayed answer and thanks a lot for the fix 👍

Now all works as expected, and thanks to logs, I finally found the solution to my problems!