tcplugins / tcWebHooks

WebHooks plugin for TeamCity. Supports many build states and payload formats.
https://netwolfuk.wordpress.com/category/teamcity/tcplugins/tcwebhooks/

TeamCity server CPU and memory spike after enabling tcWebHooks #212

Closed dom747 closed 5 months ago

dom747 commented 1 year ago

Expected Behavior

We have had tcWebHooks enabled for a long time, with 79 webhooks configured, and it was working.

Current Behavior

Last week, after we changed our agent AMI and applied the cloud profiles, the agents could not upgrade because their plugins were out of date. CPU usage reached 100% and stayed there, and agents could not start builds. We opened a support ticket with JetBrains, and their response was that the tcWebHooks plugin was causing the issue. We disabled it and the server returned to normal. I then upgraded the plugin to the latest version (1.2.1), enabled it again, and restarted the server. Memory and CPU usage spiked sharply again, so I have had to disable it once more.

This was the response from JetBrains:

The main cause of the issue, it seems, is related to the https://plugins.jetbrains.com/plugin/8948-web-hooks-tcwebhooks- plugin - the executors it spawns seem to be CPU-intensive and are running for prolonged periods of time:

28m:07s Task: 'webhook.teamcity.executor.BuildEventWebHookRunner@78777d59'
    at webhook.teamcity.payload.content.ExtraParameters.getActual(ExtraParameters.java:130)
    at webhook.teamcity.payload.content.ExtraParameters.put(ExtraParameters.java:79)
    at webhook.teamcity.payload.content.ExtraParameters.putAll(ExtraParameters.java:98)
    at webhook.teamcity.WebHookContentBuilder.mergeParameters(WebHookContentBuilder.java:431)
    at webhook.teamcity.WebHookContentBuilder.buildWebHookContent(WebHookContentBuilder.java:185)
    at webhook.teamcity.executor.BuildEventWebHookRunner.getWebHookContent(BuildEventWebHookRunner.java:53)
    at webhook.teamcity.executor.AbstractWebHookExecutor.run(AbstractWebHookExecutor.java:65)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

Can you please check the version of plugin in use and try to upgrade it if a newer release is available? If that does not help, can you please try to disable the plugin and let me know if it helps with the server performance?

Our JetBrains support ticket is Request #4905074, and the logs are attached there. I can provide the logs here if there is a way to share them.

Your Environment

Example Configuration (xml)

Can you let me know where the XML file is? Unfortunately, I did not configure any of these webhooks and I don't have experience with the plugin, but I might be able to get more information.

arthursmel commented 1 year ago

Since the plugin was disabled, we can no longer access the webhook configuration for each project. We were using the following template:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<webhook-templates>
    <webhook-template id="tc-pr-verify-webhook" enabled="true" rank="100" format="jsonTemplate">
        <template-description>PR Verify Webhook</template-description>
        <template-tool-tip></template-tool-tip>
        <preferred-date-format></preferred-date-format>
        <templates max-id="0">
            <template id="0">
                <template-text use-for-branch-template="true">{
    'event_key': '${notifyType}',
    'build_id': '${buildId}',
    'build_tags': '${buildTags}',
    'build_text': '${text}',
    'build_url': '${buildStatusUrl}'
}</template-text>
                <branch-template-text></branch-template-text>
                <states>
                    <state type="buildInterrupted" enabled="true"/>
                    <state type="buildSuccessful" enabled="true"/>
                    <state type="buildFailed" enabled="true"/>
                    <state type="buildFixed" enabled="true"/>
                    <state type="buildBroken" enabled="true"/>
                </states>
            </template>
        </templates>
    </webhook-template>
</webhook-templates>

netwolfuk commented 1 year ago

Hi @dom747 . Thank you for the detailed bug report.

Have you previously been running any of the tcWebHooks pre-release 1.2.0 versions (e.g., alpha or release candidate), or were you previously running 1.1.x?

I am trying to determine whether the change is due to the AMIs or to a recent tcWebHooks upgrade. The ExtraParameters logic changed in 1.2.0, but that change has been in alpha builds for at least a year. If you happened to try any of those versions, that would help me pinpoint the issue.

If you're running CentOS, have you changed any of the SELinux configuration? SELinux is enabled by default. I'm not sure it matters in this case, but it would be interesting to know.

I can't seem to find ticket 4905074 on the JetBrains YouTrack instance. I will email their support team and ask for access to your support ticket. Is that OK?

The webhook XML file is located on the server at BuildServer/config/projects/yourProject/pluginData/plugin-settings.xml.

netwolfuk commented 1 year ago

Executing webhooks in threads was also added in 1.2.0.

You could try disabling threading in tcWebHooks by adding a <webhooks> section to your BuildServer/config/main-config.xml file.

It looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<server>
  <webhooks useThreadedExecutor="false">
  </webhooks>
</server>

dom747 commented 1 year ago

Thanks for the reply. I believe we have been using 1.2.0 for a while. We have not changed any SELinux configuration. Yes, it's OK for you to get the information from JetBrains. I could try your suggestion; the challenge is that it took 2 hours to get the server back to a working state after my last attempt to enable the plugin. We have 1,000 AWS agents that all need an update, and the server stayed stuck until I had restarted it 3 times. My JetBrains ticket was a support ticket, not a YouTrack issue.

netwolfuk commented 1 year ago

Wow, that's awesome! I have not tested it on 1,000 agents. At that scale, it could be a concurrency issue with the ExtraParameters object. I'm trying to work out how the number of agents could cause this, but I suspect my code is consuming all the threads in the ThreadPoolExecutor that TeamCity allows plugins to use.

If that thread pool is shared with the rest of the server, my code may be starving every other task of threads, including the threads that communicate with the agents.
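To illustrate the thread-starvation hypothesis, here is a minimal, generic Java sketch. It is not tcWebHooks or TeamCity code: the class `PoolStarvationDemo`, the method `shortTaskCompletes`, and the fixed-size pool are hypothetical stand-ins, and whether TeamCity really shares one bounded pool between plugins and agent communication is an assumption, not something confirmed in this thread. It shows the general effect: when every worker is held by a long-running task, a quick task submitted afterwards sits in the queue until a worker frees up.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PoolStarvationDemo {

    /**
     * Occupies slowTasks workers of a fixed-size pool with tasks that block
     * until released, then checks whether one quick task finishes in time.
     */
    static boolean shortTaskCompletes(int poolSize, int slowTasks, long timeoutMillis)
            throws Exception {
        ExecutorService sharedPool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Hypothetical stand-ins for slow webhook executions, each holding a worker.
            for (int i = 0; i < slowTasks; i++) {
                sharedPool.submit(() -> {
                    release.await();
                    return null;
                });
            }
            // Hypothetical stand-in for unrelated server work, e.g. agent communication.
            Future<String> quickTask = sharedPool.submit(() -> "done");
            try {
                quickTask.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // still queued: no free worker was available
            }
        } finally {
            release.countDown(); // let the slow tasks finish
            sharedPool.shutdown();
            sharedPool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        // Every worker busy: the quick task cannot start before the timeout.
        System.out.println("short task completed: " + shortTaskCompletes(4, 4, 200));
        // One worker free: the quick task runs immediately.
        System.out.println("short task completed: " + shortTaskCompletes(4, 3, 2000));
    }
}
```

Running `main` prints `short task completed: false` and then `short task completed: true`: the only difference between the two runs is whether one worker is left free.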

I feel like the root cause could be related to the change to the latest CentOS. I will create some VMs and try to replicate the issue. My personal budget won't scale to 1,000 agents though :-).

Are you using a public AMI? If so, I could use the same ones as you for testing.

dom747 commented 1 year ago

We have our own AMI, built using Packer.

netwolfuk commented 1 year ago

I've reached out to a couple of mates at TeamCity, but don't have a contact for "support". Could you please email me (my same username as here but at gmail) with a TeamCity support email address?

Also, could you share your Packer script (with anything private removed), or at least the base AMI that Packer is using? Perhaps via email too, rather than on this public forum. Thanks!

netwolfuk commented 1 year ago

Release 2.0.0 Release Candidate 1 has been released to try to resolve the problems reported in this issue.

netwolfuk commented 5 months ago

I'm going to close this out. The bugfix has been available for over a year, and 2.0.0 is now released for all users. I've had no further reports of this issue with the latest code.