guilhemmarchand / TA-jira-service-desk-simple-addon

Atlassian JIRA add-on for Splunk alert actions
11 stars 8 forks

Timeout when alerting massive issues #110

Closed stuart-alt closed 3 years ago

stuart-alt commented 3 years ago

Hello, I am looking for help with an issue I've been facing. When a search returns, let's say, 100 results, Jira only creates around 40 issues. The Jira logs indicate that the other 60 requests never reached Jira. In the Splunk logs I can see that the alert action fails very close to 5 minutes after it started. I'm not completely sure whether a timeout is breaking the connection, but when testing from another system I didn't face any issue. Any suggestion would be appreciated.

Below is the log from when the issue fails:

_time user action_status jira_issue signature
2021-08-24T09:44:34.886+0000 REMOVED@.com failure Backlog KVstore saving has failed!. url=https://localhost:8089/servicesNS/nobody/TA-jira-service-desk-simple-addon/storage/collections/data/kv_jira_issues_backlog, data={
2021-08-24T09:44:34.852+0000 REMOVED@.com success SCM-861 JIRA Service Desk ticket successfully created. https://somename.atlassian.net/rest/api/2/issue, content={
2021-08-24T09:44:33.676+0000 REMOVED@.com success jira_dedup: An issue with same md5 hash (65324c28104260ed63cdd03fa5f96a69) was found in the backlog collection, as jira_dedup is not enabled a new issue will be created, entry:={

Thank you!

guilhemmarchand commented 3 years ago

Hi @stuart-alt !

Right, this looks suspicious, can you confirm:

The KVstore saving failure seems to indicate problems talking to the local KVstore. In short, when the TA creates an issue in JIRA, it also interacts with the local Splunk KVstore for different purposes.

Would it be possible for you to share a larger extract of the TA log from when this happens? By email: guilhem.marchand@gmail.com. The log file is "jira_service_desk_modalert.log".

I will run some tests to see if I can reproduce this.

guilhemmarchand commented 3 years ago

@stuart-alt

I could not replicate your issue with mass testing. In my lab, every mass test successfully led to the creation of the issues in JIRA.

I believe your Splunk search head instance is struggling to keep up with the traffic. As I was saying earlier, the Add-on performs operations locally against the Splunk KVstore. What I think is happening is that your Splunk search head is not performing properly and cannot keep up with the REST calls, which would indicate scaling issues on your side, perhaps due to an overloaded instance. Have you checked what resource usage looks like?
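As a rough aid for checking whether failures cluster near the 5 minute mark, the sketch below summarizes TA log records by action_status and measures the time between the first and last event. It assumes the records have already been extracted as (timestamp, action_status) pairs; the `summarize` function and that record shape are hypothetical and not part of the Add-on. The sample values are the three entries from the log excerpt earlier in this thread.

```python
from collections import Counter
from datetime import datetime

# Timestamp layout as seen in the log excerpt in this thread,
# e.g. 2021-08-24T09:44:34.886+0000
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def summarize(records):
    """Count events per status and return the elapsed seconds between
    the earliest and latest event.

    records: iterable of (timestamp string, action_status) pairs.
    This shape is an assumption made for illustration only.
    """
    times = []
    statuses = Counter()
    for stamp, status in records:
        times.append(datetime.strptime(stamp, TIME_FORMAT))
        statuses[status] += 1
    elapsed = (max(times) - min(times)).total_seconds() if times else 0.0
    return dict(statuses), elapsed


# The three entries quoted in the original report.
records = [
    ("2021-08-24T09:44:34.886+0000", "failure"),
    ("2021-08-24T09:44:34.852+0000", "success"),
    ("2021-08-24T09:44:33.676+0000", "success"),
]

counts, elapsed = summarize(records)
print(counts, elapsed)
```

Run against a full alert execution, an elapsed span close to 300 seconds with failures grouped at the end would be consistent with the 5 minute observation reported above, although it would not by itself identify the cause.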

guilhemmarchand commented 3 years ago

Hi @stuart-alt

Any news on this?

stuart-alt commented 3 years ago

Hello @guilhemmarchand, our TA version is 1.0.25 and we run Splunk in the Cloud. As I am not the Splunk admin, I was not able to retrieve the logs you asked for. Since I can't send you the logs, I believe you can close this issue. Thanks a lot!

guilhemmarchand commented 3 years ago

Thanks @stuart-alt, closing it then. If anything new comes up, feel free to re-open it!