scalyr / dataset-addon-for-splunk

The DataSet Add-on for Splunk provides integration with
Apache License 2.0

DPDV-4549: Do not crash when invalid query is used #96

Closed martin-majlis-s1 closed 10 months ago

martin-majlis-s1 commented 10 months ago

Jira Link: https://sentinelone.atlassian.net/browse/DPDV-4549
Jira Link: https://sentinelone.atlassian.net/browse/DPDV-4411


🥅 Goal

In the PR https://github.com/scalyr/dataset-addon-for-splunk/pull/15/files#diff-ad6d3503f1efe6044e12096a99eaa00b173327c48ca60dbaebea7630451e8e9eL220 we switched to the LRQ API, and in that transition we also lost error reporting. So let's reintroduce error reporting.

🛠️ Solution

Propagate error messages

The old API returned errors as user-friendly messages - `{'message': 'Error in filter expression: value expected', 'status': 'error/client/badParam'}` - so the UI could simply show the `message` field. The new API returns less user-friendly error messages - `{"code":"invalid_argument","message":"field=[filter] error=[value expected]","details":[{"field":"filter","message":"value expected"}]}`
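A minimal sketch of how the two error shapes could be normalized into one user-facing message; the function name and the exact wording built from `details` are assumptions for illustration, not the add-on's actual code:

```python
# Hypothetical helper: turn either error payload into a readable message.
def extract_error_message(payload: dict) -> str:
    # New API: {'code': ..., 'message': 'field=[filter] error=[...]',
    #           'details': [{'field': 'filter', 'message': 'value expected'}]}
    details = payload.get("details") or []
    if details:
        # Prefer the per-field details, which read closer to the old wording.
        parts = [
            f"Error in {d.get('field', 'request')}: {d.get('message', '')}"
            for d in details
        ]
        return "; ".join(parts)
    # Old API: {'message': 'Error in filter expression: ...', 'status': 'error/...'}
    return payload.get("message", "Unknown error")
```

This keeps the old behavior (just show `message`) as the fallback, so the same code path can report errors from both APIs.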

Change default to 15 minutes

I am testing this against the QA environment. It has some problems: it almost always fails when I try to search for 4 hours of data. The consequence is that the serverHost field is never populated, and therefore the Base Query remains empty.

Screenshot 2023-11-01 at 12 43 09 Screenshot 2023-11-01 at 12 45 34

That produces an invalid query, so all the examples fail.

Screenshot 2023-11-01 at 12 47 01

It's a really bad experience when you open the examples page and see only errors because the add-on is not able to fetch 4 hours of data. So let's be on the safe side.
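The default-window change can be sketched as a small helper; the constant and function names are illustrative assumptions, not the add-on's actual identifiers:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Assumed new default search window: 15 minutes instead of 4 hours.
DEFAULT_WINDOW = timedelta(minutes=15)

def default_time_range(now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Return (start, end) for the default search window ending at `now`."""
    end = now or datetime.now(timezone.utc)
    return end - DEFAULT_WINDOW, end
```

A smaller default window means the initial serverHost lookup fetches far less data, so it is much less likely to fail and leave the Base Query empty.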

🏫 Testing

Search

Screenshot 2023-10-31 at 15 39 48

500 in the middle of a long-running query

https://github.com/scalyr/dataset-addon-for-splunk/pull/96#issuecomment-1788781557

Screenshot 2023-11-01 at 12 14 30

Timeseries

Time series search still uses the old API, but the error reporting part was missing there as well. Query with serverHost=

Screenshot 2023-11-01 at 10 33 41
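For the old-API path, the missing error check can be sketched like this; the function name is a hypothetical placeholder, and the payload shape follows the example above:

```python
# Hypothetical check: surface old-API errors instead of silently ignoring them.
def check_old_api_response(body: dict) -> dict:
    status = body.get("status", "")
    if status.startswith("error"):
        # e.g. {'message': 'Error in filter expression: value expected',
        #       'status': 'error/client/badParam'}
        raise ValueError(body.get("message", status))
    return body
```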
martin-majlis-s1 commented 10 months ago

The new API behaves a little strangely. When I ask for 1 week of data, it very often crashes.

[Example query](http://localhost:8000/en-GB/app/TA_dataset/search?q=%7C%20dataset%20method%3Dpowerquery%20search%3D%22%20%7C%20group%20count2%3Dcount()%20by%20tag%22%20%7C%20spath%20%7C%20table%20tag%20count2&display.page.search.mode=smart&dispatch.sample_ratio=1&workload_pool=&earliest=-7d%40h&latest=now&display.page.search.tab=statistics&display.general.type=statistics&sid=1698837120.180)

2023-11-01 11:02:37,174 DEBUG pid=73847 tid=MainThread file=dataset_api.py:ds_lrq_run_loop:130 | Response(status_code=<HTTPStatus.OK: 200>, content=b'{"id":"eyJ0eXBlIjoiUFEiLCJ0b2tlbiI6ImQyY2M5MGYzLTc4MTEtNDYwYy04ZDJjLTVjMTk3N2I5NWU5ZCJ9","stepsCompleted":14,"stepsTotal":16,"resolvedTimeRange":{"start":1698231600000000000,"end":1698836473000000000},"data":null,"totalSteps":16}', headers=Headers({'server': 'nginx', 'date': 'Wed, 01 Nov 2023 11:02:37 GMT', 'content-type': 'application/json;charset=UTF-8', 'content-length': '229', 'connection': 'keep-alive', 'set-cookie': 'sp=cb95e6d4-ba84-4c23-ad47-c2309b36e19d;Expires=Thu, 31-Oct-2024 16:51:22 GMT;Max-age=31556926;path=/;HttpOnly', 'x-dataset-query-forward-tag': 'TAG-', 'expires': 'Thu, Jan 1 2009 12:00:00 GMT', 'cache-control': 'no-cache, must-revalidate', 'pragma': 'no-cache', 'scalyr-team-token': 'TOKEN--', 'access-control-allow-credentials': 'true', 'access-control-allow-methods': 'GET, POST, PUT, DELETE, OPTIONS', 'access-control-allow-headers': 'Accept,Authorization,Cache-Control,Content-Type,DNT,If-Modified-Since,Keep-Alive,Origin,User-Agent,X-Requested-With', 'access-control-max-age': '1728000'}), parsed=QueryResult(id='eyJ0eXBlIjoiUFEiLCJ0b2tlbiI6ImQyY2M5MGYzLTc4MTEtNDYwYy04ZDJjLTVjMTk3N2I5NWU5ZCJ9', steps_completed=14, steps_total=16, resolved_time_range=TimeRangeResultData(start=1698231600000000000, end=1698836473000000000, additional_properties={}), error=<dataset_query_api_client.types.Unset object at 0x7ffffd5f8190>, data=None, additional_properties={'stepsTotal': 16}))
2023-11-01 11:02:37,655 DEBUG pid=73847 tid=MainThread file=dataset_api.py:ds_lrq_run_loop:130 | Response(status_code=<HTTPStatus.INTERNAL_SERVER_ERROR: 500>, content=b'{"code":"internal_server_error","message":"Operation not permitted. Operation not permitted. You do not have access to this account.","details":[]}', headers=Headers({'server': 'nginx', 'date': 'Wed, 01 Nov 2023 11:02:37 GMT', 'content-type': 'application/json;charset=UTF-8', 'content-length': '147', 'connection': 'keep-alive', 'set-cookie': 'sp=3240c1be-f9cf-4d22-b2ff-f8dfdb195354;Expires=Thu, 31-Oct-2024 16:51:23 GMT;Max-age=31556926;path=/;HttpOnly', 'x-dataset-query-forward-tag': 'TAG-', 'expires': 'Thu, Jan 1 2009 12:00:00 GMT', 'cache-control': 'no-cache, must-revalidate', 'pragma': 'no-cache', 'scalyr-team-token': 'TOKEN--', 'access-control-allow-credentials': 'true', 'access-control-allow-methods': 'GET, POST, PUT, DELETE, OPTIONS', 'access-control-allow-headers': 'Accept,Authorization,Cache-Control,Content-Type,DNT,If-Modified-Since,Keep-Alive,Origin,User-Agent,X-Requested-With'}), parsed=None)

I can see that it was able to fetch the first 14 chunks and then failed with Internal Server Error 500 - Operation not permitted. Operation not permitted. You do not have access to this account.
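The failure mode in the logs above can be sketched as a polling loop that raises a descriptive error when a mid-run response is not 200, instead of crashing; `run_lrq_loop`, `launch`, and `ping` are illustrative names (the real loop lives in `ds_lrq_run_loop` in dataset_api.py), and the `stepsCompleted`/`stepsTotal` fields follow the logged responses:

```python
import time

def run_lrq_loop(launch, ping, max_attempts=100, delay=0.5):
    """Poll an LRQ until stepsCompleted == stepsTotal, surfacing HTTP errors."""
    resp = launch()                       # request that starts the query
    for _ in range(max_attempts):
        if resp.status_code != 200:
            # e.g. the mid-run 500: {"code": "internal_server_error",
            #                        "message": "Operation not permitted. ..."}
            body = resp.json()
            raise RuntimeError(body.get("message", f"HTTP {resp.status_code}"))
        body = resp.json()
        if body.get("stepsCompleted") == body.get("stepsTotal"):
            return body
        time.sleep(delay)
        resp = ping(body["id"])           # poll for remaining steps
    raise TimeoutError("query did not complete")
```

With this shape, the 500 after chunk 14 would reach the user as "Operation not permitted. ..." rather than an unhandled crash.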
