scylladb / argus

`/sct/$id/events/submit` route seems to be broken #330

Closed fruch closed 10 months ago

fruch commented 10 months ago

The API for posting events from SCT hasn't worked for a couple of days now:

< t:2024-01-28 02:40:40,319 f:tester.py       l:544  c:ScyllaOperatorFunctionalClusterTester p:ERROR > Error committing test events to Argus
< t:2024-01-28 02:40:40,319 f:tester.py       l:544  c:ScyllaOperatorFunctionalClusterTester p:ERROR > Traceback (most recent call last):
< t:2024-01-28 02:40:40,319 f:tester.py       l:544  c:ScyllaOperatorFunctionalClusterTester p:ERROR >   File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 542, in argus_finalize_test_run
< t:2024-01-28 02:40:40,319 f:tester.py       l:544  c:ScyllaOperatorFunctionalClusterTester p:ERROR >     self.test_config.argus_client().submit_events(events_sorted)
< t:2024-01-28 02:40:40,319 f:tester.py       l:544  c:ScyllaOperatorFunctionalClusterTester p:ERROR >   File "/usr/local/lib/python3.10/site-packages/argus/client/sct/client.py", line 259, in submit_events
< t:2024-01-28 02:40:40,319 f:tester.py       l:544  c:ScyllaOperatorFunctionalClusterTester p:ERROR >     self.check_response(response)
< t:2024-01-28 02:40:40,319 f:tester.py       l:544  c:ScyllaOperatorFunctionalClusterTester p:ERROR >   File "/usr/local/lib/python3.10/site-packages/argus/client/base.py", line 53, in check_response
< t:2024-01-28 02:40:40,319 f:tester.py       l:544  c:ScyllaOperatorFunctionalClusterTester p:ERROR >     raise ArgusClientError(
< t:2024-01-28 02:40:40,319 f:tester.py       l:544  c:ScyllaOperatorFunctionalClusterTester p:ERROR > argus.client.base.ArgusClientError: ('Unexpected HTTP Response encountered - expected: 200, got: 500', 200, 500, <PreparedRequest [POST]>)
fruch commented 10 months ago

Looks like the root disk is full, so I can't see what the failure is in the logs:

~$ df
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/root       16197480 16181096         0 100% /
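
A check like the one above could be scripted so a full root disk is noticed before requests start failing. This is a minimal sketch, not anything that exists in Argus: the function names and the 95% threshold are assumptions for illustration. Note that `df` computes `Use%` slightly differently (it accounts for reserved blocks and rounds up), so the numbers will not match exactly.

```python
import shutil


def percent_used(used: int, total: int) -> float:
    """Return used space as a percentage of total (0.0 if total is not positive)."""
    if total <= 0:
        return 0.0
    return 100.0 * used / total


def is_nearly_full(path: str = "/", threshold: float = 95.0) -> bool:
    """Hypothetical helper: True when the filesystem holding `path` is at or above `threshold` percent."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return percent_used(usage.used, usage.total) >= threshold


if __name__ == "__main__":
    # Figures from the df output above, in 1K blocks.
    print(f"{percent_used(16181096, 16197480):.2f}% used")
    print("root nearly full:", is_nearly_full("/"))
```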
fruch commented 10 months ago

I've cleared /var/mail/argus; it was 8.1 GB of crontab mail that no one is reading...

k0machi commented 10 months ago

> I've cleared /var/mail/argus; it was 8.1 GB of crontab mail that no one is reading...

I should probably disable that; it's mail from the Jenkins scans...

k0machi commented 10 months ago

Disabled mail for the argus user.

fruch commented 10 months ago

@k0machi do we have logs of it anywhere else?

k0machi commented 10 months ago

> @k0machi do we have logs of it anywhere else?

They're fairly verbose for what they are. I should probably take a look at making them less verbose and sending them to the system journal. FWIW, the scanner rarely, if ever, fails.
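
One way the "less verbose, into the system journal" idea could look in Python is sketched below. This is only an illustration under assumptions: the logger name `jenkins_scanner` and the function `build_scanner_logger` are hypothetical, the scanner's real code is not shown in this thread, and the sketch relies on `/dev/log` being forwarded to the journal, which is the usual setup on systemd hosts.

```python
import logging
import logging.handlers
import os


def build_scanner_logger(handler=None, level=logging.WARNING):
    """Hypothetical helper: a logger that drops routine messages and writes to syslog/journal."""
    logger = logging.getLogger("jenkins_scanner")  # hypothetical logger name
    logger.setLevel(level)        # WARNING and above only, to cut the verbosity
    logger.handlers.clear()       # avoid duplicate handlers on repeated calls
    logger.propagate = False      # keep output off the root logger (and out of cron mail)
    if handler is None:
        if os.path.exists("/dev/log"):
            # Local syslog socket; journald normally collects messages sent here.
            handler = logging.handlers.SysLogHandler(address="/dev/log")
        else:
            # Fallback for hosts without a syslog socket: stderr.
            handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_scanner_logger()
    log.info("scanned 120 jobs")             # suppressed at the default level
    log.warning("scan failed for one job")   # emitted
```

With the output going to the journal instead of stdout, cron has nothing to mail, and failures stay searchable with `journalctl`.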

fruch commented 10 months ago

> > @k0machi do we have logs of it anywhere else?
>
> They're fairly verbose for what they are. I should probably take a look at making them less verbose and sending them to the system journal. FWIW, the scanner rarely, if ever, fails.

never say never :)