scylladb / argus

Apache License 2.0

Jobs are failing to render on "My Jobs" page #512

Open fruch opened 1 day ago

fruch commented 1 day ago

Some jobs on the "My Jobs" page fail to render.

Image

console:

ProfileJob.svelte:36 TypeError: t.arguments is not iterable
    at ProfileJob.svelte:23:58
    at async ProfileJob.svelte:41:9

JobId: 2ece3cba-a326-4877-85b1-9cc083a8c83d
JobId: 22b73bce-9415-4c33-b5a4-21ec30a0d19e

soyacz commented 1 day ago

Opening the job at https://argus.scylladb.com/tests/scylla-cluster-tests/2ece3cba-a326-4877-85b1-9cc083a8c83d causes an internal server error.

fruch commented 1 day ago

b99b-4c8e-ae73-2d221b568112/junit/get_all => generated 30 bytes in 7 msecs (HTTP/1.1 200) 3 headers in 85 bytes (1 switches on core 0)
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]: [ERROR] <5.29.124.170 - https://argus.scylladb.com/api/v1/test-info?testId=22b73bce-9415-4c33-b5a4-21ec30a0d19e - api.test_info> - error_handlers::handle_api_exception - Exception in api.test_info
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]: Traceback (most recent call last):
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:   File "/home/argus/.cache/pypoetry/virtualenvs/argus-alm-HqUqk8xE-py3.12/lib/python3.12/site-packages/cassandra/cqlengine/query.py", line 775, in get
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:     obj = self[0]
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:           ~~~~^^^
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:   File "/home/argus/.cache/pypoetry/virtualenvs/argus-alm-HqUqk8xE-py3.12/lib/python3.12/site-packages/cassandra/cqlengine/query.py", line 573, in __getitem__
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:     return self._result_cache[s]
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:            ~~~~~~~~~~~~~~~~~~^^^
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]: IndexError: list index out of range
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]: During handling of the above exception, another exception occurred:
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]: Traceback (most recent call last):
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:   File "/home/argus/.cache/pypoetry/virtualenvs/argus-alm-HqUqk8xE-py3.12/lib/python3.12/site-packages/flask/app.py", line 880, in full_dispatch_request
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:     rv = self.dispatch_request()
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:          ^^^^^^^^^^^^^^^^^^^^^^^
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:   File "/home/argus/.cache/pypoetry/virtualenvs/argus-alm-HqUqk8xE-py3.12/lib/python3.12/site-packages/flask/app.py", line 865, in dispatch_request
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:   File "/home/argus/app/argus/backend/service/user.py", line 261, in wrapped_view
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:     return view(*args, **kwargs)
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:            ^^^^^^^^^^^^^^^^^^^^^
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:   File "/home/argus/app/argus/backend/controller/api.py", line 385, in test_info
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:     info = service.get_test_info(test_id=UUID(test_id))
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:   File "/home/argus/app/argus/backend/service/argus_service.py", line 156, in get_test_info
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:     test = ArgusTest.get(id=test_id)
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:            ^^^^^^^^^^^^^^^^^^^^^^^^^
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:   File "/home/argus/.cache/pypoetry/virtualenvs/argus-alm-HqUqk8xE-py3.12/lib/python3.12/site-packages/cassandra/cqlengine/models.py", line 700, in get
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:     return cls.objects.get(*args, **kwargs)
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:   File "/home/argus/.cache/pypoetry/virtualenvs/argus-alm-HqUqk8xE-py3.12/lib/python3.12/site-packages/cassandra/cqlengine/query.py", line 763, in get
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:     return self.filter(*args, **kwargs).get()
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:   File "/home/argus/.cache/pypoetry/virtualenvs/argus-alm-HqUqk8xE-py3.12/lib/python3.12/site-packages/cassandra/cqlengine/query.py", line 777, in get
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]:     raise self.model.DoesNotExist
Nov 20 10:26:38 ip-10-0-2-112 bash[875345]: cassandra.cqlengine.models.DoesNotExist

fruch commented 1 day ago

The first sign of trouble is at 15:05 on Nov 19:

Nov 19 15:05:33 ip-10-0-2-112 bash[847502]: [ERROR] <87.71.200.23 - https://argus.scylladb.com/api/v1/test/22b73bce-9415-4c33-b5a4-21ec30a0d19e/runs?limit=10 - api.testrun_api.get_runs_for_test> - error_handlers::handle_api_exception - Exception in api.testrun_api.get_runs_for_test
Nov 19 15:05:33 ip-10-0-2-112 bash[847502]: [pid: 847502|app: 0|req: 4215/283616] 87.71.200.23 () {58 vars in 3584 bytes} [Tue Nov 19 15:05:32 2024] GET /api/v1/test/22b73bce-9415-4c33-b5a4-21ec30a0d19e/runs?limit=10 => generated 74 bytes in 28 msecs (HTTP/1.1 200) 3 headers in 85 bytes (1 switches on core 1)
Nov 19 15:06:00 ip-10-0-2-112 bash[847501]: [ERROR] <93.173.89.28 - https://argus.scylladb.com/api/v1/test/22b73bce-9415-4c33-b5a4-21ec30a0d19e/runs?additionalRuns[]=fb7d3111-9329-4e0e-a237-715b6874a7ef&additionalRuns[]=f28b76ff-626d-43a5-9250-41452e831d06&limit=10 - api.testrun_api.get_runs_for_test> - error_handlers::handle_api_exception - Exception in api.testrun_api.get_runs_for_test

k0machi commented 1 day ago

Those jobs seem to refer to a non-existent test. I'll send out a PR to handle this more gracefully, but the underlying problem is that you were assigned to a run that either has no test_id or has a stale test_id.

fruch commented 1 day ago

How can that happen?

k0machi commented 1 day ago

In the first case, the run was submitted before a test was created for it (although that should make it impossible for you to be assigned to it). In the second case, somebody deleted the test after the run was submitted. I'll check whether it's something else.
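The two cases can be told apart mechanically. The following is a minimal sketch, assuming a simplified run record; the `Run` dataclass and `classify_run` helper are hypothetical names for illustration and are not taken from the Argus codebase.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID, uuid4


@dataclass
class Run:
    """Hypothetical, simplified stand-in for a run record."""
    run_id: UUID
    test_id: Optional[UUID]  # None models a run submitted before its test existed


def classify_run(run: Run, existing_test_ids: set[UUID]) -> str:
    """Return 'ok', 'missing_test_id', or 'stale_test_id' for a run."""
    if run.test_id is None:
        return "missing_test_id"  # first case: no test_id at all
    if run.test_id not in existing_test_ids:
        return "stale_test_id"  # second case: the test was deleted later
    return "ok"


live_test = uuid4()
deleted_test = uuid4()  # never added to the set of existing tests
runs = [
    Run(uuid4(), live_test),
    Run(uuid4(), None),
    Run(uuid4(), deleted_test),
]
statuses = [classify_run(r, {live_test}) for r in runs]
print(statuses)  # ['ok', 'missing_test_id', 'stale_test_id']
```

A check like this could run before a run is assigned to a user, so that orphaned runs never show up under "My Jobs".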

k0machi commented 1 day ago

Regarding the internal server error when opening https://argus.scylladb.com/tests/scylla-cluster-tests/2ece3cba-a326-4877-85b1-9cc083a8c83d: the error message has a typo. That ID is a testId, not a jobId. We need to add a 404 page instead of crashing with a 500 when someone follows a broken link.
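One way to turn the failed lookup into a 404 is to catch the "does not exist" exception at the API boundary. This is only a sketch under stated assumptions: `DoesNotExist` here is a local stand-in for `cassandra.cqlengine.models.DoesNotExist` from the traceback, `KNOWN_TESTS`, `get_test` and `lookup_test_info` are hypothetical, and the response body shape is invented for illustration.

```python
from uuid import UUID


class DoesNotExist(Exception):
    """Local stand-in for cassandra.cqlengine.models.DoesNotExist."""


# Hypothetical in-memory table replacing the real test model.
KNOWN_TESTS = {
    UUID("11111111-1111-1111-1111-111111111111"): {"name": "example-test"},
}


def get_test(test_id: UUID) -> dict:
    """Mimic a model lookup: raise DoesNotExist when no row matches."""
    try:
        return KNOWN_TESTS[test_id]
    except KeyError:
        raise DoesNotExist(str(test_id)) from None


def lookup_test_info(raw_test_id: str) -> tuple[int, dict]:
    """Return (HTTP status, JSON-like body) instead of letting the lookup crash."""
    try:
        test_id = UUID(raw_test_id)
    except ValueError:
        return 400, {"status": "error", "message": "testId is not a valid UUID"}
    try:
        test = get_test(test_id)
    except DoesNotExist:
        # A missing test is a client-visible "not found", not a server error.
        return 404, {"status": "error", "message": f"Test {test_id} not found"}
    return 200, {"status": "ok", "response": test}


print(lookup_test_info("22b73bce-9415-4c33-b5a4-21ec30a0d19e")[0])  # 404
print(lookup_test_info("11111111-1111-1111-1111-111111111111")[0])  # 200
```

With a distinct status code, the front end can show a "test not found" page rather than a generic failure.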

k0machi commented 1 day ago

This test has been removed. I'll add handling that tells the user when this has happened. It seems that whoever removed the test didn't remove its runs, and those runs were assigned to you.

Image

fruch commented 1 day ago

A test was removed? From Argus, or from Jenkins?

Was it done from the admin panel?

k0machi commented 1 day ago

Most likely someone did so from the Admin panel, yes. There's an option there to remove the test but keep its runs, which is probably what happened.

fruch commented 23 hours ago

Why would we need such an option?

k0machi commented 23 hours ago

For archival purposes, I suppose. We can switch the default to delete, so that runs are kept only if you really want to keep old runs (or maybe you want to move them, but that's a different feature we should implement).
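The proposed default could look roughly like the sketch below. It assumes plain in-memory dictionaries in place of the real database, and `delete_test` with its `keep_runs` flag is a hypothetical helper, not the actual admin-panel code.

```python
from uuid import UUID, uuid4

# Hypothetical in-memory stores standing in for the real database tables.
tests: dict[UUID, str] = {}
runs: dict[UUID, UUID] = {}  # run_id -> test_id


def delete_test(test_id: UUID, keep_runs: bool = False) -> list[UUID]:
    """Delete a test and return the ids of any runs left without a test."""
    tests.pop(test_id, None)
    owned = [run_id for run_id, owner in runs.items() if owner == test_id]
    if keep_runs:
        return owned  # archival mode: the caller must deal with these orphans
    for run_id in owned:
        del runs[run_id]  # default: remove the runs together with the test
    return []


test_a, run_a = uuid4(), uuid4()
tests[test_a] = "demo"
runs[run_a] = test_a
print(delete_test(test_a))  # [] because the run was deleted with the test
print(run_a in runs)  # False
```

Returning the orphaned run ids in the `keep_runs=True` case makes the archival choice explicit, so the caller can unassign or flag those runs.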

fruch commented 23 hours ago

What's the point of saving old runs if you can't get to them in the Argus UI? It sounds kind of pointless to me.

k0machi commented 23 hours ago

For the rare cases where someone remembers that they were important, I guess. I added that button back when we weren't sure how long we should retain Argus runs, or whether we should ever delete them.