kernelci / kernelci-core

Core KernelCI tools
https://kernelci.org
GNU Lesser General Public License v2.1
104 stars 97 forks source link

[BUG?] Duplicated node events #2525

Closed r-c-n closed 4 weeks ago

r-c-n commented 5 months ago

When listening to node events in staging, the listener sometimes receives duplicated events. I can't find a pattern for this and it looks random.

The code that does the listening looks like this (https://github.com/kernelci/kernelci-pipeline/blob/833c4df3a45bdb79f7d45ad0e891c9af6e423026/src/result_summary/monitor.py#L102):

def run(service, context):
    while True:
        node = service._api_helper.receive_event_node(context['sub_id'])
        service.log.debug(f"Node event received: {node}")
        ...

Here's the subscription code (https://github.com/kernelci/kernelci-pipeline/blob/833c4df3a45bdb79f7d45ad0e891c9af6e423026/src/result_summary/monitor.py#L16):

def setup(service, args, context):
    base_filter = context['preset_params'][0]
    sub_id = service._api_helper.subscribe_filters({
        'kind': base_filter['kind'],
        'state': base_filter['state'],
    })
    if not sub_id:
        raise Exception("Error subscribing to event")
    return {'sub_id': sub_id}

This is an example of a duplicated event:

04/29/2024 10:45:13 AM UTC [DEBUG] Node event received: {
 'id': '662f7123ef2d5debad00f52c',
 'kind': 'test',
 'name': 'tast-kernel-x86-intel',
 'path': [
     'checkout',
     'kbuild-gcc-10-x86-chromeos-intel',
     'tast-kernel-x86-intel'
 ],
 'group': 'tast-kernel-x86-intel',
 'parent': '662f6d55ef2d5debad00f127',
 'state': 'done',
 'result': 'fail',
 'artifacts': {
     'lava_log': 'https://kciapistagingstorage1.file.core.windows.net/staging/tast-kernel-x86-intel-662f7123ef2d5debad00f52c/log.txt.gz?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D'
 },
 'data': {
     'error_code': 'Test',
     'error_msg': 'Tast tests run failed',
     'test_source': None,
     'test_revision': None,
     'platform': 'hp-x360-12b-ca0010nr-n4020-octopus',
     'runtime': 'lava-collabora',
     'job_id': '13567400',
     'job_context': None,
     'regression': '662f7a38ef2d5debad010e93',
     'kernel_revision': {
         'tree': 'kernelci',
         'url': 'https://github.com/kernelci/linux.git',
         'branch': 'staging-stable',
         'commit': '4824aec7adb883a7253d61073e2fd91d7e78c0fa',
         'describe': 'staging-stable-20240429.2',
         'version': {
             'version': 6,
             'patchlevel': 1,
             'sublevel': 88,
             'extra': '-1-g4824aec7adb88',
             'name': None
         },
         'patchset': None},
     'arch': 'x86_64',
     'defconfig': 'cros://chromeos-6.1/x86_64/chromeos-intel-pineview.flavour.config',
     'config_full': 'cros://chromeos-6.1/x86_64/chromeos-intel-pineview.flavour.config+x86-board+CONFIG_MODULE_COMPRESS=n+CONFIG_MODULE_COMPRESS_NONE=y',
     'compiler': 'gcc-10',
     'kernel_type': 'bzimage'
 }, 'debug': None,
 'jobfilter': None,
 'created': '2024-04-29T10:06:27.326000',
 'updated': '2024-04-29T10:45:12.266000',
 'timeout': '2024-04-29T16:06:27.326000',
 'holdoff': None,
 'owner': 'staging.kernelci.org',
 'submitter': '9790c38bd29c1375fc96ed92b9b4bfab',
 'user_groups': []
}
04/29/2024 10:45:17 AM UTC [DEBUG] Node event received: {
 'id': '662f7123ef2d5debad00f52c',
 'kind': 'test',
 'name': 'tast-kernel-x86-intel',
 'path': [
     'checkout',
     'kbuild-gcc-10-x86-chromeos-intel',
     'tast-kernel-x86-intel'
 ],
 'group': 'tast-kernel-x86-intel',
 'parent': '662f6d55ef2d5debad00f127',
 'state': 'done',
 'result': 'fail',
 'artifacts': {
     'lava_log': 'https://kciapistagingstorage1.file.core.windows.net/staging/tast-kernel-x86-intel-662f7123ef2d5debad00f52c/log.txt.gz?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D'
 },
 'data': {
     'error_code': 'Test',
     'error_msg': 'Tast tests run failed',
     'test_source': None,
     'test_revision': None,
     'platform': 'hp-x360-12b-ca0010nr-n4020-octopus',
     'runtime': 'lava-collabora',
     'job_id': '13567400',
     'job_context': None,
     'regression': '662f7a38ef2d5debad010e93',
     'kernel_revision': {
         'tree': 'kernelci',
         'url': 'https://github.com/kernelci/linux.git',
         'branch': 'staging-stable',
         'commit': '4824aec7adb883a7253d61073e2fd91d7e78c0fa',
         'describe': 'staging-stable-20240429.2',
         'version': {
             'version': 6,
             'patchlevel': 1,
             'sublevel': 88,
             'extra': '-1-g4824aec7adb88',
             'name': None
         },
         'patchset': None},
     'arch': 'x86_64',
     'defconfig': 'cros://chromeos-6.1/x86_64/chromeos-intel-pineview.flavour.config',
     'config_full': 'cros://chromeos-6.1/x86_64/chromeos-intel-pineview.flavour.config+x86-board+CONFIG_MODULE_COMPRESS=n+CONFIG_MODULE_COMPRESS_NONE=y',
     'compiler': 'gcc-10',
     'kernel_type': 'bzimage'
 }, 'debug': None,
 'jobfilter': None,
 'created': '2024-04-29T10:06:27.326000',
 'updated': '2024-04-29T10:45:12.266000',
 'timeout': '2024-04-29T16:06:27.326000',
 'holdoff': None,
 'owner': 'staging.kernelci.org',
 'submitter': '9790c38bd29c1375fc96ed92b9b4bfab',
 'user_groups': []
}

The first event seems to be received right after the node is updated, the second one is received four seconds later. In both cases the node contents are identical, including the timestamps.

If you need assistance on my side debugging this, let me know how I can help you.

JenySadadia commented 5 months ago

Hello, Thanks for reporting the issue. I also observed pipeline/monitor printing an event data twice on staging.

29/04/2024 17:51:54 2024-04-29 12:21:54.253457  test            eb216a00c341  662f90e2a179fe14867fb86d Running    None      baseline-arm
29/04/2024 17:51:55 2024-04-29 12:21:55.277708  test            eb216a00c341  662f90e2a179fe14867fb86d Running    None      baseline-arm

At first glance, It looks like more of a monitor issue than the API events as if API is generating the same event multiple times, other services should also receive them twice. But it doesn't seem like the case. Need to investigate more though.

r-c-n commented 5 months ago

FWIW, I also noted that kind of duplicated entries in the monitor since the beginning, but I always got the impression that that was something like a Dozzle artifact (maybe not). I think the duplicated events I posted above only appeared once in the monitor, but I'll double check the next time I catch one of these.

JenySadadia commented 4 months ago

Hello @hardboprobot This issue is not being observed on staging anymore. Please let me know if you observe it again.

r-c-n commented 4 months ago

Ok, I'll keep checking during the week and I'll close the issue if I don't see anything strange.

nuclearcat commented 3 months ago

I guess issue can be closed?

r-c-n commented 3 months ago

I haven't seen anything weird since, although I haven't been looking closely either. We can close this, if I bump into this again we can reopen it.