kernelci / kcidb

kernelci.org common database tools
GNU General Public License v2.0
27 stars 34 forks source link

Create an initial stable-rc report email in KCIDB #532

Closed padovan closed 1 week ago

padovan commented 2 months ago

We need to create a report for our ongoing work for of reporting stable-rc results with Greg KH.

The report will be sent to ourselves first. @crazoes is the main recipient.

Info given by @spbnick:

Here's a sample of notifications we send to Mark. It's sent an hour after last update to the revision.

Here's the code of his subscription: https://github.com/kernelci/kcidb/blob/main/kcidb/monitor/subscriptions/mark_brown.py

Here's the template entry point for notification body for revisions: https://github.com/kernelci/kcidb/blob/main/kcidb/templates/revision_description.txt.j2 (edited)

JenySadadia commented 2 months ago

Hello @crazoes

Here is the sample report generated for stable-rc from recent submissions to KCIDB:

Subject: stable-rc report for linux-stable-rc.git:linux-6.6.y@v5.16-rc2-19-g383a44aec91c

Below is the summary of results Kernel CI database has recorded
for this revision so far. See complete and up-to-date report at:

    https://kcidb.kernelci.org/d/revision/revision?orgId=1&var-git_commit_hash=580e509ea1348fc97897cf4052be03c248be6ab6&var-patchset_hash=

OVERVIEW

        Builds: ❌ FAIL
         Tests: ✅ PASS

REVISION

    Status
        ✅ PASS
    Commit
        name: v5.16-rc2-19-g383a44aec91c
        hash: 580e509ea1348fc97897cf4052be03c248be6ab6
    Checked out from
        https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
    By
        broonie, maestro

BUILDS

    Status
             ❌ 1  ✅ 1
    Architectures
        arm  ❌ 1  ✅ 1
    Failures
        ❌ 1  arm multi_v7_defconfig
    By
        broonie, maestro

TESTS

    Status
        ✅ 1
    By
        maestro

See complete and up-to-date report at:

    https://kcidb.kernelci.org/d/revision/revision?orgId=1&var-git_commit_hash=580e509ea1348fc97897cf4052be03c248be6ab6&var-patchset_hash=

LEGEND

    ❌ FAIL     - Failed. Tested code is likely faulty.
    💥 ERROR    - Aborted. Test, tested code, or both might be faulty.
    🟩 MISS     - Missing. Planned, but failed to execute.
    ✅ PASS     - Passed. Tested code is likely correct.
    🆗 DONE     - Finished. Status of tested code is unknown.
    ⏩ SKIP     - Skipped. Planned, but didn't apply.
    ❓ UNKNOWN  - In progress, or status unknown.

    🚧 WAIVED   - Waived result. Test is too new or shows known failures.

    ➖ BLANK    - No data, zero.

The format I have used is from an existing subscription. Please let me know the preferred report format and I'll update the templates accordingly.

crazoes commented 2 months ago

@JenySadadia

Subject is incorrect - it says the report is for 6.6 kernel but the revision is 5.16-rc2-19-g383a44aec91c

In the Revision, Build and Tests section - remove status and By sections as they don't give any additional information.

though, we can have status section under tests but it should be placed under the test name which we want to add here. For example, one of the tests that we will have here is a Boot test and under that we should give a summary with the status tag

JenySadadia commented 2 months ago

Hello,

Below is the updated report with the modifications as per review discussion:

Subject: KernelCI report for linux-stable-rc:linux-6.6.y@v5.16-rc2-19-g383a44aec91c

OVERVIEW

        Builds: ❌ FAIL
         Tests: ❌ FAIL

REVISION

    Commit
        name: v5.16-rc2-19-g383a44aec91c
        hash: 580e509ea1348fc97897cf4052be03c248be6ab6
    Checked out from
        https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
    Tested-by
        broonie, maestro

BUILDS

    Summary
        ❌ 2  ✅ 1

    Failures
       -arm (multi_v7_defconfig)
       -intel (multi_v7_defconfig)
       Tested-by: maestro

TESTS

    Summary
        ❌ 3

    Failures
      -kselftest.dt.dt_test_unprobed_devices_sh
      Tested-by: broonie

      -kver
      -kselftest.cpufreq
      -kunit.exec.total_mapping_size_test
      Tested-by: maestro

See complete and up-to-date report at:

    https://kcidb.kernelci.org/d/revision/revision?orgId=1&var-git_commit_hash=580e509ea1348fc97897cf4052be03c248be6ab6&var-patchset_hash=

Here are the addressed review modifications:

Note: I am working on describing failure/error info. We'll discuss Summary part once everything else is done.

Please let me know your thoughts @crazoes. Please note that I have generated report on dummy data so please do not verify/validate :sweat_smile:

padovan commented 1 month ago

Tested-by: is a confusing, as that is also a kernel commit tag we use a lot. Test systems, Test data sources, etc?

JenySadadia commented 1 month ago

Tested-by: is a confusing, as that is also a kernel commit tag we use a lot. Test systems, Test data sources, etc?

I think Greg suggested @crazoes to use Tested-by tag.

JenySadadia commented 1 month ago

Revised report with build errors:

Subject: KernelCI report for linux-stable-rc:linux-6.6.y@v5.16-rc2-19-g383a44aec91c

OVERVIEW

        Builds: ❌ FAIL
         Tests: ❌ FAIL

REVISION

    Commit
        name: v5.16-rc2-19-g383a44aec91c
        hash: 580e509ea1348fc97897cf4052be03c248be6ab6
    Checked out from
        https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
    Tested-by
        broonie, maestro

BUILDS

    Summary
        ❌ 2  ✅ 1

    Failures
       -arm (multi_v7_defconfig)
       Build error: display/dc/link/link_factory.c:743:1: error: the frame size of 1040 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
       -intel (multi_v7_defconfig)
       Build error: make: *** [Makefile:1996: modules] Error 1
       Tested-by: maestro

TESTS

    Summary
        ❌ 3

    Failures
      -kselftest.dt.dt_test_unprobed_devices_sh
      Tested-by: broonie

      -kver
      -kselftest.cpufreq
      -kunit.exec.total_mapping_size_test
      Tested-by: maestro

See complete and up-to-date report at:

    https://kcidb.kernelci.org/d/revision/revision?orgId=1&var-git_commit_hash=580e509ea1348fc97897cf4052be03c248be6ab6&var-patchset_hash=
crazoes commented 1 month ago

Hi @JenySadadia

This report looks much better than previous one.

Some more changes which I think we need to make :-

Subject is still incorrect, not sure why we are getting the branch name as 6.6 if we are looking at results of 5.16. So we need to fix that before everything else.

I think we can also make the subject better. I'm thinking of something like this :- KernelCI report for stable-rc: linux-5.16.y@v5.16-rc2-19-g383a44aec91c

In the overview section, let's also mention the number of build failures and the number of test failures otherwise it gives a little wrong impression of everything failing.

In my opinion, we should remove Tested-by tag. @JenySadadia there seems to be a little confusion about the tested by tag here. What we want is to have a tag Tested-by: kernelci.org bot <bot@kernelci.org> at the end of the report. This is what Greg wanted from us and we shouldn't really mention maestro, broonie etc. @padovan what are your thoughts about this?

I'm expecting revision to look like this :-

REVISION

    Commit
        name: v5.16-rc2-19-g383a44aec91c
        hash: 580e509ea1348fc97897cf4052be03c248be6ab6
    Checked out from
        https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.16.y

Build section looks really good to me but some more changes that I'd like to see are :-

   Failures
       -arm (multi_v7_defconfig)
       Build error: display/dc/link/link_factory.c:743:1: error: the frame size of 1040 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
       -intel (multi_v7_defconfig)
       Build error: make: *** [Makefile:1996: modules] Error 1

FYI, there isn't any architecture named intel so something is wrong here. It might be x86_64 so please verify that too.

Test section is really confusing and it's hard to read it. Let's remove the differentiation of the results from different CI systems. So we shouldn't keep the Tested-by tag in the Test section as well. All the results should be bundled together, no matter from which CI systems it comes from.

@JenySadadia for stable-rc we just want to send the boot test results which is baseline.login tests. So under the Test section we want something like this :-

TESTS

    Failures
      baseline.login (boot test)
      - Device1 (config_file_name)
      - Device2 (config_file_name)
      ....

Again, let's remove the summary from it as it doesn't really provide any info

At the end of everything, we want to have the following :-

 Tested-by: kernelci.org bot <bot@kernelci.org>

 Thanks,
 KernelCI Team
padovan commented 1 month ago

yes, it is exactly what @crazoes said about the Tested-by tag.

JenySadadia commented 1 month ago

Hello @crazoes

Thanks a lot for all the suggestions.

Subject is still incorrect, not sure why we are getting the branch name as 6.6 if we are looking at results of 5.16. So we need to fix that before everything else.

As I stated earlier, the report I generated was on dummy data that's why it was showing incorrect data. I fixed it to have correct output.

I think we can also make the subject better. I'm thinking of something like this :- KernelCI report for stable-rc: linux-5.16.y@v5.16-rc2-19-g383a44aec91c

Okay. Addressed this.

In the overview section, let's also mention the number of build failures and the number of test failures otherwise it gives a little wrong impression of everything failing.

Makes sense. Updated to have number of passed and failed builds/tests. Also, dropped summary sections for BUILDS and TESTS as it's already been covered in Overview section.

In my opinion, we should remove Tested-by tag. @JenySadadia there seems to be a little confusion about the tested by tag here. What we want is to have a tag Tested-by: kernelci.org bot bot@kernelci.org at the end of the report.

Oh, sorry for the confusion. I fixed the Tested-by tag.

@JenySadadia for stable-rc we just want to send the boot test results which is baseline.login tests.

Why are we only interested in boot tests? Bdw I have updated the section to have arch(configs). It's not possible atm to list down device names as KCIDB doesn't have a dedicated field for it. Maestro uses misc.platform but that's not generic to all CI systems. On a separate note, the discussion to add platform to KCIDB schema has already been started.

Let's remove the differentiation of the results from different CI systems. So we shouldn't keep the Tested-by tag in the Test section as well. All the results should be bundled together, no matter from which CI systems it comes from.

KCIDB generates an aggregate report from different CI systems' results. I thought it's worth mentioning which CI system reported which failure. Hence, I kept By sections for build and test failures. But again it depends on how the report is going to be used. What are your thoughts on that @padovan?

Here is the updated report:

Subject: KernelCI report for stable-rc: linux-6.6.y@v6.6.35-193-g580e509ea1348

OVERVIEW

        Builds: 1 passed, 3 failed

         Tests: 0 passed, 4 failed

REVISION

    Commit
        name: v6.6.35-193-g580e509ea1348
        hash: 580e509ea1348fc97897cf4052be03c248be6ab6
    Checked out from
        https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
    By
        broonie, maestro

BUILDS

    Failures
       -arm (multi_v7_defconfig)
       Build error: display/dc/link/link_factory.c:743:1: error: the frame size of 1040 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
       -arm (vexpress_defconfig)
       Build error: display/dc/link/link_factory.c:743:1: error: the frame size of 1040 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
       -i386 (allnoconfig)
       Build error: make: *** [Makefile:1996: modules] Error 1
       By: maestro

TESTS

    Boot failures
      arm:(omap2plus_defconfig)
      By: broonie

      arm:(multi_v7_defconfig, vexpress_defconfig)
      i386:(allnoconfig)
      By: maestro

See complete and up-to-date report at:

    https://kcidb.kernelci.org/d/revision/revision?orgId=1&var-git_commit_hash=580e509ea1348fc97897cf4052be03c248be6ab6&var-patchset_hash=

Tested-by: kernelci.org bot <bot@kernelci.org>

Thanks,
KernelCI team
padovan commented 1 month ago

By: broonie

By seems a bit unclear. Maybe Test system?

On Build failures, may we need a black line between each failure?

JenySadadia commented 1 month ago

By: broonie

By seems a bit unclear. Maybe Test system?

We can use Reported-by: or maybe CI: ?

On Build failures, may we need a black line between each failure?

Yes, I'll add it.

JenySadadia commented 1 month ago

Oh, yes, I remember, I kept a blank line after a group of CI results. For example,

BUILDS

    Failures
       -arm (omap2plus_defconfig)
       By: broonie

       -arm (multi_v7_defconfig)
       Build error: display/dc/link/link_factory.c:743:1: error: the frame si=
ze of 1040 bytes is larger than 1024 bytes [-Werror=3Dframe-larger-than=3D]
       -arm (vexpress_defconfig)
       Build error: display/dc/link/link_factory.c:743:1: error: the frame si=
ze of 1040 bytes is larger than 1024 bytes [-Werror=3Dframe-larger-than=3D]
       -i386 (allnoconfig)
       Build error: make: *** [Makefile:1996: modules] Error 1
       By: maestro
spbnick commented 1 month ago

A side note: we need to credit origins when we show and send results, so let's not drop them, but work on improving representation instead. We should not take credit over the resources supplied by other CI systems, but highlight them instead.

Jeny, as you work on those changes, could you create separate templates for stable-rc instead of changing the stock ones? And use and expand the template macro libraries, as necessary? This way we could get this merged sooner, and then we can work on integrating what works in the main reports.

spbnick commented 1 month ago

And one more thing: we should avoid putting too much information into the (plain text or even HTML) emails. It quickly becomes unusable as the only controls you have to review the data are scrolling back and forth and string search.

I think the emails should act as an alert system, first of all, presenting the most important data succinctly, and directing people towards dashboards, which should have better tools for exploring and representing the information. It should usually also have more data by the time the email is viewed, as we don't really send them when we're "done", because we're potentially never done testing.

As a perspective, KernelCI legacy reports were already unreadable due to the amount of data in them. Now we're adding data from other CI systems, and so that approach wouldn't work at all.

JenySadadia commented 1 month ago

Jeny, as you work on those changes, could you create separate templates for stable-rc instead of changing the stock ones? And use and expand the template macro libraries, as necessary? This way we could get this merged sooner, and then we can work on integrating what works in the main reports.

Yes, I have already created a bunch of stable_rc_*.j2 templates for this. See https://github.com/kernelci/kcidb/pull/537.

JenySadadia commented 1 month ago

Hello,

Here is an updated report following our discussion yesterday:

Subject: KernelCI report for stable-rc: linux-6.6.y@v6.6.35-193-g580e509ea1348

OVERVIEW

        Builds: 0 passed, 4 failed

         Tests: 0 passed, 5 failed

           CIs: broonie, maestro

REVISION

    Commit
        name: v6.6.35-193-g580e509ea1348
        hash: 580e509ea1348fc97897cf4052be03c248be6ab6
    Checked out from
        https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y

BUILDS

    Failures
       -arm (omap2plus_defconfig)
       CI: broonie

       -arm (multi_v7_defconfig)
       Build error: display/dc/link/link_factory.c:743:1: error: the frame size of 1040 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
       -arm (vexpress_defconfig)
       Build error: display/dc/link/link_factory.c:743:1: error: the frame size of 1040 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
       -x86 (allnoconfig)
       Build error: make: *** [Makefile:1996: modules] Error 1
       CI: maestro

BOOT TESTS

    Failures
      arm:(vexpress_defconfig)
      -bcm2711-rpi-4-b
      x86:(allnoconfig)
      -acer-cb317-1h-c3z6-dedede
      CI: maestro

See complete and up-to-date report at:

    https://kcidb.kernelci.org/d/revision/revision?orgId=1&var-git_commit_hash=580e509ea1348fc97897cf4052be03c248be6ab6&var-patchset_hash=

Tested-by: kernelci.org bot <bot@kernelci.org>

Thanks,
KernelCI team

ChangeLog:

padovan commented 1 month ago

I'd use CI systems: rather than just CIs. Other than that this seems enough to deploy imo and then we iterate on improving it as we go.

crazoes commented 1 month ago

@JenySadadia apart from what @padovan said, last time we also discussed adding the grafana dashboard link pointing to the failures in the build section. It will be great if you can add that too.

JenySadadia commented 1 month ago

Thanks for the review @padovan @crazoes.

Here is a revised version with the above suggestions applied:

KernelCI report for stable-rc: linux-5.4.y@v5.4.279-79-g51945679d212

OVERVIEW

        Builds: 1 passed, 2 failed

         Tests: 0 passed, 1 failed

    CI systems: maestro, tuxsuite

REVISION

    Commit
        name: v5.4.279-79-g51945679d212
        hash: 51945679d212aae61a418eff41370c13da94f94d
    Checked out from
        https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y

BUILDS

    Failures
       -arm64 (cros://chromeos-5.4/arm64/chromiumos-qualcomm.flavour.config)
       Build detail: https://kcidb.kernelci.org/d/build/build?orgId=1&var-id=maestro:66969c63aa07c494f8d22ad8
       Build error: make: *** [arch/arm64/Makefile:170: vdso_prepare] Error 2
       CI: maestro

       -s390 (defconfig)
       Build detail: https://kcidb.kernelci.org/d/build/build?orgId=1&var-id=tuxsuite:2jKrMEMuY667cDnJgLrHdpnjxvS
       CI: tuxsuite

BOOT TESTS

    Failures
      x86_64:(x86_64_defconfig)
      -acer-cbv514-1h-34uz-brya
      CI: maestro

See complete and up-to-date report at:

    https://kcidb.kernelci.org/d/revision/revision?orgId=1&var-git_commit_hash=51945679d212aae61a418eff41370c13da94f94d&var-patchset_hash=

Tested-by: kernelci.org bot <bot@kernelci.org>

Thanks,
KernelCI team

ChangeLog:

JenySadadia commented 1 month ago

In case of no build/boot failures are found, the respective sections would like the below:

BUILDS

    No build failures found

BOOT TESTS

    No boot failures found
crazoes commented 1 month ago

Thanks @JenySadadia

I think @padovan meant to replace CI with CI systems everywhere in the report. I see that Build and Boot sections still have it as CI.

Once you do this change, I think we should be ready to send this report to me, you and Padovan initially shreeya.patel@collabora.com gustavo.padovan@collabora.com

Gregkh sends out an email every week for the stable-rc releases but we don't know when exactly he sends it so we want to also try to detect email notification from Gregkh and then send out this report asap to ourselves first. Once I verify the details, I'll forward it to the stable-rc mailing list

JenySadadia commented 1 month ago

I would like to draw your attention to the fact that I have just considered builds and tests with FAIL status in the report and not handled results with ERROR and MISS. One of the reasons is that we need to provide an explanation of errored and failed nodes in the report, and that would make it complex at this level.

JenySadadia commented 1 month ago

I'd like to keep this issue open until we verify the results.

spbnick commented 1 month ago

Sure! Just put e.g. Concerns <issue> instead of Closes <issue> in your PR in these cases :+1: