kernelci / kernelci-pipeline

Modular pipeline based on the KernelCI API
GNU Lesser General Public License v2.1

Discrepancy in data on Grafana vs result-summary for stable-rt #745

Open musamaanjum opened 2 months ago

musamaanjum commented 2 months ago

Maintainers are already using the Grafana dashboard. There was a report that preempt_rt config builds are missing (https://github.com/kernelci/kernelci-core/pull/2397#issuecomment-2272789692). I've investigated and found that the build data is visible in the results obtained from result-summary.

https://grafana.kernelci.org/d/OKXc44EIz/home?orgId=1&var-origin=maestro&var-tree=stable-rt&var-branch=All&var-test_path_regex=%25&var-platform=%25&var-config=%25&var-datasource=cdmoe4lcafu2od

I'll attach the results file from result-summary below in the comments as it isn't attached here.

The discrepancies are as follows:

  1. The date differs between the two. Grafana shows 2024-08-07 while result-summary shows 2024-08-06.
  2. The preempt_rt jobs aren't present on Grafana, and preempt_rt doesn't appear in the config column. The preempt_rt jobs are probably missing.
  3. Grafana only shows results for the v6.6.44-rt39 branch. The other 2-3 branches are missing.
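
A comparison like the one in the list above can be sketched as a set difference over (branch, config) pairs. This is only an illustration: the row layout, the `missing_pairs` helper, and the sample branch and config values are all hypothetical placeholders, not data taken from Grafana or result-summary.

```python
def missing_pairs(reference_rows, other_rows):
    """Return (branch, config) pairs present in reference_rows but not in other_rows."""
    ref = {(r["branch"], r["config"]) for r in reference_rows}
    oth = {(r["branch"], r["config"]) for r in other_rows}
    return sorted(ref - oth)


# Hypothetical sample rows, for illustration only.
result_summary_rows = [
    {"branch": "branch-a", "config": "defconfig"},
    {"branch": "branch-a", "config": "defconfig+preempt_rt"},
    {"branch": "branch-b", "config": "defconfig+preempt_rt"},
]
grafana_rows = [
    {"branch": "branch-a", "config": "defconfig"},
]

# Pairs that result-summary reports but the dashboard does not show.
missing = missing_pairs(result_summary_rows, grafana_rows)
for branch, config in missing:
    print(f"missing on dashboard: {branch} {config}")
```

Keying on the pair rather than on the config alone makes a missing branch and a missing config show up in the same report.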

cc: @nuclearcat @padovan

musamaanjum commented 2 months ago

stable-rt.html.log

@helen-fornazier I'm unable to assign this issue to you. Please have a look at what is causing the discrepancy.

helen-fornazier commented 2 months ago

I see all these branches for stable-rt

image

what is missing?

But indeed, I wasn't able to find node 66abc518e49a7366b292a076 in KCIDB, for instance (it is present in the report you sent). @JenySadadia, could you check please?

Also, shouldn't these node_timeouts be reported as MISS?

image

helen-fornazier commented 2 months ago

About the MISS: I just noticed that these are build errors, so we need https://github.com/kernelci/kcidb-io/issues/82

musamaanjum commented 2 months ago

But indeed, I wasn't able to find node 66abc518e49a7366b292a076 in KCIDB, for instance (it is present in the report you sent). @JenySadadia, could you check please?

@helen-fornazier @JenySadadia This is my only concern at this time. The data should have been the same at both places.

JenySadadia commented 2 months ago

But indeed, I wasn't able to find node 66abc518e49a7366b292a076 in KCIDB, for instance (it is present in the report you sent). @JenySadadia, could you check please?

Yes, I am unable to find https://staging.kernelci.org:9000/viewer?node_id=66abc518e49a7366b292a076 on the KCIDB dashboard. But it is present in the new Grafana dashboard, right? If so, Maestro did send the data and the KCIDB dashboard is somehow not showing it. Could you please check, @spbnick?

JenySadadia commented 2 months ago

I checked the staging logs. Maestro didn't submit this entry. Then how did it reach the new dashboard? Is there any other source submitting Maestro data to it? @helen-fornazier

helen-fornazier commented 2 months ago

Let me clarify things:

about 66abc518e49a7366b292a076:

So the questions are: why is it not in KCIDB? Why didn't Maestro submit it? Why do we have this inconsistency? (cc @JenySadadia)



JenySadadia commented 2 months ago

Hello @helen-fornazier @musamaanjum

I analyzed the staging logs and found the root cause. According to the logs, the kcidb bridge service crashed on 08/01/2024 06:17:29 PM UTC and restarted on 08/02/2024 12:16:55 AM UTC.

The node https://staging.kernelci.org:9000/viewer?node_id=66abc518e49a7366b292a076 was updated at 2024-08-01 08:08:57 PM UTC. The bridge service was not running at that time, so the "updated" event from the API was lost. Hence, the KCIDB submission is missing for this node.
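
The timing argument can be checked with a few lines of Python. This is just a sketch using the three timestamps quoted above; it assumes the crash and restart dates are in MM/DD/YYYY order, and the helper names `parse_utc` and `in_outage` are made up for illustration.

```python
from datetime import datetime, timezone


def parse_utc(text, fmt):
    """Parse a timestamp string and mark it as UTC."""
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)


def in_outage(ts, start, end):
    """True if ts falls in the half-open window [start, end)."""
    return start <= ts < end


# Timestamps as quoted from the staging logs (MM/DD/YYYY order assumed).
crashed = parse_utc("08/01/2024 06:17:29 PM", "%m/%d/%Y %I:%M:%S %p")
restarted = parse_utc("08/02/2024 12:16:55 AM", "%m/%d/%Y %I:%M:%S %p")
node_updated = parse_utc("2024-08-01 08:08:57 PM", "%Y-%m-%d %I:%M:%S %p")

missed = in_outage(node_updated, crashed, restarted)
print("outage duration:", restarted - crashed)
print("node update fell inside the outage:", missed)
```

Under that assumption the bridge was down for just under six hours, and the node update falls inside that window.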

This issue has been partially addressed by a patch that auto-restarts all the pipeline services after a crash. The patch was merged and deployed on 2nd Aug.

musamaanjum commented 2 months ago

I've checked stable-rt. There hasn't been any update for 8 days. Let's wait to see if we get correct and coherent results on Grafana on the next run.

crazoes commented 4 weeks ago

@musamaanjum @helen-fornazier @JenySadadia can we close this task if it has been resolved?