mvisonneau / gitlab-ci-pipelines-exporter

Prometheus / OpenMetrics exporter for GitLab CI pipelines insights
Apache License 2.0

Test Report Metrics not working #682

Closed: clawoflight closed this issue 1 year ago

clawoflight commented 1 year ago

When I enable project_defaults.pull.pipeline.test_reports, the logs contain a lot of these messages:

{"error":"could not fetch test report for ID: json: cannot unmarshal string into Go struct field PipelineTestCases.test_suites.test_cases.system_output of type gitlab.SystemOutput","level":"warning","msg":"pulling ref metrics","project-name":"PROJECT","ref":"REF","time":"2023-07-05T10:42:38Z"}.

Is there some misconfiguration on our side?

We are using GitLab 16.1 and gitlab-ci-pipelines-exporter 0.5.5, deployed with version 0.3.1 of the Helm chart.

clawoflight commented 1 year ago

I guess it may be related to #669, although I also get this error on pipelines that are not failing (they do have non-blocking manual jobs that weren't triggered, though!). If the linked issue is the solution, I would kindly ask for a new release with the fix :)

jasonwliu commented 1 year ago

It's been addressed in the issue you linked; however, there hasn't been a new minor release yet. In the interim, you can use the "latest" Docker tag or build from source.

clawoflight commented 1 year ago

I tested the "latest" image and still reproducibly get an error, though now only for one of my repos:

"could not fetch test report for 43332: json: cannot unmarshal array into Go struct field PipelineTestCases.test_suites.test_cases.system_output of type string"

I get metrics for all other repos :)

clawoflight commented 1 year ago

The difference, as far as I can tell, is that some of the tests in the non-working repo have arrays of strings in their test_suites.test_cases.system_output.

It seems this can happen when a test case has multiple <failure> elements in a JUnit test report.

jasonwliu commented 1 year ago

Yeah, the type of system_output depends on the testing framework; we've seen several (an array of strings, a plain string, and a struct with Type and Message strings). A brief explanation of why that is can be found here.

clawoflight commented 1 year ago

Thanks for the replies @jasonwliu!

So in some cases this project currently supports only a subset of testing frameworks? That's unfortunate. May I ask why system_output is deserialized at all? It is not exposed anywhere and surely doesn't make much sense in a Prometheus environment. How about simply skipping/ignoring that field, so that we can collect all test jobs? :)
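For context on the skipping idea: encoding/json ignores JSON keys that have no matching struct field, so a struct that simply does not declare system_output decodes every shape without error. A minimal sketch, assuming a hypothetical metricsTestCase type and the field names name, status and execution_time; such a change would have to live in the struct the API response is decoded into.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// metricsTestCase is a hypothetical struct that keeps only fields a metrics
// exporter would plausibly need. system_output is deliberately not declared,
// so encoding/json skips it whatever its JSON shape.
type metricsTestCase struct {
	Name          string  `json:"name"`
	Status        string  `json:"status"`
	ExecutionTime float64 `json:"execution_time"`
}

// decodeCases unmarshals a list of test cases, ignoring undeclared keys.
func decodeCases(payload []byte) ([]metricsTestCase, error) {
	var cases []metricsTestCase
	err := json.Unmarshal(payload, &cases)
	return cases, err
}

func main() {
	payload := []byte(`[
		{"name":"a","status":"success","execution_time":0.5,"system_output":"boom"},
		{"name":"b","status":"failed","execution_time":1.25,"system_output":["first","second"]}
	]`)
	cases, err := decodeCases(payload)
	if err != nil {
		panic(err)
	}
	for _, tc := range cases {
		fmt.Printf("%s %s %.2fs\n", tc.Name, tc.Status, tc.ExecutionTime)
	}
}
```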

jasonwliu commented 1 year ago

Yes, in its current state it only works for tests that have a string as an error message. You might be able to work around other frameworks by transforming the format of the error output.

As for the deserialization, that happens in the go-gitlab library and can't be "turned off".

clawoflight commented 1 year ago

Could we swallow this error during deserialization? I'm not very familiar with Go. Alternatively, we would need to find a solution upstream. But requiring a workaround for each CI job of each user is not very realistic :)

jasonwliu commented 1 year ago

Not in this project; it'll have to be an upstream change that swallows the errors and still returns something meaningful.
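As background for the upstream discussion: per the encoding/json documentation, when a value does not fit its target type, Unmarshal skips that field, finishes decoding as best it can, and returns an *json.UnmarshalTypeError describing the earliest mismatch. A caller that owns the decoding step could therefore keep the partially filled result. The sketch below assumes a hypothetical testCase type and decodeLenient helper; it shows only standard-library behaviour, and whether an upstream client could expose the partial value like this is up to that library.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// testCase is a hypothetical, trimmed-down test case with a strict string field.
type testCase struct {
	Name         string `json:"name"`
	Status       string `json:"status"`
	SystemOutput string `json:"system_output"`
}

// decodeLenient treats JSON type mismatches as non-fatal: encoding/json keeps
// decoding past them, so the partially filled value is returned together with
// the first mismatch (for logging). Any other error is still returned as err.
func decodeLenient(payload []byte) (cases []testCase, mismatch *json.UnmarshalTypeError, err error) {
	err = json.Unmarshal(payload, &cases)
	if errors.As(err, &mismatch) {
		return cases, mismatch, nil
	}
	return cases, nil, err
}

func main() {
	payload := []byte(`[
		{"name":"a","status":"failed","system_output":["first","second"]},
		{"name":"b","status":"success","system_output":"ok"}
	]`)
	cases, mismatch, err := decodeLenient(payload)
	if err != nil {
		panic(err)
	}
	if mismatch != nil {
		fmt.Println("warning:", mismatch)
	}
	for _, tc := range cases {
		fmt.Printf("%s %s %q\n", tc.Name, tc.Status, tc.SystemOutput)
	}
}
```

The mismatched field is left at its zero value (an empty string here), which is a reasonable fallback for a field no metric is derived from, and the returned mismatch can still be logged.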