oliverchang opened this issue 2 weeks ago
@vitorguidi @alhijazi any thoughts here?
It does not seem like we have that flexibility: the knobs for deduplication are not exposed to end users in GCP (ref). Error Reporting takes the exception and stack trace information into account when grouping.
I opened a YAQS for the GCP logging folks to see if there is anything we can explore that is not evident in the documentation.
How about we put something on the bots to keep a single bot from polluting the errors? Some options:
Sometimes, a single machine can cause an error to bubble up to the top of our Error Reporting dashboard.
For example, https://pantheon.corp.google.com/errors/detail/CNTsq_Sb7qfXSw;locations=global?e=-13802955&inv=1&invt=Abf8Rw&mods=logs_tg_prod&project=clusterfuzz-external happens more than 100k times a day, but all of it comes from a single bot with clock skew issues.
We should investigate whether there is a way to reduce this noise by deduplicating Error Reporting entries by machine/origin.
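If GCP does not let us group by origin, one bot-side option is to throttle repeated errors before they leave the machine. Below is a minimal sketch, assuming bot errors are emitted through Python's standard `logging` module; the class name `PerBotErrorRateLimiter` and its parameters are hypothetical and not part of ClusterFuzz. It lets each distinct error (keyed by logger, source location and exception type) through at most `max_per_window` times per window on a given bot, so one skewed-clock bot can no longer produce 100k+ entries a day.

```python
import logging
import time
from collections import defaultdict


class PerBotErrorRateLimiter(logging.Filter):
    """Hypothetical filter: each distinct error passes at most
    `max_per_window` times per `window_seconds` on this bot."""

    def __init__(self, max_per_window=10, window_seconds=3600.0, clock=time.monotonic):
        super().__init__()
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self._clock = clock
        # signature -> (window start time, records seen in this window)
        self._windows = {}
        # signature -> number of records dropped so far
        self.suppressed = defaultdict(int)

    @staticmethod
    def _signature(record):
        # Key on logger, source location and exception type, ignoring the
        # formatted message, which often embeds timestamps or IDs.
        exc_type = record.exc_info[0] if record.exc_info else None
        exc_name = exc_type.__name__ if exc_type else None
        return (record.name, record.pathname, record.lineno, exc_name)

    def filter(self, record):
        if record.levelno < logging.ERROR:
            return True  # Only throttle errors; let lower levels through.
        now = self._clock()
        sig = self._signature(record)
        start, count = self._windows.get(sig, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # Window expired: start a fresh one.
        count += 1
        self._windows[sig] = (start, count)
        if count > self.max_per_window:
            self.suppressed[sig] += 1
            return False  # Drop this record.
        return True
```

Attaching it with `handler.addFilter(PerBotErrorRateLimiter())` on the handler that ships logs to Cloud Logging applies it to every record that reaches that handler. The trade-off is that dropped occurrences become invisible in Error Reporting, so a real deployment would probably want to log the `suppressed` counts periodically.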