google / slo-generator

SLO Generator computes SLIs, SLOs, Error Budgets and Burn Rates from supported backends, then exports an SLO report to supported targets.
Apache License 2.0

Fail in export to BigQuery #287

Open casmssm opened 2 years ago

casmssm commented 2 years ago

When using BigQuery as the exporter with a large amount of data, I received the following output:

image

lvaylet commented 2 years ago

Hi @casmssm, could you provide us with more details and context? Like the SLO definitions and configuration files you are using? I need to reproduce the issue before I can troubleshoot it.

casmssm commented 1 year ago

Hi @lvaylet, we have a lot of SLOs, and they follow the good_bad_ratio template. A sample of our configuration file is below.

shared-config.yaml

bigquery:
  dataset_id: app_dataset
  project_id: xxxx-xxx-xx-prd
  table_id: slos
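To see which rows actually landed in the table this config points at, the three fields can be joined into a fully qualified BigQuery table ID (`project.dataset.table`) and used in a count query. The snippet below is only a sketch: the helper names `full_table_id` and `rows_per_step_query` are hypothetical, and the column `error_budget_policy_step_name` is taken from the exported row quoted later in this thread. It only builds the SQL text; running it (for example in the BigQuery console) is left to the reader.

```python
# Values copied from the shared-config.yaml sample in this thread.
bigquery_config = {
    "project_id": "xxxx-xxx-xx-prd",
    "dataset_id": "app_dataset",
    "table_id": "slos",
}


def full_table_id(config):
    """Join project, dataset, and table into BigQuery's dotted table ID."""
    return f"{config['project_id']}.{config['dataset_id']}.{config['table_id']}"


def rows_per_step_query(config):
    """Build a query counting exported rows per error budget policy step."""
    return (
        "SELECT error_budget_policy_step_name, COUNT(*) AS row_count\n"
        f"FROM `{full_table_id(config)}`\n"
        "GROUP BY error_budget_policy_step_name\n"
        "ORDER BY error_budget_policy_step_name"
    )


print(rows_per_step_query(bigquery_config))
```

With 10 error budget windows, each SLO run would be expected to contribute one row per step name, so uneven counts across steps would hint at missing exports.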

lvaylet commented 1 year ago

Thanks @casmssm.

These messages might not be errors after all, just INFO-level messages informing you of what is going on under the hood. I am not sure, as I am unable to reproduce the observed behavior on my machine.

casmssm commented 1 year ago

Hi @lvaylet. We are talking about 100 SLOs in less than 1 minute. As for issue #288, we still need to test it. The data is in BigQuery, but I don't know whether all of it is there. Our environment does not have debug enabled. Here is one example of the BigQuery data for a single error_budget_policy_step_name (we have 10 error budget windows):

[{
  "service_name": null,
  "feature_name": null,
  "slo_name": null,
  "slo_target": "0.995",
  "slo_description": "99,5% de todas as mensagens consumidas da fila de renew com sucesso (ignorando erro de template)",
  "error_budget_policy_step_name": "28 days",
  "error_budget_remaining_minutes": "197.72928000000078",
  "consequence_message": "Unfreeze release",
  "error_budget_minutes": "201.60000000000019",
  "error_minutes": "3.8707199999993946",
  "error_budget_target": "0.0050000000000000044",
  "timestamp_human": "2022-10-28 22:16:03.942514 UTC",
  "timestamp": "1666995363.0",
  "cadence": null,
  "window": "2419200",
  "bad_events_count": "1000",
  "good_events_count": "10468101",
  "sli_measurement": "0.999904",
  "gap": "0.0049040000000000195",
  "error_budget_measurement": "9.5999999999984986e-05",
  "error_budget_burn_rate": "0.0",
  "alerting_burn_rate_threshold": "1.0",
  "alert": "false",
  "metadata": [{
    "key": "slo_id",
    "value": "111"
  }]
}]
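As a sanity check that this row is internally consistent, the derived fields can be recomputed from `slo_target`, `window`, and `sli_measurement`. The sketch below assumes the fields relate as follows: error budget target = 1 - SLO target, error budget measurement = 1 - SLI, minutes = window in minutes times the corresponding ratio, and gap = SLI - SLO target. These relationships are inferred from the sample values, not taken from slo-generator's source, and the names `recompute` and `mismatches` are hypothetical.

```python
import math

# Relevant fields copied from the exported row (BigQuery returned strings).
row = {
    "slo_target": "0.995",
    "window": "2419200",
    "sli_measurement": "0.999904",
    "error_budget_target": "0.0050000000000000044",
    "error_budget_measurement": "9.5999999999984986e-05",
    "error_budget_minutes": "201.60000000000019",
    "error_minutes": "3.8707199999993946",
    "error_budget_remaining_minutes": "197.72928000000078",
    "gap": "0.0049040000000000195",
}


def recompute(row):
    """Recompute derived fields under the assumed relationships."""
    target = float(row["slo_target"])
    sli = float(row["sli_measurement"])
    window_minutes = float(row["window"]) / 60  # window is in seconds
    eb_target = 1 - target
    eb_measurement = 1 - sli
    eb_minutes = window_minutes * eb_target
    error_minutes = window_minutes * eb_measurement
    return {
        "error_budget_target": eb_target,
        "error_budget_measurement": eb_measurement,
        "error_budget_minutes": eb_minutes,
        "error_minutes": error_minutes,
        "error_budget_remaining_minutes": eb_minutes - error_minutes,
        "gap": sli - target,
    }


# Collect every field whose recomputed value disagrees with the stored one.
mismatches = [
    name
    for name, value in recompute(row).items()
    if not math.isclose(value, float(row[name]), rel_tol=1e-9)
]
print("mismatched fields:", mismatches)
```

Under these assumptions the stored values agree with the recomputed ones, which suggests the row itself was exported intact; whether every row reached the table is a separate question.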