mozilla / docker-etl

Collection of dockerized ETL jobs managed by data engineering.
Mozilla Public License 2.0

Add chunking to bugbug classification in broken_site_report_ml #187

ksy36 closed this issue 7 months ago

ksy36 commented 7 months ago

Since I decided to raise all errors from unsuccessful bugbug requests in https://github.com/mozilla/docker-etl/issues/185, there have been too many failures (see https://workflow.telemetry.mozilla.org/dags/broken_site_report_ml/grid?dag_run_id=scheduled__2024-04-19T15%3A45%3A00%2B00%3A00&task_id=broken_site_report_ml&tab=logs).

I'd like to add chunking so that, after a failure, the job resends the same "unsuccessful" chunk rather than sending new reports for classification. That should reduce the number of failures: bugbug caches results, so a classification that was not ready in time can still be retrieved later, as long as the same set of reports is sent.
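
A minimal sketch of the idea, not the job's actual code. It assumes reports arrive in a stable order and that each report is a dict with a `uuid` key. The names `ClassificationError`, `chunked`, `classify_in_chunks`, and the `classify` callable are hypothetical stand-ins for whatever the job uses to call bugbug.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Report = Dict[str, str]


class ClassificationError(Exception):
    """Hypothetical error raised when a bugbug request does not succeed."""


def chunked(items: List[Report], size: int) -> Iterator[List[Report]]:
    """Yield consecutive slices of at most `size` items, preserving order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def classify_in_chunks(
    reports: List[Report],
    classify: Callable[[List[Report]], Dict[str, str]],
    chunk_size: int = 20,
) -> Tuple[Dict[str, str], List[Report]]:
    """Classify reports chunk by chunk and stop at the first failing chunk.

    Returns (results, pending). `pending` starts with the chunk that failed,
    so a later call with the same chunk_size resends exactly that chunk first
    and can benefit from results cached on the bugbug side.
    """
    results: Dict[str, str] = {}
    for index, chunk in enumerate(chunked(reports, chunk_size)):
        try:
            results.update(classify(chunk))
        except ClassificationError:
            # Do not move on to new reports; hand back everything from the
            # failed chunk onward so it is retried unchanged.
            return results, reports[index * chunk_size:]
    return results, []
```

In this sketch the caller would persist `results` after each run and pass `pending` (or the equivalent set of still-unclassified reports, in the same order) to the next run, so the first request of that run matches the one that failed.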