bastienboutonnet closed this issue 2 years ago.
Notes: It looks like the BigQuery module doesn't provide an option to disable this, because it sets up tracing at module load: https://github.com/googleapis/python-bigquery/blob/main/google/cloud/bigquery/opentelemetry_tracing.py#L25
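Roughly, the pattern at that line is a try/except import: the tracing module checks once, when it is first loaded, whether OpenTelemetry can be imported, and stores the result in a module-level flag. Nothing re-checks that flag, so there is no runtime switch left to turn off. The sketch below is a simplified stand-in, not the library's code. `detect_tracing_support`, `HAS_TRACING` and `create_span` are hypothetical names, and the stdlib `json` module stands in for `opentelemetry` so the sketch runs without extra packages.

```python
import importlib


def detect_tracing_support(module_name: str) -> bool:
    """Hypothetical helper: True if `module_name` can be imported."""
    try:
        importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return False
    return True


# Evaluated exactly once, when this module is first imported.
# "json" is a stand-in for "opentelemetry" so the sketch runs anywhere.
HAS_TRACING = detect_tracing_support("json")


def create_span(name: str) -> str:
    # Hypothetical: emit a span only if support was detected at import time.
    # Callers have no parameter to opt out; the decision was already made.
    if not HAS_TRACING:
        return "no-op"
    return f"span:{name}"
```

Because the check lives at module scope, the only ways around it are to uninstall the dependency or to patch the module after import, which is why a caller-side fix is attractive.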
What I don't quite understand is why those events get pushed into our collector. Is there something about how we implemented it that redirects BigQuery's collector URL to ours?
It seems that the more people implement OpenTelemetry (OT), the more everyone will end up collecting each other's data, which is less than ideal.
The BigQuery library only creates spans and documents a way to collect the instrumentation data (https://github.com/googleapis/python-bigquery#instrumenting-with-opentelemetry). It doesn't configure any collector by default.
Since we configure a collector ourselves and the spans are created anyway, they get pushed to our collector. An application can have many different exporters configured, and there is no easy way to disable it (https://github.com/open-telemetry/opentelemetry-collector/issues/2310).
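To make the mechanism concrete, here is a deliberately simplified, pure-Python model of it. It is not the OpenTelemetry API: every class and function below is a hypothetical stand-in. In OpenTelemetry, libraries usually obtain tracers from a process-wide (global) tracer provider, and until an application installs one, their spans are no-ops. Once an application installs a provider wired to its own exporter, spans from any library that asked for a tracer, BigQuery included, go through that same exporter. The span names and the `sodasql.scan` module name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    name: str
    instrumentation_module: str  # name the creating library passed to get_tracer()


@dataclass
class InMemoryExporter:
    """Stand-in for the exporter that ships spans to our collector."""

    exported: List[Span] = field(default_factory=list)

    def export(self, span: Span) -> None:
        self.exported.append(span)


class TracerProvider:
    """Holds the exporter; the application installs one per process."""

    def __init__(self, exporter: InMemoryExporter) -> None:
        self.exporter = exporter


class Tracer:
    def __init__(self, module_name: str) -> None:
        self.module_name = module_name

    def record_span(self, name: str) -> None:
        # The global provider is looked up when the span is recorded.
        if _GLOBAL_PROVIDER is None:
            return  # no provider installed: the span is silently dropped
        # Every span goes to the same exporter, whichever library created it.
        _GLOBAL_PROVIDER.exporter.export(Span(name, self.module_name))


_GLOBAL_PROVIDER: Optional[TracerProvider] = None


def set_tracer_provider(provider: TracerProvider) -> None:
    global _GLOBAL_PROVIDER
    _GLOBAL_PROVIDER = provider


def get_tracer(module_name: str) -> Tracer:
    return Tracer(module_name)


# A library grabs its tracer before the application configures anything.
bq_tracer = get_tracer("google.cloud.bigquery.opentelemetry_tracing")
bq_tracer.record_span("bq.request.1")  # dropped: no provider yet

# Our application installs a provider with the exporter for our collector...
exporter = InMemoryExporter()
set_tracer_provider(TracerProvider(exporter))

# ...and from now on spans from our code and from the library are exported.
get_tracer("sodasql.scan").record_span("soda.scan")
bq_tracer.record_span("bq.request.2")

print([s.instrumentation_module for s in exporter.exported])
# ['sodasql.scan', 'google.cloud.bigquery.opentelemetry_tracing']
```

Nothing redirects BigQuery's URL: the library simply never had a destination of its own and inherits whatever the application configured.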
I'll check whether we can monkey-patch python-bigquery or create our own version of the exporter.
Fixed: we now strictly filter out all non-Soda spans by name (a span is kept only if its name starts with "soda").
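A name-based filter of this kind can be sketched as a wrapper around the exporter that forwards only spans whose name starts with a prefix. This is a pure-Python model with hypothetical class names (`CollectorExporter`, `NamePrefixFilter`) and hypothetical span names; it is not necessarily how soda-sql implements the fix. In the OpenTelemetry Python SDK, a comparable hook would be a custom `SpanProcessor` (or a wrapping `SpanExporter`) that checks `span.name` before delegating.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    name: str


@dataclass
class CollectorExporter:
    """Hypothetical stand-in for the exporter that sends spans to our collector."""

    sent: List[Span] = field(default_factory=list)

    def export(self, spans: List[Span]) -> None:
        self.sent.extend(spans)


class NamePrefixFilter:
    """Wraps an exporter and forwards only spans whose name starts with `prefix`."""

    def __init__(self, delegate: CollectorExporter, prefix: str = "soda") -> None:
        self.delegate = delegate
        self.prefix = prefix

    def export(self, spans: List[Span]) -> None:
        kept = [s for s in spans if s.name.startswith(self.prefix)]
        # Skip the downstream call entirely when nothing survives the filter.
        if kept:
            self.delegate.export(kept)


collector = CollectorExporter()
exporter = NamePrefixFilter(collector)
exporter.export([Span("soda.scan"), Span("BigQuery.query"), Span("soda.sql.execute")])
print([s.name for s in collector.sent])  # ['soda.scan', 'soda.sql.execute']
```

An allowlist by prefix is strict by design: third-party spans are dropped without needing to know their names in advance, at the cost of also dropping any Soda span that is accidentally named without the prefix.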
Running:
select distinct implementation_module_name from intelligence_dwh.soda_sql_telemetry.base_soda_sql_events
returns google.cloud.bigquery.opentelemetry_tracing, which made one of our reporting transformation DQ tests fail. We should ideally avoid capturing telemetry messages coming from other libraries, so that we don't end up polluting our events with data that other people track.
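A DQ check of this kind can be sketched locally with the stdlib `sqlite3` module: the same `select distinct` query, plus a filter that surfaces any module outside an expected prefix. The real table lives in BigQuery; here the schema is flattened to a single column, the sample rows are hypothetical (apart from the BigQuery module name reported above), and the `soda` prefix used as an allowlist is an assumption for illustration.

```python
import sqlite3

# Hypothetical sample rows standing in for
# intelligence_dwh.soda_sql_telemetry.base_soda_sql_events.
rows = [
    ("sodasql.scan",),  # hypothetical soda-sql module name
    ("sodasql.scan",),
    ("google.cloud.bigquery.opentelemetry_tracing",),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table base_soda_sql_events (implementation_module_name text)")
conn.executemany("insert into base_soda_sql_events values (?)", rows)

# Same shape of query as in the issue, restricted to modules that do not
# match the (assumed) "soda" prefix; a non-empty result fails the check.
foreign = [
    name
    for (name,) in conn.execute(
        "select distinct implementation_module_name "
        "from base_soda_sql_events "
        "where implementation_module_name not like 'soda%' "
        "order by implementation_module_name"
    )
]
conn.close()

print(foreign)  # ['google.cloud.bigquery.opentelemetry_tracing']
```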