sodadata / soda-sql

Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html
https://docs.soda.io/
Apache License 2.0

Telemetry collector is catching BQ's internal open telemetry #177

Closed bastienboutonnet closed 2 years ago

bastienboutonnet commented 2 years ago

Running select distinct implementation_module_name from intelligence_dwh.soda_sql_telemetry.base_soda_sql_events returns google.cloud.bigquery.opentelemetry_tracing, which made one of our reporting transformation DQ tests fail.

We should ideally avoid catching telemetry messages coming from other libraries so that we don't end up polluting our events with stuff that other people track.

vijaykiran commented 2 years ago

Notes: It looks like the bigquery module doesn't give an option to disable this, because it sets up tracing at module load: https://github.com/googleapis/python-bigquery/blob/main/google/cloud/bigquery/opentelemetry_tracing.py#L25

bastienboutonnet commented 2 years ago

What I'm not sure I understand is why those events get pushed into our collector. Is it something about how we implemented it that somehow redirects BQ's collector URL to ours?

It seems like the more libraries implement OT, the more everyone will end up collecting each other's data, which is less than ideal.

vijaykiran commented 2 years ago

The BigQuery library only creates spans, and documents a way to collect the instrumentation data (https://github.com/googleapis/python-bigquery#instrumenting-with-opentelemetry). The library doesn't have any collector by default.

Since we configure a collector ourselves, the data is pushed to it because the spans are created. There can be many different exporters configured in an application, and there is no easy way to disable this (https://github.com/open-telemetry/opentelemetry-collector/issues/2310).
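To illustrate the mechanism, here is a minimal stdlib-only sketch. It is a toy model, not the OpenTelemetry API or the soda-sql code: Span, ListExporter, ToyTracerProvider, the module name sodasql.telemetry, and both span names are hypothetical stand-ins. It assumes one process-wide provider that hands every span to every registered exporter, which is roughly why the BigQuery spans show up in our events.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Span:
    name: str
    module: str  # instrumenting module that created the span


@dataclass
class ListExporter:
    """Hypothetical exporter that just records what it is handed."""
    received: List[Span] = field(default_factory=list)

    def export(self, spans: Sequence[Span]) -> None:
        self.received.extend(spans)


class ToyTracerProvider:
    """Toy process-wide provider: every registered exporter sees every span."""

    def __init__(self) -> None:
        self.exporters: List[ListExporter] = []

    def add_exporter(self, exporter: ListExporter) -> None:
        self.exporters.append(exporter)

    def emit(self, span: Span) -> None:
        for exporter in self.exporters:
            exporter.export([span])


provider = ToyTracerProvider()  # shared by every library in the process
soda_exporter = ListExporter()
provider.add_exporter(soda_exporter)

# Our own code and a third-party library both create spans via the shared provider.
provider.emit(Span("soda_scan", "sodasql.telemetry"))
provider.emit(Span("BigQuery.job.begin", "google.cloud.bigquery.opentelemetry_tracing"))

# Both modules end up in the exporter we configured for ourselves.
modules = sorted({span.module for span in soda_exporter.received})
print(modules)
```

Nothing in this model redirects the library's spans. The exporter receives them only because it is attached to the provider that all instrumented code shares.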

I'll see if we can monkey-patch python-bigquery or perhaps create a version of exporter.

m1n0 commented 2 years ago

Fixed: we now strictly filter out any non-Soda spans based on their name, keeping only spans whose name starts with soda.
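A rough sketch of that kind of name-based filter, assuming an exporter wrapper that drops spans before they reach the real exporter. This is not the actual soda-sql implementation: Span, ListExporter, PrefixFilteringExporter, and the span names are hypothetical, and the classes are stdlib stand-ins rather than OpenTelemetry SDK types.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Span:
    name: str


@dataclass
class ListExporter:
    """Hypothetical stand-in for the real exporter that ships spans to the collector."""
    received: List[Span] = field(default_factory=list)

    def export(self, spans: Sequence[Span]) -> None:
        self.received.extend(spans)


class PrefixFilteringExporter:
    """Forwards only spans whose name starts with the given prefix."""

    def __init__(self, delegate: ListExporter, prefix: str = "soda") -> None:
        self.delegate = delegate
        self.prefix = prefix

    def export(self, spans: Sequence[Span]) -> None:
        kept = [span for span in spans if span.name.startswith(self.prefix)]
        if kept:  # skip the delegate call when nothing survives the filter
            self.delegate.export(kept)


backend = ListExporter()
exporter = PrefixFilteringExporter(backend)
exporter.export([Span("soda_scan"), Span("BigQuery.job.begin"), Span("soda_cli")])

kept_names = [span.name for span in backend.received]
print(kept_names)
```

Filtering on the span name is strict by design: a span from any other library is dropped unless its name happens to start with the same prefix.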