Open · hardtke opened this issue 2 years ago
Thanks for reporting this, @hardtke. Do I understand correctly that you'd like to preserve all the data you've collected so far, and so it isn't sufficient to simply remove the median column from the audit table?

I think a viable backwards-compatible change would be to keep the median column in the audit table, but ensure that, going forward, we simply write a null in that column instead of trying to select it from ML.FEATURE_INFO, which now fails.
Changing line 28 of https://github.com/kristeligt-dagblad/dbt_ml/blob/master/macros/hooks/model_audit.sql#L28 from:

```
feature_info: &default_feature_info ['*']
```

to:

```
feature_info: &default_feature_info array(select as struct input, min, max, mean, cast(null as float64) as median, stddev, category_count, null_count)
```

should do the trick, I believe.
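For context, the full expression would presumably end up looking something like the sketch below; the `array(...)` wrapper and the `model {{ this }}` reference are assumptions about the macro's surrounding context, not a verified diff:

```sql
-- Sketch only: keep the audit table's median column, but write null into it,
-- since ML.FEATURE_INFO no longer returns a median column.
array(
    select as struct
        input,
        min,
        max,
        mean,
        cast(null as float64) as median,
        stddev,
        category_count,
        null_count
    from ml.feature_info(model {{ this }})
) as feature_info
```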
Our model audit post-hook started failing recently. As far as I can tell, BigQuery ML removed the median column from ML.FEATURE_INFO. Does anyone have a fix that preserves our historical model data?
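For anyone who wants to verify the change on their side, something like this shows which columns ML.FEATURE_INFO currently returns (the model path is a placeholder):

```sql
-- Placeholder model path; substitute your own project.dataset.model.
select *
from ML.FEATURE_INFO(model `my_project.my_dataset.my_model`);
```

The audit table columns are defined in `_audit_table_columns`: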
```sql
{% macro _audit_table_columns() %}
    {% do return ({
        'model': 'string',
        'schema': 'string',
        'created_at': dbt_utils.type_timestamp(),
        'training_info': 'array<struct<training_run int64, iteration int64, loss float64, eval_loss float64, learning_rate float64, duration_ms int64, cluster_info array<struct<centroid_id int64, cluster_radius float64, cluster_size int64>>>>',
        'feature_info': 'array<struct<input string, min float64, max float64, mean float64, median float64, stddev float64, category_count int64, null_count int64>>',
        'weights': 'array<struct<processed_input string, weight float64, category_weights array<struct<category string, weight float64>>>>',
    }) %}
{% endmacro %}
```
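Since `feature_info` keeps its `median float64` field under the proposed fix, the historical values should stay readable. A minimal sketch of reading them back out, assuming a placeholder audit table name of `my_dataset.dbt_ml_model_audit`:

```sql
-- Historical rows keep their recorded median values;
-- rows written after the fix will simply carry null there.
select
    audit.model,
    audit.created_at,
    fi.input,
    fi.median
from `my_dataset.dbt_ml_model_audit` as audit,
    unnest(audit.feature_info) as fi
order by audit.created_at desc;
```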