ing-bank / popmon

Monitor the stability of a Pandas or Spark dataframe ⚙︎
https://popmon.readthedocs.io/
MIT License
493 stars 33 forks source link

Metrics pipeline (erroneously) overwrites reference type to "self" #263

Closed ruudkassing closed 1 year ago

ruudkassing commented 1 year ago

When calling pm_stability_metrics popmon attribute on a Pandas dataframe we noticed that a TypeError is raised for SelfReferenceMetricsPipeline:

TypeError("__init__() got an unexpected keyword argument 'ref_hists_key'")

Which is surprising because we set reference_type="external", so we expect popmon to create an ExternalReferenceMetricsPipeline object.

After some investigation it turns out that in popmon->pipeline->metrics.py->stability_metrics() on line 54-58 you set kwargs["ref_hists_key"] = "ref_hists" because the reference type is set to "external". Then, you create the metrics pipeline and pass the settings and kwargs. However, in popmon->pipeline->metrics_piplines.py->create_metrics_pipline() it uses a kwarg reference_type with as default "self". As a result, the pipeline class is created using the kwarg reference_type instead of via the setting settings.reference_type. Therefore, whenever creating the stability metrics report from a Pandas dataframe the reference type is always "self", without control over it.

I guess this is some leftover kwarg that was forgotten to migrate to the new Settings class after the recent settings migration of popmon. Could you please fix this issue as it's preventing us from updating from popmon to more recent stable versions (1.4).

ruudkassing commented 1 year ago

The fix is very simple btw: removing the kwarg reference_type from create_metrics_pipeline() and then using cls = get_metrics_pipeline_class(settings.reference_type, reference).

I didn't know what your contributor policy is, so for now I only created this issue ticket. I can create a pull request if you'd like and allow me.

mbaak commented 1 year ago

Hi, thanks for reporting, we will have a look!

ruudkassing commented 1 year ago

@mbaak this fix is relatively simple. I don't mind creating a pull request to fix it.

mbaak commented 1 year ago

Sorry, I totally forgot. Please do!