ing-bank / popmon

Monitor the stability of a Pandas or Spark dataframe ⚙︎
https://popmon.readthedocs.io/
MIT License

StabilityReport only calculated with metrics specified in Report.show_stats #277

Open sbrugman opened 1 year ago

sbrugman commented 1 year ago

Discussed in https://github.com/ing-bank/popmon/discussions/276

Originally posted by **mi2354** July 3, 2023

At the moment (v1.4.4), the StabilityReport uses the `show_stats` attribute of the `popmon.config.Report` object to display only some metrics in the report, although all metrics are still calculated and stored in `StabilityReport.datastore`. It would be great if only the metrics that need to be displayed were calculated, instead of all of them.

Another way to speed up the StabilityReport would be to make the visualization optional, i.e. make `popmon.pipeline.report_pipelines.ReportPipe` an optional step in the reference pipelines. I have noticed that it is not easy to use the metrics pipelines directly on their own, so even if only `StabilityReport.datastore` is needed, it is necessary to go through all the pipeline steps.

Please let me know your thoughts on this! Thanks!

sbrugman commented 1 year ago

@mi2354 Thanks for raising this.

Indeed, `show_stats` only configures which metrics are displayed; it does not affect which metrics are calculated. To disable profiles/comparisons, they can currently be removed from the profiles registry. A feature could be added to disable registered profiles/comparisons via `popmon.config.Report` (contributions welcome).
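To make the distinction concrete, here is a toy model in plain Python. It is not popmon's registry API: `PROFILES`, `compute_profiles`, and `render` are hypothetical names invented for this sketch. It only illustrates why a display-time filter such as `show_stats` does not save computation, while removing entries from a registry does.

```python
from statistics import mean, median

# Hypothetical stand-in for a profile registry: metric name -> function.
# This is NOT popmon's actual registry, just an illustration.
PROFILES = {
    "mean": mean,
    "median": median,
    "min": min,
    "max": max,
}


def compute_profiles(values, registry):
    """Compute every metric that is still present in the registry."""
    return {name: func(values) for name, func in registry.items()}


def render(datastore, show_stats):
    """Display-time filter: hides metrics, but they were already computed."""
    return {name: value for name, value in datastore.items() if name in show_stats}


values = [1, 2, 3, 10]

# Display filter only: all four metrics are computed, one is shown.
datastore = compute_profiles(values, PROFILES)
shown = render(datastore, show_stats=["mean"])

# Trimmed registry: the other metrics are never computed at all.
trimmed = {name: func for name, func in PROFILES.items() if name == "mean"}
small_datastore = compute_profiles(values, trimmed)
```

In this sketch `shown` and `small_datastore` hold the same result, but only the second path avoids the unnecessary work, which is what the original request asks for.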

There are MetricsPipelines that are meant to make the ReportPipeline step optional (your second point). See also this tutorial.
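The idea of an optional report step can be sketched in plain Python as follows. This is a simplified illustration, not popmon's pipeline classes: `metrics_step`, `report_step`, and `run` are hypothetical names, and the datastore is modelled as an ordinary dict.

```python
def metrics_step(datastore):
    """Hypothetical metrics step: adds computed statistics to the datastore."""
    values = datastore["values"]
    datastore["metrics"] = {"count": len(values), "total": sum(values)}
    return datastore


def report_step(datastore):
    """Hypothetical report step: renders the stored metrics as text."""
    lines = [f"{name}: {value}" for name, value in datastore["metrics"].items()]
    datastore["report"] = "\n".join(lines)
    return datastore


def run(values, with_report=True):
    """Run the metrics step, and the report step only when requested."""
    datastore = metrics_step({"values": values})
    if with_report:
        datastore = report_step(datastore)
    return datastore


# Only the datastore is needed, so the rendering step is skipped.
metrics_only = run([1, 2, 3], with_report=False)
```

When only the datastore is needed, skipping the rendering step this way avoids the cost of building the visualization.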

Happy to take any contributions on documenting this more clearly.