ing-bank / popmon

Monitor the stability of a Pandas or Spark dataframe ⚙︎
https://popmon.readthedocs.io/
MIT License
493 stars, 33 forks

feat: hist_juxtaposition #245

pradyot-09 closed 2 years ago

pradyot-09 commented 2 years ago

For now, last_n defaults to 2, so only two dates appear in the dropdown. For the airline dataset, if last_n is set to its maximum, popmon runs into the issue (for the DEPARTURE feature) raised by Tomek: https://github.com/ing-bank/popmon/issues/244.
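To illustrate the effect of last_n on the dropdown, here is a minimal sketch. It is not popmon's code: `dropdown_dates` is a hypothetical helper, the dates are made up, and treating `last_n=None` as "keep every date" is an assumption of this sketch.

```python
def dropdown_dates(dates, last_n=2):
    """Return the most recent `last_n` dates, oldest first.

    Hypothetical helper for illustration only. `last_n=None` is an
    assumption of this sketch meaning "keep every date".
    """
    ordered = sorted(dates)  # ISO-formatted date strings sort chronologically
    if last_n is None:
        return ordered
    if last_n < 1:
        raise ValueError("last_n must be a positive integer or None")
    return ordered[-last_n:]


# Made-up weekly time slots.
dates = ["2015-01-04", "2015-01-11", "2015-01-18", "2015-01-25"]

default_options = dropdown_dates(dates)        # the two most recent dates
all_options = dropdown_dates(dates, None)      # every date
print(default_options)
print(all_options)
```

With the default, only the two latest dates end up as dropdown options; larger values trade a longer dropdown for a larger report.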

Screenshot 2022-08-15 at 19 39 24

closes ing-bank/popmon#230

sbrugman commented 2 years ago

Now that the issue in histogrammar should be resolved, we can test whether that results in passing CI/CD here.

@pradyot-09 Until there is a new histogrammar release, you could add the git branch to requirements.txt for development purposes. Adding a line such as this should work:

histogrammar @ git+https://github.com/histogrammar/histogrammar-python.git

sbrugman commented 2 years ago

@pradyot-09 popmon_tutorial_reports.ipynb needs to be updated for the tests to pass

sbrugman commented 2 years ago

The current implementation is a great step forward, as it allows users to view individual histograms, which wasn't possible before:

[image]

The implementation might be better with two plotly dropdowns in the same figure. This would remove the redundant information now present in the plots: the reference is shown on both the left and the right, and histogram_prev1 is the same as histogram once selected.

pradyot-09 commented 2 years ago

@sbrugman Just to clarify: do you mean a single plot with a dropdown to change the dates? I'm asking because we discussed that one of the main reasons for having the plots side by side was to compare them.

sbrugman commented 2 years ago

> @sbrugman Just to clarify: do you mean a single plot with a dropdown to change the dates? I'm asking because we discussed that one of the main reasons for having the plots side by side was to compare them.

Side-by-side or overlay should both be fine. The confusion in the current approach comes from the redundant information (histogram_prev1 in one plot may be the same as histogram in the other). You could consider adding the reference to the drop-down, and showing one histogram per drop-down.
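A minimal sketch of "one histogram per drop-down", with the reference as an ordinary option, could look like the following. It is not the code from this PR: `histogram_inspector_spec` and the bin counts are hypothetical. The figure is built as a plain dict in plotly's figure format, so the block runs without plotly installed. Each dropdown uses the `restyle` method with an explicit list of trace indices, so it only toggles the visibility of its own group of traces.

```python
def histogram_inspector_spec(bin_labels, histograms, overlay=False):
    """Build a plotly-style figure dict with two independent dropdowns.

    Hypothetical helper. `histograms` maps a label (a date or "reference")
    to a list of bin counts.
    """
    labels = list(histograms)
    n = len(labels)
    # Initially the first dropdown shows the first label, the second the last.
    initial = [0, n - 1]

    data = []
    for group in range(2):
        for i, label in enumerate(labels):
            trace = {
                "type": "bar",
                "x": list(bin_labels),
                "y": list(histograms[label]),
                "name": label,
                "opacity": 0.6 if overlay else 1.0,
                "visible": i == initial[group],
            }
            if not overlay and group == 1:
                # Side-by-side: the second group is drawn on its own axes.
                trace["xaxis"], trace["yaxis"] = "x2", "y2"
            data.append(trace)

    def dropdown(group, x):
        # Traces group*n .. group*n + n - 1 belong to this dropdown.
        indices = list(range(group * n, (group + 1) * n))
        buttons = [
            {
                "label": label,
                "method": "restyle",
                # restyle args: [changes, trace indices]; one value per trace.
                "args": [{"visible": [j == i for j in range(n)]}, indices],
            }
            for i, label in enumerate(labels)
        ]
        return {"type": "dropdown", "buttons": buttons, "active": initial[group],
                "x": x, "xanchor": "left", "y": 1.15, "yanchor": "top"}

    layout = {"updatemenus": [dropdown(0, 0.0), dropdown(1, 0.55)]}
    if overlay:
        layout["barmode"] = "overlay"
    else:
        layout["xaxis"] = {"domain": [0.0, 0.45], "anchor": "y"}
        layout["xaxis2"] = {"domain": [0.55, 1.0], "anchor": "y2"}
        layout["yaxis2"] = {"anchor": "x2"}
    return {"data": data, "layout": layout}


# Made-up bin counts for illustration.
example = {"reference": [5, 9, 4], "2015-01-18": [6, 8, 3], "2015-01-25": [2, 7, 9]}
spec = histogram_inspector_spec(["low", "mid", "high"], example)
print(len(spec["data"]), "traces,", len(spec["layout"]["updatemenus"]), "dropdowns")
```

If plotly is available, the dict can be passed to `plotly.graph_objects.Figure(spec)` to render it. Each histogram is drawn twice (once per dropdown), which keeps the two selections independent at the cost of duplicated trace data.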

pradyot-09 commented 2 years ago

@sbrugman I have pushed the histogram plots with updatemenus. The code looks neater now. For static references, the "histogram_ref" plots are repetitive, so I think it would be better to show "histogram_ref" only once in that case. However, I think we would need to pass the Settings object rather than the Report object to report_generation(). I am not sure about the best way to get reference_type in histogram_section.
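The idea of listing the reference only once for static references can be sketched as below. This is an illustration, not popmon's implementation: `dropdown_labels` is a hypothetical helper, `reference_type` is simply passed in as an argument (how it should reach histogram_section is the open question above), and treating "self" and "external" as the static reference types is an assumption.

```python
# Assumption of this sketch: these reference types do not change over time.
STATIC_REFERENCE_TYPES = {"self", "external"}


def dropdown_labels(dates, reference_type):
    """Return dropdown option labels for the histogram inspector.

    Hypothetical helper. A static reference is identical for every date,
    so it is listed once; otherwise each date gets its own reference entry.
    """
    ordered = sorted(dates)
    if reference_type in STATIC_REFERENCE_TYPES:
        return ["reference"] + ordered
    labels = []
    for date in ordered:
        labels.append(date)
        labels.append(f"reference ({date})")
    return labels


static_labels = dropdown_labels(["2015-01-25", "2015-01-18"], "self")
rolling_labels = dropdown_labels(["2015-01-25", "2015-01-18"], "rolling")
print(static_labels)
print(rolling_labels)
```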

Screenshot 2022-08-25 at 18 12 54

pradyot-09 commented 2 years ago

@sbrugman I pushed the new histogram inspector with overlays. For now, I have kept the double dropdowns for static references too, because it is very easy to disable overlays if needed. Let me know what you think.

Screenshot 2022-08-31 at 09 19 00