ing-bank / popmon

Monitor the stability of a Pandas or Spark dataframe ⚙︎
https://popmon.readthedocs.io/
MIT License

Ensure date/datetime representation in plots for Spark when providing Timestamp #233

Closed sbrugman closed 2 years ago

sbrugman commented 2 years ago

When the user provides a timestamp-typed time_axis in PySpark, the time axis is binned in (nano)seconds. These bins should be displayed as dates/datetimes in the plots.

pradyot-09 commented 2 years ago

After a lot of debugging, I found a bug in histogrammar when it is used with Spark DataFrames. histogrammar converts timestamps to nanoseconds for its binning calculations, but for Spark DataFrames it fails to convert them back to timestamps. This is explained in a bit more detail in the issue here. I have fixed the bug in the histogrammar repo and submitted a PR.
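
For anyone reading along, here is a rough sketch of the kind of conversion that was missing: turning nanosecond bin edges back into datetimes for display. This is not the actual histogrammar or popmon code. The helper names (`ns_to_datetime`, `bin_edges_as_datetimes`) and the weekly-bin example values are made up for illustration, and it assumes the bins are counted in nanoseconds since the Unix epoch in UTC.

```python
from datetime import datetime, timedelta, timezone

NS_PER_SECOND = 1_000_000_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ns_to_datetime(ns: int) -> datetime:
    """Convert nanoseconds since the Unix epoch to a UTC datetime.

    Python datetimes only have microsecond resolution, so the
    sub-microsecond part is dropped.
    """
    return EPOCH + timedelta(microseconds=ns // 1_000)


def bin_edges_as_datetimes(origin_ns: int, bin_width_ns: int, bin_indices):
    """Map each bin index to the datetime of its lower edge (hypothetical helper)."""
    return {
        idx: ns_to_datetime(origin_ns + idx * bin_width_ns)
        for idx in bin_indices
    }


# Illustrative values: weekly bins starting at 2020-01-01 00:00:00 UTC.
origin_ns = 1_577_836_800 * NS_PER_SECOND
bin_width_ns = 7 * 24 * 3600 * NS_PER_SECOND

labels = bin_edges_as_datetimes(origin_ns, bin_width_ns, [0, 1, 2])
for idx, label in labels.items():
    print(idx, label.isoformat())
```

With labels like these, the plot axis can show dates instead of raw nanosecond values.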