Closed qwert666 closed 1 year ago
I think I understand what you want: a rolling reference that is - say - a year behind the current time. Have you tried the rolling reference functionality? You can specify the window size and how far it should lag behind the current time. See: https://popmon.readthedocs.io/en/latest/reference_types.html#rolling-reference (Or let me know if you mean something else.)
A rolling reference would require access to the historical data when building the report, and that's not something I can do: I have a lot of data and want to rely fully on the precomputed histograms.
What I was thinking of was something like:
Calculate the histograms once, one per hour of the day:
hourly_histograms = {}
bin_specs = {}
for hour in range(0, 24):
    pdf_hour = historical_pdf[historical_pdf.hour == hour]
    histogram = make_histograms(pdf_hour, features=features, time_axis="datetime", time_width="1h", time_offset="2023-04-27")
    hourly_histograms[hour] = histogram
    bin_specs[hour] = get_bin_specs(histogram)
Then, when the pipeline is running, I only process one hour of data (the most recent):
last_hour_pdf = pd.DataFrame(...) # containing my new data
last_hour_histogram = make_histograms(last_hour_pdf, features=features, time_axis="datetime", bin_specs=bin_specs[current_hour])
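Reusing the reference bin specification for the new hour matters because counts are only comparable bin by bin when both histograms share the same edges. Below is a minimal stdlib sketch of that idea. It does not use popmon, and the helper `fill_histogram`, the `bin_width`/`bin_offset` keys, and the sample values are illustrative assumptions rather than popmon's API.

```python
import math
from collections import Counter


def fill_histogram(values, bin_width, bin_offset=0.0):
    """Count values into fixed-width bins; returns {bin_index: count}."""
    counts = Counter()
    for v in values:
        counts[math.floor((v - bin_offset) / bin_width)] += 1
    return counts


# Hypothetical page-view counts: reference data for one hour of the day,
# and the newest hour coming out of the pipeline.
reference_values = [3, 4, 4, 5, 7, 8, 12]
latest_values = [4, 6, 13, 14]

# The binning is chosen once on the reference data and then reused as-is.
spec = {"bin_width": 5.0, "bin_offset": 0.0}
reference_hist = fill_histogram(reference_values, **spec)
latest_hist = fill_histogram(latest_values, **spec)

# Same edges, so the two histograms can be compared bin by bin.
for idx in sorted(set(reference_hist) | set(latest_hist)):
    lo = spec["bin_offset"] + idx * spec["bin_width"]
    hi = lo + spec["bin_width"]
    print(f"[{lo}, {hi}): ref={reference_hist[idx]} new={latest_hist[idx]}")
```

If the new hour were binned with its own automatically chosen edges, the bin indices would no longer line up and the comparison above would not be meaningful.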
btw. can last_hour_histogram be used directly for reports/metrics?
settings = Settings(time_axis="datetime", reference_type="external")
settings.report.extended_report = True
report = last_hour_pdf.pm_stability_report(
    settings=settings,
    reference=hourly_histograms[current_hour]
)
And then I call stitch_histograms so the result can serve as the reference for the next day's comparison:
combined_hist = stitch_histograms(
    hists_basis=hourly_histograms[current_hour], hists_delta=last_hour_histogram
)
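To make the stitching step concrete, here is a conceptual stdlib sketch of merging a new time slot into a set of existing ones. It is not popmon's implementation: the `stitch` helper, the dictionary layout (time slot mapped to bin counts), and the assumed "add" and "replace" semantics are illustrative only.

```python
from collections import Counter


def stitch(basis, delta, mode="add"):
    """Merge two {time_slot: Counter(bin_index -> count)} mappings.

    mode="add":     counts of overlapping time slots are summed.
    mode="replace": overlapping time slots are overwritten by the delta.
    Time slots that exist only in the delta are appended in both modes.
    """
    if mode not in ("add", "replace"):
        raise ValueError(f"unknown mode: {mode}")
    combined = {slot: Counter(counts) for slot, counts in basis.items()}
    for slot, counts in delta.items():
        if mode == "add" and slot in combined:
            combined[slot].update(counts)  # sum counts per bin
        else:
            combined[slot] = Counter(counts)
    return combined


# Hypothetical reference for the 09:00 hour on two earlier days.
basis = {
    "2023-04-25 09:00": Counter({0: 3, 1: 2}),
    "2023-04-26 09:00": Counter({0: 4, 1: 1}),
}
# The newest 09:00 hour, binned with the same bin specification.
delta = {"2023-04-27 09:00": Counter({0: 2, 1: 5})}

combined = stitch(basis, delta)
print(sorted(combined))
```

In this sketch the new hour occupies a time slot that the basis does not contain yet, so "add" and "replace" give the same result; the modes only differ when the same slot is stitched a second time.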
It all works; I'm just not sure whether this setup makes sense in popmon or whether it should be done differently.
I'll move this to the discussion space
Hi
I have more of a question about using the library, as all the examples consist of building histograms over a period that is wider than the defined time_width.
My setup is a complex project in which many factors can influence the metrics I want to keep an eye on. I have a data pipeline that processes the data on an hourly basis, which means the data I have access to always covers exactly one hour. I was thinking of building separate historical histograms for each hour of the day, as I want to compare apples to apples and eliminate additional noise: I have a lot of seasonalities (within a day, week, month, etc.).
An example project could be users on a website, where I keep track of their page views and generated revenue and want to detect major shifts in page views early.
In this use case, as I understand it, reference_type should be "external" and time_width would be 1h, so the reports/metrics would always contain just one hour. But then how would stitch_histograms work? The replace functionality would not work, right? And if I wanted to control the size of the stitched histograms, I would need to cap it in a different way 🤔 Does this hourly setup make sense in popmon?
Best
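One way to picture the capping mentioned in the question is a plain truncation to the most recent time slots after each stitch. The sketch below is stdlib-only and hypothetical: `cap_time_slots`, the datetime-keyed dictionary layout, and the sample counts are assumptions for illustration, not popmon functionality.

```python
from datetime import datetime, timedelta


def cap_time_slots(hist_by_slot, max_slots):
    """Keep only the most recent max_slots entries of a {datetime: counts} mapping."""
    if max_slots <= 0:
        return {}
    recent = sorted(hist_by_slot)[-max_slots:]
    return {slot: hist_by_slot[slot] for slot in recent}


# Hypothetical stitched reference: the 09:00 hour on five consecutive days.
first_day = datetime(2023, 4, 23, 9)
stitched = {
    first_day + timedelta(days=d): {"bin_0": 10 + d, "bin_1": 5}
    for d in range(5)
}

# Keep a three-day window so the reference does not grow without bound.
capped = cap_time_slots(stitched, max_slots=3)
for slot in sorted(capped):
    print(slot.isoformat(), capped[slot])
```

Running this after every stitch would turn the growing reference into a fixed-length sliding window per hour of the day.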