squaredev-io / whitebox

[Not Actively Maintained] Whitebox is an open source E2E ML monitoring platform with edge capabilities that plays nicely with kubernetes
https://squaredev.io/whitebox/
MIT License
184 stars 5 forks source link

Reports are created on grouped inference rows #151

Closed sinnec closed 1 year ago

sinnec commented 1 year ago

Inference rows are grouped by timestamp based on the model's granularity and the previous report's timestamp and create the reports by timestamp. E.g. a timestamp 2023-03-03 12:33:25.34432 when granularity is set to 2D and the previous group's timestamp is 2023-03-03 00:00:00 will be converted into 2023-03-05 00:00:00. If there is no previous timestamp, everything is grouped based on datetime.utcnow() and the granularity..

In the same manner, newer inference rows with older timestamps, are grouped with the corresponding "used" inferences and used to create a new report that will replace the outdated one (keeping the same report id).

closes: #126 closes: #137

sinnec commented 1 year ago

Also the is_used flag i feel is not the optimal way. Im thinking something time based rather than marking entries. Im not saying we should change it now (your decision) but im not sure this implementation will stick.

Well that was suggested and stuck with it. Not sure if the time based filter will work but needs analysis to validate.

momegas commented 1 year ago

Btw if we go with the granularity on the model we also need to update the front end (we got dat thing now)

sinnec commented 1 year ago

@sinnec I tried a test with real data and I noticed a mismatch in the timestamp. For instance, I inserted daily inference data with only the date in the following format: "2022-12-22" , "2022-12-23" , "2022-12-24" etc. In the timestamps, I received in all reports (descriptive, drift, performance etc.) the timestamps were on the below form: 'timestamp': '2022-12-22T00:07:00' For some reason, always 7 minutes are added in the timestamp.

Upon thinking about it, this behaviour was the intended one. Maybe not the desired one but it worked as expected. If I remember correctly the tests were run at 2023-03-02T14:22:00 UTC time. Since there was no other report to get as a starting point (time wise), the timestamp was calculated based on that time (which is utcnow()), so the appropriate timestamp group of 2022-12-22T00:00:00 IS 2022-12-22T00:07:00 based on a 15minutes granularity. If you subtract 15 minutes from 22 you get 7. The timestamp groups would be every 7, 22, 37 and 52 minutes of the hour and not 00, 15, 30 and 45.

In this case, it should be examined if we need to change the starting point to something rounder, like midnight or something.

NickNtamp commented 1 year ago

@sinnec I tried a test with real data and I noticed a mismatch in the timestamp. For instance, I inserted daily inference data with only the date in the following format: "2022-12-22" , "2022-12-23" , "2022-12-24" etc. In the timestamps, I received in all reports (descriptive, drift, performance etc.) the timestamps were on the below form: 'timestamp': '2022-12-22T00:07:00' For some reason, always 7 minutes are added in the timestamp.

Upon thinking about it, this behaviour was the intended one. Maybe not the desired one but it worked as expected. If I remember correctly the tests were run at 2023-03-02T14:22:00 UTC time. Since there was no other report to get as a starting point (time wise), the timestamp was calculated based on that time (which is utcnow()), so the appropriate timestamp group of 2022-12-22T00:00:00 IS 2022-12-22T00:07:00 based on a 15minutes granularity. If you subtract 15 minutes from 22 you get 7. The timestamp groups would be every 7, 22, 37 and 52 minutes of the hour and not 00, 15, 30 and 45.

In this case, it should be examined if we need to change the starting point to something rounder, like midnight or something.

I am not sure that I understand correctly, but in any case, if we have a granularity greater than a day, there is no need to display the time.

sinnec commented 1 year ago

Maybe just include some tests?

That's the reason that it was marked as Draft. It needs to be tested!

sinnec commented 1 year ago

@sinnec I tried a test with real data and I noticed a mismatch in the timestamp. For instance, I inserted daily inference data with only the date in the following format: "2022-12-22" , "2022-12-23" , "2022-12-24" etc. In the timestamps, I received in all reports (descriptive, drift, performance etc.) the timestamps were on the below form: 'timestamp': '2022-12-22T00:07:00' For some reason, always 7 minutes are added in the timestamp.

Upon thinking about it, this behaviour was the intended one. Maybe not the desired one but it worked as expected. If I remember correctly the tests were run at 2023-03-02T14:22:00 UTC time. Since there was no other report to get as a starting point (time wise), the timestamp was calculated based on that time (which is utcnow()), so the appropriate timestamp group of 2022-12-22T00:00:00 IS 2022-12-22T00:07:00 based on a 15minutes granularity. If you subtract 15 minutes from 22 you get 7. The timestamp groups would be every 7, 22, 37 and 52 minutes of the hour and not 00, 15, 30 and 45. In this case, it should be examined if we need to change the starting point to something rounder, like midnight or something.

I am not sure that I understand correctly, but in any case, if we have a granularity greater than a day, there is no need to display the time.

When the granularity is a day or greater the time displayed is always 00:00:00!

NickNtamp commented 1 year ago

Overall looks good!

Did you tested @stavrostheocharis with real data? Checking that the reports are as expected, the structures, the insertions and updates to db etc.?

stavrostheocharis commented 1 year ago

No, I only had code review

sinnec commented 1 year ago

I ultimately changed the start_time to be the start of the current day in case the start time is not set by a previous report's timestamp. This means that is granularity is 15minutes the timestamp will be grouped every 00, 15, 30 and 45 minutes regardless of the time the cron tasks run.

In Nick's above cases the report timestamps would be 2022-12-22T00:15:00 for 2022-12-22T00:00:00 (00:00:00 is the start of a new group. The end of a group would be 23:59:59)

momegas commented 1 year ago

This is open for quite some time. What else is needed here? @sinnec do you need another review can I halp somewhere?

sinnec commented 1 year ago

@momegas Well yeah, I've requested everyone for a review and if it's OK and don't have more questions it can be merged! 😊

momegas commented 1 year ago

The people requested changes previously need to approve again in order to merge. I can merge add admin but the point is for everyone to agree 😎