Closed reidlw closed 6 months ago
Worth specifying how often these need to be refreshed? In the context of customer / business data, weekly granularity is probably sufficient for now
@MF416 this should almost all be real-time data
P0 metrics issue: https://github.com/w3s-project/nonpublic/issues/6
Understand the cost and revenue drivers from a business perspective, and be able to report on them to investors:
@travis Pls review!
Thanks @prodalex! I copied this to a Google Doc to make iteration easier - left you a bunch of questions there:
https://docs.google.com/document/d/1jK-FwpO_dWVx2oCHLwJT67yvZM6ZnkfTKWMb2LqE-gM/edit
> @MF416 this should almost all be real-time data
@reidlw I don't think this is accurate! It sounds like the biz folks don't need more than weekly granularity for now, and many of these stats will be difficult to get in "real time", especially the Stripe data and anything that depends on "badbits".
ok I've distilled the discussion in https://docs.google.com/document/d/1jK-FwpO_dWVx2oCHLwJT67yvZM6ZnkfTKWMb2LqE-gM/edit down to the following tasks:
(1) is going to be a one-off - I can probably handle it but it would probably be faster for @alanshaw or someone else who's more familiar with where all the data there ends up and how it gets there
(2) sounds like it should be pretty straightforward - need to get details from @alanshaw on what he currently does and set up a cronjob somewhere to do it weekly (https://docs.google.com/document/d/1jK-FwpO_dWVx2oCHLwJT67yvZM6ZnkfTKWMb2LqE-gM/edit?disco=AAABK3hDqzU)
(3) sounds like a research project - @vasco-santos had some thoughts in the doc on why this might be hard (https://docs.google.com/document/d/1jK-FwpO_dWVx2oCHLwJT67yvZM6ZnkfTKWMb2LqE-gM/edit?disco=AAABK3hDq6E), so we need to spend some time prototyping and then iterate based on what we find out
(4) also a research project - need to figure out if all the signals @prodalex identified in https://docs.google.com/document/d/1jK-FwpO_dWVx2oCHLwJT67yvZM6ZnkfTKWMb2LqE-gM/edit?disco=AAABK3hDqzI are already available in Athena and which we'd need to add, then come up with a plan to add anything we need
(5) should mostly be non-engineering work, but worth planning for some amount of support time here
I would say the prio would be:
And I think we should prioritize egress before we get to the badbits stuff. For now, we could plan just a spike to figure out how to do the badbits.
Just one quick question @travis: where in the tasks above would the #requests (read, write operations) be covered?
Quick note - I don't think we need weekly Stripe reports given we chart customers monthly? Whatever @alanshaw is doing right now with his monthly cadence is fine from an output perspective (understanding there are probably improvements to make Alan's life easier).
ok - null values are sorted, the rest of the tasks from @prodalex should be scheduled soon!
> Where in the tasks above would be the #requests (read, write operations) covered?
@prodalex - we probably need to break that into its own task - "successful write operations" is an Athena query (happy to help formulate that if you'd like, just let me know) and "successful read operations" is probably some sort of log query in Cloudflare? I'm not actually sure of the best way to get that number - we might already have it in https://daghouse.grafana.net somewhere, but if not it will likely be a bit of work to get it...
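For illustration only, here is a rough sketch of what a weekly "successful write operations" count could look like, assuming the weekly granularity discussed above. The table and column names (`write_operations`, `succeeded`, `occurred_at`) are made up and are not the actual Athena schema, and the helper functions are hypothetical. It only builds the SQL string for the last complete Monday-to-Monday week and does not call Athena.

```python
from datetime import date, timedelta


def last_full_week(today: date) -> tuple[date, date]:
    """Return (start, end) of the most recent complete week, Monday to Monday.

    The window is half-open: start is included, end is excluded.
    """
    this_monday = today - timedelta(days=today.weekday())
    return this_monday - timedelta(days=7), this_monday


def successful_writes_query(start: date, end: date) -> str:
    """Build a SQL string counting successful writes in [start, end).

    ASSUMPTION: `write_operations`, `succeeded` and `occurred_at` are
    placeholder names, not the real schema.
    """
    return (
        "SELECT COUNT(*) AS successful_writes\n"
        "FROM write_operations\n"
        "WHERE succeeded = true\n"
        f"  AND occurred_at >= TIMESTAMP '{start.isoformat()} 00:00:00'\n"
        f"  AND occurred_at < TIMESTAMP '{end.isoformat()} 00:00:00'"
    )


if __name__ == "__main__":
    week_start, week_end = last_full_week(date.today())
    print(successful_writes_query(week_start, week_end))
```

A weekly cronjob could run something like this and hand the string to whatever Athena client is already in use; the half-open window keeps consecutive weekly runs from double-counting rows on the boundary.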
Closing this parent as we've created new child tasks that are in the backlog and can be treated independently
Tasks: