owid / etl

A compute graph for loading and transforming OWID's data
https://docs.owid.io/projects/etl
MIT License

Add data from Slack to Wizard News #3330

Open · Marigold opened 1 month ago

Marigold commented 1 month ago

The News app in Wizard provides a summary of key events in the ETL repository over the past 7 days. This summary is generated from all PRs stored in MySQL.

It would be interesting to apply this concept to all communication in Slack and see if we can generate useful summaries. For example, we could track dataset updates, article or insight changes, projects, bugs, and more. Since many of our apps already send data to Slack (e.g., GitHub updates, bug reports), Slack essentially functions as our communication "data warehouse."

Risks

This could become pretty expensive. We pay $0.35 for the News summary, which includes only GitHub activity. We should think carefully about what to include and what to leave out.

If we store Slack data in MySQL, we must make sure it's excluded from the public Datasette.

lucasrodes commented 1 month ago

On the cost side, I don't see the cost increasing that much. Even if it were 10x, it would still cost less than $5 per day. But maybe I'm missing something, and it could be a larger increase?

Marigold commented 1 month ago

> But maybe I'm missing something, and it could be a larger increase?

I have no idea, to be honest; I was just surprised that GitHub PRs alone cost $0.35. I expect Slack contains much more information (though that doesn't necessarily translate into size).

lucasrodes commented 1 month ago

@Marigold From a quick inspection, the current system prompt includes much of our documentation (e.g., docs for the Table, Variable, and Dataset objects). I haven't checked the exact token count, but I could imagine it being roughly 95% documentation and 5% GitHub PRs. Given that, I don't think adding Slack messages would increase the cost much.
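The argument above can be made concrete with a back-of-the-envelope model. This is only a rough sketch, not how the News app is actually billed: it assumes cost scales linearly with input tokens, ignores output tokens, and uses the guessed 95%/5% documentation/PR split. The names `estimated_cost` and `slack_multiplier` are hypothetical.

```python
# Hypothetical cost model for the News summary prompt.
# Assumptions: cost is proportional to input tokens only (output tokens and
# per-request overhead are ignored), and the prompt is ~95% fixed documentation
# plus ~5% GitHub PR data, as guessed in this thread.

BASELINE_COST_USD = 0.35  # current News summary cost quoted in this issue
DOCS_SHARE = 0.95         # assumed share of the prompt that is documentation
PRS_SHARE = 0.05          # assumed share of the prompt that is GitHub PR data


def estimated_cost(slack_multiplier: float,
                   baseline: float = BASELINE_COST_USD,
                   docs_share: float = DOCS_SHARE,
                   prs_share: float = PRS_SHARE) -> float:
    """Estimate the summary cost after adding Slack content.

    `slack_multiplier` is the size of the added Slack content relative to the
    GitHub PR content (e.g. 10 means ten times as many tokens as the PRs).
    """
    # Documentation stays fixed; PR content is joined by Slack content.
    relative_size = docs_share + prs_share * (1 + slack_multiplier)
    return baseline * relative_size


if __name__ == "__main__":
    for multiplier in (1, 10, 50):
        print(f"Slack = {multiplier:>2}x PRs -> ~${estimated_cost(multiplier):.2f}")
```

Under these assumptions, Slack content ten times the size of the PR data would raise the cost by only 50% (from $0.35 to about $0.53), because the fixed documentation dominates the prompt. If the split turns out to be different, changing `docs_share` and `prs_share` shows how sensitive the estimate is.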

larsyencken commented 1 month ago

Could be a good cooldown project, or something for the Christmas break.