Open pld opened 3 years ago
Posting a comment from slack: if we are going in Pinot's direction we might as well think about something like cassandra which is in the same class of tools imho and arguably more well-known.
I also think that in comparing these different tools we need to make them produce the same report from the same database and then we can objectively weigh them against each other and against postgres. Luckily (ha!) we can use one of the already existing and challenging Zambia IRS reports.
@ukanga pinot's arch at first glance seems fairly similar to Druid, but isn't Superset-compatible. Is there a killer feature it has that you're most interested in? I ask because we've got a little experience in Druid clusters, so wondering how that went for our analytics - I think presto is also compatible with Druid?
@moshthepitt re questions
There's a lot of details around editing deployed views, etc. to figure out but it's a very targeted fix
The 3 options here feel like 3 separate approaches, and (I think?) only materlialize.io addresses the specific problem of view recomputation, although they all target the same underlying problem (reduce time to compute analysis results on big data)
This leads to the broader question of how we want to solve the problem, and I see at least 4 approaches:
Dumping everything into redshift and seeing how long our expensive mat views run as queries could be worth looking at, but it looks like there’s a bunch of optimization (designing and tuning tables) that happens there too, although it does support geo, https://docs.aws.amazon.com/redshift/latest/dg/geospatial-overview.html, so we'd have to think if we're merely shifting the problem, probably can't answer that until we spend a day testing.
CC @mberg
Questions from @moshthepitt
Tasks: