Closed philbooth closed 6 years ago
I estimate we could save about 5 gigabytes from our ~7 gigabyte per-day dataset size.
Crikey. I knew we were operating at some scale, didn't realize the logs alone were this much.
- Delete all the `flow.begin` and `flow.completed` events after using them to populate `flow_metadata` (be careful with this one, it might break some queries).
I just checked and this one will break a number of queries. They could all be fixed to use `flow_metadata` columns instead, but that's probably not a fair burden to put on people.
Although, if a query breaks and nobody notices it's broken, does it matter?
Not sure, I'll send an email to canvass opinion.
I'm going to weigh in on this soon. I know I've used flow.completed a fair bit, just need to see where.
So there are a few charts on this dashboard that will break because of `flow.completed` use, as well as queries that aren't on there but are forks of those. Given the huge benefit in terms of space here though, I would be OK with the change. I can just patch those queries to join on `flow_metadata`. I haven't used `flow.begin` much, I usually use `.view` as top-of-funnel.
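For anyone weighing up what such a patch might involve, here is a rough sketch of a funnel count that reads completion from `flow_metadata` instead of self-joining onto `flow.completed` rows. The table layouts, the `completed` column, the `example.view` event name and the function names are all assumptions made for illustration, not the real schema or any query on the dashboard.

```python
import sqlite3

# Hypothetical patched query: completion comes from flow_metadata.completed
# (an assumed column) rather than from flow.completed rows in flow_events.
PATCHED_FUNNEL_SQL = """
SELECT COUNT(DISTINCT v.flow_id)
FROM flow_events AS v
JOIN flow_metadata AS m ON m.flow_id = v.flow_id
WHERE v.type = ? AND m.completed = 1
"""


def completed_after_view(conn, view_event):
    """Count flows that fired `view_event` and are marked completed."""
    return conn.execute(PATCHED_FUNNEL_SQL, (view_event,)).fetchone()[0]


def demo():
    """Build a tiny in-memory dataset with an assumed schema and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE flow_events (flow_id TEXT, type TEXT, timestamp INTEGER);
        CREATE TABLE flow_metadata (flow_id TEXT PRIMARY KEY, completed INTEGER);
    """)
    # Two flows reach the (made-up) view event; only flow "a" completes.
    conn.executemany(
        "INSERT INTO flow_events VALUES (?, ?, ?)",
        [("a", "example.view", 1), ("b", "example.view", 2)],
    )
    conn.executemany(
        "INSERT INTO flow_metadata VALUES (?, ?)", [("a", 1), ("b", 0)]
    )
    return completed_after_view(conn, "example.view")
```

The query shape stays the same as an events-only funnel; only the join target changes, which is why forks of the same chart could probably be patched mechanically.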
@irrationalagent, don't worry about `flow.completed`, I'm going to keep it. The goal of this issue wasn't to cause upheaval, it was to get whichever speedups could be had without breaking stuff. :smile:

`flow.begin` is toast though!
There's a significant proportion of redundant data in our current dataset. We could:
- Delete all the `flow.continued...` events after using them to set the `continued_from` column in `flow_metadata`.
- Delete all the `flow.experiment...` events after using them to populate the `flow_experiments` table.
- Delete all the `flow.begin` and `flow.completed` events after using them to populate `flow_metadata` (be careful with this one, it might break some queries).
- Delete the `strict_multi_device_users` import job, which takes aaages and sometimes fails to finish at all.
- Send fewer performance events, just keep the ones we're actually using.
I estimate we could save about 5 gigabytes from our ~7 gigabyte per-day dataset size.
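As a rough illustration of the "populate, then delete" pattern in the list above, here is a minimal sketch using an in-memory SQLite database. Everything about the schema is an assumption made for the example (the `flow_events` layout, the `begin_time` and `completed` columns, the function names); the real import scripts and tables may look quite different.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with an assumed, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE flow_events (flow_id TEXT, type TEXT, timestamp INTEGER);
        CREATE TABLE flow_metadata (
            flow_id TEXT PRIMARY KEY,
            begin_time INTEGER,   -- assumed column
            completed INTEGER     -- assumed column, 0 or 1
        );
    """)
    return conn


def prune_redundant_events(conn):
    """Fold flow.begin / flow.completed into flow_metadata, then delete them.

    One-shot sketch: it assumes both event types for a flow are present in
    flow_events when it runs. Returns the number of event rows deleted.
    """
    with conn:  # single transaction: both steps commit, or neither does
        conn.execute("""
            INSERT OR REPLACE INTO flow_metadata (flow_id, begin_time, completed)
            SELECT flow_id,
                   MIN(CASE WHEN type = 'flow.begin' THEN timestamp END),
                   MAX(CASE WHEN type = 'flow.completed' THEN 1 ELSE 0 END)
            FROM flow_events
            WHERE type IN ('flow.begin', 'flow.completed')
            GROUP BY flow_id
        """)
        cur = conn.execute(
            "DELETE FROM flow_events "
            "WHERE type IN ('flow.begin', 'flow.completed')"
        )
    return cur.rowcount
```

Doing both steps in one transaction means a failed run should leave the events in place rather than deleting rows whose metadata was never written.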
/cc @jbuck