meltano / squared

Where the Meltano team runs Meltano! Get it???

chore: limit the amount of context data we parse #684

Open pnadolny13 opened 2 months ago

pnadolny13 commented 2 months ago

The context_base table holds too much data, so performance is poor. Data volume is increasing over time: the last 6 months contain more data than all earlier periods combined. This is likely because usage has grown and because more users are on newer versions of Meltano that send our rich unstructured events.

I manually truncated the context_base incremental table to remove all data from before this year and made a backup table of the original. The table is transient but the backup is not, so the processed historical data is properly persisted if we ever need it. The context_base table will continue to grow, and we will have to prune it manually from time to time. To keep that growth from affecting downstream performance, I created this PR, which limits all downstream tables to the most recent 6 months of data, so their performance should stay relatively constant even as the base table grows.
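
For readers who want to see the shape of the downstream filter, here is a minimal sketch. It is not the SQL from this PR: it uses an in-memory SQLite database from Python so it can be run anywhere, and the column names `event_id` and `event_date`, the helper `recent_rows`, and the sample rows are all hypothetical. Only the table name context_base and the 6-month window come from the description above.

```python
import sqlite3


def recent_rows(conn, as_of, months=6):
    """Return context_base rows dated within `months` months before `as_of`.

    `as_of` is an ISO date string (YYYY-MM-DD). Passing it in explicitly,
    rather than using the current date, keeps the example deterministic.
    """
    # SQLite's date() applies the "-6 months" modifier to the reference date.
    # Column names here are assumptions made for illustration only.
    query = """
        SELECT event_id, event_date
        FROM context_base
        WHERE event_date >= date(?, ?)
        ORDER BY event_date
    """
    return conn.execute(query, (as_of, f"-{months} months")).fetchall()


# Hypothetical stand-in for the real base table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE context_base (event_id TEXT, event_date TEXT)")
conn.executemany(
    "INSERT INTO context_base VALUES (?, ?)",
    [("a", "2022-01-15"), ("b", "2022-09-01"), ("c", "2023-01-10")],
)

# With a reference date of 2023-02-01 the cutoff is 2022-08-01,
# so only events "b" and "c" are kept.
rows = recent_rows(conn, "2023-02-01")
```

Because the cutoff moves with the reference date, the number of rows a downstream table reads depends on recent volume rather than on the total size of the base table, which is the effect the PR is after.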