Open pnorman opened 2 years ago
Raw CDN logs are now retained for 30 days, and successful requests are converted into Parquet logs.
Some raw render server logs are retained for 30 days, but I still see others older than 30 days on the servers.
I haven't looked at the reduced precision and historical data generation yet.
I'm now generating two reduced logs. One drops tile info and retains the other fields, so a bunch of tile requests from the same user appear as one row (including how many tiles were requested). The other drops user-specific information like IP and retains tile details, so that in future we would be able to generate tile usage details.
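As a rough sketch of the two reductions, assuming hypothetical field names (`ip`, `user_agent`, `z`, `x`, `y`) rather than the actual log schema, and ignoring any time bucketing a real version would probably need:

```python
from collections import Counter

# Hypothetical raw request rows; the real log schema is not shown in this issue.
raw_requests = [
    {"ip": "192.0.2.1", "user_agent": "app/1.0", "z": 12, "x": 2048, "y": 1361},
    {"ip": "192.0.2.1", "user_agent": "app/1.0", "z": 12, "x": 2049, "y": 1361},
    {"ip": "198.51.100.7", "user_agent": "browser/2.0", "z": 12, "x": 2048, "y": 1361},
]


def reduce_by_user(rows):
    """Drop tile coordinates: one row per user with the number of tiles requested."""
    counts = Counter((r["ip"], r["user_agent"]) for r in rows)
    return [
        {"ip": ip, "user_agent": ua, "tiles": n}
        for (ip, ua), n in sorted(counts.items())
    ]


def reduce_by_tile(rows):
    """Drop user-specific fields: one row per tile with the number of requests."""
    counts = Counter((r["z"], r["x"], r["y"]) for r in rows)
    return [
        {"z": z, "x": x, "y": y, "requests": n}
        for (z, x, y), n in sorted(counts.items())
    ]


by_user = reduce_by_user(raw_requests)
by_tile = reduce_by_tile(raw_requests)
```

Here `by_user` no longer says which tiles were fetched, and `by_tile` no longer says who fetched them, so neither table alone links a user to a location.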
I need to back-populate these tables, then retention periods can be adjusted.
219 is about log retention in general, but I want to split off the tile service because it is such a high-volume service, with 4 TB/month of logs. At that volume there are technical reasons to aggregate data, aside from privacy.
Detailed usage patterns