Open pnorman opened 6 years ago
Edit: Changed (1) to 180 days, which agrees with existing piwik setup.
Current web access log retention is as follows:
* Tile Cached: 1/1/2016 onward. All Logs kept. (Stored in Archived)
I can see why we would want to keep this "forever" but anything older than a couple of months could have the IP addresses truncated or otherwise anonymised without impacting the use for stats.
* Planet: 2009 onward. All Logs kept. (Stored in Archived)
Same as above.
* www: 2010 onward. All logs kept. (Stored in Archived)
Same as above.
* wiki: Approximately last 2 weeks rolling. (local logrotate)
Unproblematic.
* nominatim: Approximately last 2 months rolling. (local logrotate)
Unproblematic.
* dev: ~8 years, but varies on popularity of site (local logrotate)
As we have stuff on dev that amounts to public services, I would suggest reducing this to two months or so.
* lists, svn, git: ~2 months (local logrotate)
Unproblematic.
I'm not sure I believe that dev number, to be honest - we don't have any logrotate that would do anything like that as far as I know.
If @Firefishy meant the rails logs in the logs directory of each checkout then those aren't being rotated at all as far as I know.
There's no real reason to keep the archived logs so long; we've just never set up anything to clean them out. Anonymising them would be a huge amount of work for little gain.
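For reference, the per-line truncation suggested earlier for older logs could look roughly like the sketch below. It is only an illustration: it assumes Python, assumes the client address is the first whitespace-separated field of each line, and the function names and prefix lengths (/24 for IPv4, /48 for IPv6) are hypothetical choices rather than anything agreed here. The real cost would be in re-processing years of archived files, which this does not address.

```python
import ipaddress


def truncate_ip(addr: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero the host bits of an address so it no longer identifies one client.

    The prefix lengths are assumptions for illustration, not a policy decision.
    """
    ip = ipaddress.ip_address(addr)  # raises ValueError on invalid input
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def anonymise_line(line: str) -> str:
    """Truncate the leading client-IP field of one access-log line.

    Assumes the address is the first space-separated field; lines that
    do not start with a valid address are returned unchanged.
    """
    head, sep, rest = line.partition(" ")
    try:
        return truncate_ip(head) + sep + rest
    except ValueError:
        return line
```

The rest of each line (user-agent, referer, request) is left untouched, so request-level stats would still work on the truncated logs.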
This is one of my action items from LWG privacy policy matters.
We need a defined duration for how long we retain personal information in logs. The precise duration doesn't matter, but we need to state how long in the privacy policy, and have some justification.
The main personal information normally found in logs is IP addresses, user-agents, referers, and what was requested.
I propose splitting logs into four groups
As a starting point for consideration, how about these times?
If we have a reason to retain a specific log for longer, such as an ongoing investigation, court order, etc., we could do so. The goal of a log retention policy is to establish defaults for when there isn't some special case.
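To make the "default plus exceptions" idea concrete, here is a minimal sketch in Python. The names (`should_delete`, `on_hold`, `DEFAULT_RETENTION_DAYS`) are hypothetical, and 180 days is only a placeholder taken from the edit note above, not a proposal for every log group.

```python
from datetime import date, timedelta

# Placeholder default; the actual figure per log group is what this issue is about.
DEFAULT_RETENTION_DAYS = 180


def should_delete(
    log_date: date,
    today: date,
    retention_days: int = DEFAULT_RETENTION_DAYS,
    on_hold: bool = False,
) -> bool:
    """Return True if a log from log_date is past the default retention period.

    on_hold models the special cases (ongoing investigation, court order):
    a held log is never deleted, whatever its age.
    """
    if on_hold:
        return False
    return today - log_date > timedelta(days=retention_days)
```

A cleanup job would call this per log file and skip anything flagged as held, so the policy text only needs to state the default duration and the kinds of exception.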
cc @simonpoole