opawg / user-agents

An open, platform-agnostic list of user-agent and referrer regexes for use in podcast analytics services
MIT License

add ids for tracking changes #90

Closed: AlexandrePlus closed this 3 years ago

AlexandrePlus commented 3 years ago

Adding an id to each user agent makes it easy to track changes (updates, deletions) that would otherwise be complicated. I have imported the user agents into a database (passing the JSON to a stored procedure). To track changes, particularly updates, I first thought a good candidate for an id (key) would be the first regex of each user agent. The app field is not always set (let alone the app/os/device 3-tuple), and using the first regex as the key allows adding or even changing regexes without losing track of the user agent.

But it's an Achilles' heel. By writing a test that checks the matching between regexes and examples is unique (https://github.com/AlexandrePlus/user-agents/blob/feature-test-unique-correct-matching/src/tests/python/test_compile_regex.py#L37), I found several issues, such as regexes (including some user agents' first regex) that need to be disambiguated or merged ("^gvfs" and "^gvfs/"...). Once a first regex changes, the key changes with it, so it becomes difficult to track changes and to update the modified user agent (ON DUPLICATE KEY UPDATE) rather than creating a duplicate.

That is why, while trying to keep it simple, I propose adding a unique id to every user agent.
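A minimal sketch of the kind of uniqueness check described above, in Python since the linked test is Python. It is not the linked test: the entry shape (a `user_agents` list of regexes plus an `examples` list) is assumed to resemble `user-agents.json`, the sample entries are made up around the "^gvfs" / "^gvfs/" pair, and regexes are assumed to be applied with `re.search`. The helper `ambiguous_examples` is hypothetical.

```python
import re
from typing import Any

# Made-up sample entries; the field names are an assumption about the
# shape of user-agents.json ("user_agents" = list of regexes).
USER_AGENTS: list[dict[str, Any]] = [
    {"user_agents": ["^gvfs"], "examples": ["gvfs/1.36.1"]},
    {"user_agents": ["^gvfs/"], "examples": ["gvfs/1.40.2"]},
    {"user_agents": ["^AntennaPod/"], "examples": ["AntennaPod/1.7.1"]},
]


def ambiguous_examples(entries):
    """Return (example, matching entry indexes) for every example that is
    not matched by exactly one entry, namely the entry that lists it."""
    compiled = [[re.compile(p) for p in e["user_agents"]] for e in entries]
    problems = []
    for owner, entry in enumerate(entries):
        for example in entry.get("examples", []):
            # Indexes of every entry with at least one regex matching the example.
            matches = [
                i for i, regexes in enumerate(compiled)
                if any(r.search(example) for r in regexes)
            ]
            if matches != [owner]:
                problems.append((example, matches))
    return problems


for example, matches in ambiguous_examples(USER_AGENTS):
    print(f"{example!r} matched entries {matches}")
```

With this sample data, both gvfs examples are reported as matched by entries 0 and 1, which is the kind of overlap that forces one of the two first regexes to change.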
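The key problem can be reproduced in a few lines. This sketch uses Python's built-in `sqlite3` rather than the author's database and stored procedure: SQLite's `INSERT ... ON CONFLICT(...) DO UPDATE` (SQLite 3.24+) stands in for MySQL's `ON DUPLICATE KEY UPDATE`. The two tables, the helper functions and the id `42` are hypothetical. Keying on the first regex leaves a stale row once "^gvfs" becomes "^gvfs/", while keying on a stable id updates the row in place.

```python
import json
import sqlite3

# Hypothetical tables: one keyed on the first regex, one on a stable id.
SCHEMA = """
CREATE TABLE by_regex (first_regex TEXT PRIMARY KEY, regexes TEXT);
CREATE TABLE by_id (id INTEGER PRIMARY KEY, regexes TEXT);
"""


def upsert_by_regex(db, regexes):
    # The key is derived from the data itself, so it moves when regexes[0] changes.
    db.execute(
        "INSERT INTO by_regex (first_regex, regexes) VALUES (?, ?) "
        "ON CONFLICT(first_regex) DO UPDATE SET regexes = excluded.regexes",
        (regexes[0], json.dumps(regexes)),
    )


def upsert_by_id(db, ua_id, regexes):
    # The key is an explicit id that stays the same across list versions.
    db.execute(
        "INSERT INTO by_id (id, regexes) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET regexes = excluded.regexes",
        (ua_id, json.dumps(regexes)),
    )


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)

# Version 1 of the list, then version 2 where the first regex was tightened.
for version in (["^gvfs"], ["^gvfs/"]):
    upsert_by_regex(db, version)
    upsert_by_id(db, 42, version)

print(db.execute("SELECT COUNT(*) FROM by_regex").fetchone()[0])  # 2: stale duplicate
print(db.execute("SELECT COUNT(*) FROM by_id").fetchone()[0])     # 1: updated in place
```

Deletions benefit the same way: with a stable id, a user agent missing from a new version of the list can be identified by id even if its regexes were edited before it was removed.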