Open kingo55 opened 8 years ago
Agree, we need an initiative to write a contributors guide for the data file...
More than just naming conventions - also rules like:
Any given referrer URI should only be found in the database once. If the same URI is used for two different mediums, like search and paid, then we should give the traffic the benefit of the doubt and make it search (i.e. don't assume paid).
(https://github.com/snowplow/referer-parser/issues/130#issuecomment-234183125)
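A rule like this can be checked mechanically before a pull request is merged. The sketch below is only an illustration, not part of the project's tooling: it assumes the database has already been loaded into a nested mapping of medium → source name → entry with a `domains` list, and the sample entries (including the `paid` medium and the "Example Ads" source) are made up for the example.

```python
from collections import defaultdict


def find_duplicate_domains(db):
    """Return {domain: [(medium, source), ...]} for every domain listed more than once.

    `db` is assumed to be a nested mapping: medium -> source name -> entry,
    where each entry may carry a "domains" list.
    """
    seen = defaultdict(list)
    for medium, sources in db.items():
        for source, entry in sources.items():
            for domain in entry.get("domains", []):
                # Compare case-insensitively so "Google.com" and "google.com" collide.
                seen[domain.lower()].append((medium, source))
    return {domain: places for domain, places in seen.items() if len(places) > 1}


# Hypothetical sample data, for illustration only.
sample = {
    "search": {
        "Google": {"parameters": ["q"], "domains": ["google.com", "www.google.com"]},
    },
    "paid": {
        "Example Ads": {"domains": ["google.com"]},
    },
    "email": {
        "Naver": {"domains": ["mail.naver.com"]},
    },
}

duplicates = find_duplicate_domains(sample)
for domain, places in sorted(duplicates.items()):
    print(domain, "->", places)
```

With the sample above, `google.com` is reported under both `search` and `paid`; following the rule quoted earlier, the `paid` entry would be the one to drop.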
Also useful:
@alexanderdean - happy to work on something like this too. Can a pull request be made for GH wikis?
We also assembled more ESP domain names and would like to see them mentioned in the referer.yml file. Is there a process by now?
The process is not yet finished, but we are working on it. Please open a PR and we will get a new version of the referer.yml database published.
The unfinished work is around making the Java client read an external file, and updating the Snowplow enrichment to support that external file.
Well, I don't have a proper referer.yml file but a list of 3600+ ESP domain names. Would that be of any interest to you?
Wow - that does sound interesting!
For new additions to the referrer YAML, it would be helpful if there were some guidelines on how to name / group sites.
Based on what I see in the YAML now, I don't even agree with past contributions I've made. E.g.:
`Naver Mail` should just be `Naver` ... just like how `Google` is represented.
`email` makes sense because we can see traffic arriving through Cheetah Mail or Responsys servers but we wouldn't call them an email provider.
Thoughts?