ibericode / koko-analytics

Privacy-friendly, open-source and lightweight analytics for your WordPress site.
https://www.kokoanalytics.com/
GNU General Public License v3.0

Aggregate certain referrer hosts #43

Closed dannyvankooten closed 11 months ago

dannyvankooten commented 4 years ago

This is not an exhaustive list, so suggestions are welcome.

dannyvankooten commented 4 years ago

So after some more thinking, there are two possible options for each referrer URL coming in.

1. Multiple URLs with the same meaning

In case of google.com/search, bing.com/search and any URL that has multiple versions pointing to what is essentially the same thing, we want to store them in the simplest form possible.

So google.com/search becomes google.com.

In my opinion, google.de/search becomes google.de instead of google.com as the TLD does hold some information that may be valuable.
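To make option 1 concrete, here is a rough Python sketch (not the plugin's actual code) of collapsing search URLs to their simplest form while keeping the TLD. The rule list, the patterns, and the name `simplify_referrer` are assumptions made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical rule list: (pattern matched against "host/path", replacement).
# These patterns are illustrative assumptions, not the plugin's real rules.
AGGREGATION_RULES = [
    # Keep the TLD: google.de/search -> google.de, google.com/search -> google.com
    (re.compile(r"^(?:www\.)?google\.([a-z.]{2,6})/search.*$"), r"google.\1"),
    (re.compile(r"^(?:www\.)?bing\.com/search.*$"), "bing.com"),
]


def simplify_referrer(url):
    """Return the simplest form of a known referrer, or the URL unchanged."""
    parts = urlsplit(url)
    target = (parts.hostname or "") + parts.path
    for pattern, replacement in AGGREGATION_RULES:
        if pattern.match(target):
            return pattern.sub(replacement, target)
    return url
```

Extending the list is then a matter of appending another (pattern, replacement) pair, which is roughly what a filter hook would let users do.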

2. All other URLs with a path component

In most other cases, we want to keep the full URL but also offer an aggregated total in the dashboard so that we can see the total amount of traffic coming from that domain while still being able to zoom in and see which pages actually generated that traffic.
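As a sketch of what the dashboard side of option 2 could look like, the Python below groups per-URL counts under their host, so there is a total per domain plus the individual pages to zoom in on. It only illustrates the grouping idea and says nothing about storage; `aggregate_by_host` and the shape of the input are assumptions.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def aggregate_by_host(counts):
    """Group {referrer_url: pageviews} under each host, keeping per-URL detail."""
    grouped = defaultdict(lambda: {"total": 0, "urls": {}})
    for url, pageviews in counts.items():
        # Fall back to the raw value if no host can be parsed (assumption).
        host = urlsplit(url).hostname or url
        grouped[host]["total"] += pageviews
        grouped[host]["urls"][url] = pageviews
    return dict(grouped)


if __name__ == "__main__":
    stats = aggregate_by_host({
        "https://example.com/a": 3,
        "https://example.com/b": 2,
        "https://other.org/x": 1,
    })
    for host, entry in stats.items():
        print(host, entry["total"], sorted(entry["urls"]))
```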

Option 1 is taken care of in cd4a743c2b65f6eb2da227c3052273633684f8b7, which also allows us to easily extend that list with other mainstream options. A filter hook may be useful for users.

I haven't yet gone over the details for solving option 2 but am reasonably confident it can be done without massively inflating storage requirements.

rghedin commented 4 years ago

Two questions regarding this issue:

First, Twitter's t.co link shortener really adds up quickly as individual entries. I don't know how challenging this would be, but a great way to avoid this would be to aggregate them and offer a "plus button" beside the main t.co entry that, when clicked, expands a list with all the individual links.

The other one is related to Feedly, the RSS aggregator. I noticed that clicks from collections at Feedly are shown in the stats as a complete, private URL (not accessible by anyone). Real example: https://feedly.com/i/collection/content/user/323ef75e-3ae1-4b9a-9f90-05eefc034813/category/global.all Since this kind of direct link is useless, maybe aggregate all of them under a single "Feedly Collection" label?

An RSS entry in Feedly that isn't in a collection is perfectly clickable, hence it would benefit from a solution similar to the one suggested above for Twitter's t.co links. Example: https://feedly.com/i/entry/00nexjUMDjWDfmBvVM9H1PdsUJLyPJmkdIH23dKer+c=_16ff7d4bfd1:5f6e6d4:bb2cd839
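A minimal Python sketch of the Feedly suggestion, assuming that every private collection link starts with the path `/i/collection/` as in the example above. The function name `label_feedly_referrer` is made up; entry links are left untouched so they could still be grouped under a single expandable feedly.com row.

```python
from urllib.parse import urlsplit

FEEDLY_COLLECTION_LABEL = "Feedly Collection"


def label_feedly_referrer(url):
    """Collapse private Feedly collection URLs into one label.

    Assumption: collection links always use the /i/collection/ path prefix.
    Anything else (including /i/entry/ links) is returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname == "feedly.com" and parts.path.startswith("/i/collection/"):
        return FEEDLY_COLLECTION_LABEL
    return url
```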

danielrunvegan commented 4 years ago

Hey Danny, I have combed through the referrer stats from the time since the last plugin update, and want to share the remaining duplicates that I have found. I hope this helps!

Facebook:

Instagram:

Google:

In addition to this, I see the following referrers that seem to indicate the google app as the source. In my opinion, it would make sense to count those as "normal" google.com searches:

Ecosia:

Bing:

dannyvankooten commented 4 years ago

Awesome @danielrunvegan, that is super helpful indeed! Thank you so much.

danielrunvegan commented 4 years ago

@dannyvankooten the last update cleaned up almost all of the duplicates for me! Here are the remaining candidates I see for the last 7 days (with my suggestion as to where they should be aggregated to):

And the following could all be aggregated to --> https://www.google.com

danielrunvegan commented 3 years ago

@dannyvankooten I've just noticed that the referrer aggregation for Pinterest seems to be broken. I get separate results for the following variations (all are shown as pinterest.com):

pinterest.com www.pinterest.com https://pinterest.com https://www.pinterest.com
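For what it's worth, the four variations above differ only in scheme and the `www.` prefix, so one way to treat them as the same referrer is to reduce each to a bare host before counting. This is a Python sketch of that idea, not the plugin's code; `canonical_host` is a made-up name.

```python
from urllib.parse import urlsplit


def canonical_host(referrer):
    """Reduce a referrer to a lowercase host without scheme or leading www."""
    value = referrer.strip().lower()
    if "://" not in value:
        # urlsplit only recognises a host when the value starts with "//".
        value = "//" + value
    host = urlsplit(value).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host


if __name__ == "__main__":
    variations = [
        "pinterest.com",
        "www.pinterest.com",
        "https://pinterest.com",
        "https://www.pinterest.com",
    ]
    print({canonical_host(v) for v in variations})
```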

arnelap commented 3 years ago

Email newsletter services that use unique links, e.g. [image]

rghedin commented 3 months ago

Is this still open to additional aggregations? I noticed that Reddit uses at least three domains for outbound links:

out.reddit.com reddit.com new.reddit.com

Probably old.reddit.com as well.
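In case it helps, a small alias table would cover this kind of case. The Python below is only a sketch: the table `HOST_ALIASES` and the function `resolve_host_alias` are hypothetical, and old.reddit.com is included on the guess above rather than from observed data.

```python
from urllib.parse import urlsplit

# Hypothetical alias table mapping alternate hosts to one canonical host.
HOST_ALIASES = {
    "out.reddit.com": "reddit.com",
    "new.reddit.com": "reddit.com",
    "old.reddit.com": "reddit.com",  # guessed, not confirmed in the stats
}


def resolve_host_alias(url):
    """Return the canonical host for a referrer URL, using the alias table."""
    host = urlsplit(url).hostname or ""
    return HOST_ALIASES.get(host, host)
```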