matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0

Feature Request: Ambiguous tracking script endpoint & default to POST tracking method #19183

Open KZeni opened 2 years ago

KZeni commented 2 years ago

I would love to see an ambiguous, customizable, and/or randomized/auto-generated (ultimately with the goal of having it be non-identifiable) tracking script endpoint.

Also, see my comment below (https://github.com/matomo-org/matomo/issues/19183#issuecomment-1121325140) about an additional item: using POST by default, since GET exposes the tracking parameters in the URL, which makes those requests likely to be blocked as well.

Summary

As tracking/ad blocking becomes more prevalent, the fact that the tracker always loads something named "matomo" (or "piwik" for older setups) makes it an easier target. This holds even when you self-host, since blockers can just look for that filename regardless of the domain it's on and block the variants Matomo might be using.

You can already make the domain non-identifiable as tracking-related through self-hosting and other options, so that the tracking code is pulled from somewhere unlikely to be marked as used exclusively for tracking (possibly the same domain as the site you're viewing.)

However, an endpoint named matomo/piwik .php/.js is more likely to be blocked regardless of the domain. Taking some inspiration from https://blog.jonlu.ca/posts/matomo-bypass, the tracking code could simply point to a "js" folder on that domain (no longer safely identifiable as Matomo tracking), with the server side routing requests accordingly from there. At that point, what's being loaded and sent requests is something like "some one-off domain/js/" (instead of "some one-off domain/matomo.php" or similar), which likely hasn't been identified as tracking and isn't easy to identify and block safely (at least for the initial request, if not for the whole tracking setup.)
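A minimal sketch of the server-side routing described above, written as plain Python rather than as real web server config. The function name `resolve_alias` and the rule "a GET without parameters is the script fetch, anything else is a tracking hit" are assumptions for illustration; only `matomo.js` and `matomo.php` are Matomo's actual file names.

```python
from typing import Optional

# Public alias suggested above; nothing in it names Matomo.
PUBLIC_ALIAS = "/js/"


def resolve_alias(method: str, path: str, query: str = "") -> Optional[str]:
    """Return the internal Matomo path for a request to the alias, else None."""
    if path != PUBLIC_ALIAS:
        return None
    # Assumption: a bare GET is the browser fetching the tracker script.
    if method == "GET" and not query:
        return "/matomo.js"
    # Assumption: everything else (POST, or GET with parameters) is a tracking hit.
    target = "/matomo.php"
    return f"{target}?{query}" if query else target
```

In practice this mapping would live in the web server's rewrite or proxy rules; the point is only that the public URL never has to contain "matomo" or "piwik".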

That method makes it ambiguous/non-identifiable. I can also see it being randomized, or possibly using the site ID as the endpoint (again, with the server updated to send requests where they actually need to go), if just having it as "/js/" could be problematic or otherwise limiting. However, that might just add unnecessary complexity.
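To make the randomized and site-ID variants concrete, here is a rough sketch. Both helper names and the idea of keying the site-ID variant with a per-install secret are my own assumptions, not anything Matomo provides.

```python
import hashlib
import hmac
import secrets


def random_endpoint(nbytes: int = 6) -> str:
    """Generate a fresh random path segment, e.g. for a one-time install step."""
    return "/" + secrets.token_hex(nbytes) + "/"


def endpoint_for_site(site_id: int, secret: bytes, length: int = 12) -> str:
    """Derive a stable path from the site ID without exposing the ID itself.

    `secret` is a hypothetical per-install key; the same inputs always
    yield the same path, so the server can recompute it when routing.
    """
    digest = hmac.new(secret, str(site_id).encode("ascii"), hashlib.sha256).hexdigest()
    return "/" + digest[:length] + "/"
```

The random variant needs the generated value stored somewhere; the derived variant does not, which is one reason it might be the less complex of the two.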

Another idea is to make it customizable per measurable on the tracking code settings page: you set it to whatever you want, and that saves the updated server-side routing and updates the tracking code it provides to use the new endpoint. However, I'm wondering if this adds a manual step to get these benefits when it might be better for the default setup to provide them.
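As a sketch of what "updates the tracking code it provides" could mean, the helper below renders the usual `_paq` snippet with configurable paths. The function `tracking_snippet` and its parameters are hypothetical; `setTrackerUrl`, `setSiteId`, `trackPageView` and `enableLinkTracking` are the tracker calls from the standard snippet.

```python
import json


def tracking_snippet(base_url: str, site_id: int,
                     endpoint: str = "matomo.php",
                     script: str = "matomo.js") -> str:
    """Render a tracking snippet that uses custom endpoint and script paths."""
    base = base_url.rstrip("/") + "/"
    # json.dumps gives safely quoted JavaScript string literals.
    tracker_url = json.dumps(base + endpoint)
    script_url = json.dumps(base + script)
    return "\n".join([
        "var _paq = window._paq = window._paq || [];",
        "_paq.push(['trackPageView']);",
        "_paq.push(['enableLinkTracking']);",
        "(function() {",
        f"  _paq.push(['setTrackerUrl', {tracker_url}]);",
        f"  _paq.push(['setSiteId', {json.dumps(str(site_id))}]);",
        "  var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];",
        f"  g.async=true; g.src={script_url}; s.parentNode.insertBefore(g,s);",
        "})();",
    ])
```

Calling `tracking_snippet("https://example.org", 1, endpoint="js/", script="js/")` yields a snippet in which neither URL mentions Matomo; the server still has to route that path to the real files.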

I'm kinda leaning towards the first option, where it's simply named something ambiguous that any site might be using ("/js/" isn't the worst idea), and the domain/folder the Matomo install is in just has that routed as needed to act as tracking. You can already customize the tracker domain via something like https://plugins.matomo.org/TrackingCodeCustomizer (among other options out there, possibly), so the Matomo install can sit in a centralized/identifiable spot while the tracker is called from elsewhere.

Addressing the file endpoint seems like the next step here so people get the most accurate and reliable data. The site still needs to honor cookie and privacy policies; this just addresses browsers and extensions that auto-block tracking, possibly without the user even knowing.

This could even become a noteworthy differentiator for Matomo compared to Google Tag Manager and Google Analytics. Both are entirely centralized, so they will always be the most easily targeted items, which is causing their data to become more and more incomplete.

Your Environment

Running the latest version of Matomo. This is applicable to pretty much everyone (unless they've added their own custom code to address this), since it concerns the default behavior and there is currently no plugin that addresses it either.

Itemized Requests

ghost commented 2 years ago

I do something like this manually by defining my own alias (redirected by Apache URL rewriting) for the Matomo script.

Something else that may be related is that POST rather than GET should be the default method of sending tracking data, because some blockers (notably, uBlock Origin in its default configuration) block field names passed in the GET URL. Changing the directory name doesn't help when it's the field names triggering the block. I think that POST is preferable for user privacy reasons anyway - it keeps tracking data out of the HTTP server's log, so there is one less repository of data to protect as private. However, although the Matomo documentation includes instructions on how to choose POST, POST is not the default and the documentation doesn't emphasize the likelihood of GET requests being blocked.
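A small sketch of why the method matters, assuming a blocker that only inspects the request URL. `build_request` and `url_matches_filter` are made-up helpers, and the blocked names are placeholders (I am not claiming these are the exact names any blocker matches); `idsite`, `rec` and `action_name` are real Matomo tracking parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def build_request(endpoint, params, method="GET"):
    """Return (method, url, body) for a tracking hit."""
    encoded = urlencode(params)
    if method == "POST":
        # Parameters travel in the body; the URL stays bare.
        return ("POST", endpoint, encoded)
    return ("GET", f"{endpoint}?{encoded}", "")


def url_matches_filter(url, blocked_names=("idsite", "action_name")):
    """Crude stand-in for a URL-based rule keyed on query parameter names."""
    names = {name for name, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True)}
    return any(name in names for name in blocked_names)


params = {"idsite": "1", "rec": "1", "action_name": "Home"}
_, get_url, _ = build_request("/js/", params, "GET")
_, post_url, post_body = build_request("/js/", params, "POST")
print(url_matches_filter(get_url))   # the GET URL carries the field names
print(url_matches_filter(post_url))  # the POST URL does not
```

On the tracker side, the switch is `_paq.push(['setRequestMethod', 'POST']);`, which is the option the documentation describes.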

KZeni commented 2 years ago

That's a good point! I would definitely love to see POST used as the default instead of GET for that reason, as well as making the endpoint ambiguous. Both share the end goal of making bulk tracking blockers less likely to block things, while still following Do Not Track, opt-out, and/or other policies so that it meets legal requirements; it just wouldn't get caught by whatever bulk blocking tool(s) a visitor might have. I've edited my initial details to include that as an item.

Now, I do see that POST needs Matomo to either be on the same domain or have CORS configured (per https://matomo.org/faq/how-to/faq_18694/) for the requests to go through. However, can't Matomo automatically add the URLs provided via the measurable(s) to the CORS setting (and remove them once they're no longer present), so that POST can safely be used by default without CORS needing to be managed manually?
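Roughly what I have in mind, as a sketch only: derive the allowed origins from the measurables' URLs and regenerate the CORS list whenever they change. The helper names are hypothetical, and the `cors_domains[]` line format under `[General]` is my assumption about how the setting from the FAQ is written in the config file.

```python
from urllib.parse import urlsplit


def cors_origins(site_urls):
    """Reduce measurable URLs to a sorted, de-duplicated list of origins."""
    origins = set()
    for raw in site_urls:
        parts = urlsplit(raw.strip())
        # Skip anything that is not an absolute http(s) URL.
        if parts.scheme in ("http", "https") and parts.netloc:
            origins.add(f"{parts.scheme}://{parts.netloc.lower()}")
    return sorted(origins)


def as_config_lines(origins):
    """Render the origins as config lines (format assumed, see above)."""
    return ["[General]"] + [f'cors_domains[] = "{origin}"' for origin in origins]
```

Because the list is rebuilt from the current measurables each time, URLs that were removed drop out of the CORS setting automatically.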