internetarchive / brozzler

brozzler - distributed browser-based web crawler
Apache License 2.0

Block AMP analytics JS script #157

Closed vbanos closed 5 years ago

vbanos commented 5 years ago

AMP analytics is part of Google Analytics, so we need to block it for the same reasons we block Google Analytics. The current URL is

https://cdn.ampproject.org/v0/amp-analytics-0.1.js

but we have also seen:

https://cdn.ampproject.org/rtv/011906111828200/v0/amp-analytics-0.1.js

That's why we suggest the following pattern to block both:

*cdn.ampproject.org/*/amp-analytics*.js
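
As a quick sanity check, the pattern can be tried against both URLs above. This is only a sketch: it uses Python's `fnmatchcase` as a stand-in for the wildcard matching, which is an assumption and may not behave exactly like the matcher the browser applies to blocked URLs. The `is_blocked` helper and the constant name are hypothetical, not part of brozzler.

```python
from fnmatch import fnmatchcase

# Pattern proposed in this issue.
AMP_ANALYTICS_PATTERN = "*cdn.ampproject.org/*/amp-analytics*.js"


def is_blocked(url, patterns=(AMP_ANALYTICS_PATTERN,)):
    """Hypothetical helper: True if url matches any glob-style pattern.

    fnmatchcase treats '*' as "any run of characters, including '/'",
    which is assumed here to approximate the real blocking behaviour.
    """
    return any(fnmatchcase(url, pattern) for pattern in patterns)


if __name__ == "__main__":
    urls = [
        # The two AMP analytics URLs reported above.
        "https://cdn.ampproject.org/v0/amp-analytics-0.1.js",
        "https://cdn.ampproject.org/rtv/011906111828200/v0/amp-analytics-0.1.js",
        # The AMP runtime itself, which should not be caught by the pattern.
        "https://cdn.ampproject.org/v0.js",
    ]
    for url in urls:
        print(is_blocked(url), url)
```

Under that assumption, the two analytics URLs match and the AMP runtime URL does not, so the pattern should not block AMP pages from loading their core script.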

AMP analytics reference:

https://developers.google.com/analytics/devguides/collection/amp-analytics/

vbanos commented 5 years ago

In my experiments, blocking AMP analytics had the positive side effect of also blocking some additional tracking requests:

https://www.googletagmanager.com/gtag/amp?__amp_source_origin=https%3A%2F%2Famp.dev
https://www.google-analytics.com/r/collect?v=1&_v=a1&ds=AMP&aip&_s=1&dt=Product&sr=1600x1000&_utmht=1561584476085&cid=amp-Uv297wPxRBZFmJyHUF8jxQ&tid=UA-73836974-1&dl=https%3A%2F%2Fpreview.amp.dev%2Fdocumentation%2Fexamples%2Fe-commerce%2Fproduct_page&dr=https%3A%2F%2Famp.dev%2Fdocumentation%2Fexamples%2Fe-commerce%2Fproduct_page%2Fpreview%2F%3Fformat%3Dwebsites&sd=24&ul=en-us&de=UTF-8&t=pageview&jid=0.9160045519482727&_r=1&a=36&z=0.66544941106968
https://stats.g.doubleclick.net/r/collect?v=1&aip=1&t=dc&_r=3&tid=UA-67833617-1&cid=amp-Uv297wPxRBZFmJyHUF8jxQ&jid=0.9368874589689906&_v=a1&z=0.2162738678543541

You can test with this example page: https://amp.dev/documentation/examples/e-commerce/product_page/preview/?format=websites