ihucos / counter.dev

Web Analytics made simple
https://counter.dev
GNU Affero General Public License v3.0
914 stars 39 forks source link

Fixes page issue #27

Closed kaleidawave closed 3 years ago

kaleidawave commented 3 years ago

Trying counter.dev out and so far it is awesome!

I noticed thought that my requests to nested routes where showing up as "/". Looking at the source it seems that the page / loc is based of the Referer header value. However the referer header documentation says it is not certain that the browser will include the path segment. I double checked the request using dev tools and it seems that was the issue in my case where on a nested route it was only sending the origin.

I have fixed this by explicitly sending location.pathname so it is certain that pathname will be sent. This value does not include fragment identifier or querystring. I have updated the snippet to include this as well as modifying the handletrack.go to read the page parameter (and falling back to Referer for older snippets that do not send this value).

Two other changes to the snippet:

I haven't tested these locally yet but it should be okay. Let me know if there are any changes you want to this πŸ‘

ihucos commented 3 years ago

Hello,

thank you so much for the well thought pull request

Using startsWith

Yes, really great.

Using navigator.sendBeacon

Do you know if navigator.sendBeacon does respect the Expires http header. I would suspect not but did find no information about it. Unfortunately this is one of three used "line of defenses" to reliably count unique users. The tracking code that runs in the user's browser is responsible for only requesting the /track endpoint one time per session and day. There are three complement ways that try to ensure that:

  1. Setting a flag into sessionStorage that the fetch was already made.
  2. Checking via the document.referrer that request really does not come from the same site
  3. Http caching so the request does not arrive to the server if done on the same day.

Even with that what the user's browser does in the end is up to the browser and I did see two times or so a browser requesting the /track endpoint for each page the user visited - skewing up the statistics for the day at least.

So we may need http caching as a way for the browser to really only call /track on the first visit the user makes

Using location.pathname as GET parameter

I just realized now that sending the current page via a http GET parameter would bust the cache for each individual page. So basically we have to make sure the parameters to /track stay about the same and are not too variable.

So these are the cons, of course it can be that the pros outbalance it, I have to confess that I am not sure about this but think it may be more conservative and in this case better to keep it with fetch and the brittle Referrer. Feel free to try to convince me otherwise. I needed some days to reflect about it but am still unsure.

aroemers commented 3 years ago

To chime in here, I noticed the same issue today while integrating my website with Counter. It happens on Safari (macOs) at least, so not really an obscure browser.

I understand the concerns, yet I hope a solution is found. Knowing which pages are openend is - for me at least - important.

kaleidawave commented 3 years ago

Hi, thanks for reviewing the PR.

Looking at navigator.sendBeacon it is a POST request under the hood and it would bypass the cache. And after some testing I found adding a path argument would also break the cache mechanism.


However in my original Referer header research I thought it was fixed behaviour by the browser to only send the origin, though looking back on it I seemed to miss the Referrer-Policy header. It turns out my issue was caused by the Referrer-Policy header on my pages response being set to strict-origin-when-cross-origin which causes subsequent requests outside of origin to only send the origin part of the url.

Changing the value of the Referrer-Policy header to no-referrer-when-downgrade has fixed this issue πŸ‘ . And in my case on GH pages where you cannot change the response headers you can set this value using the following HTML meta tag: <meta name="referrer" content="no-referrer-when-downgrade">. (I have changed this and it is now working!). (Hopefully this fixes your issue @aroemers)

So yeah may be worth mentioning somewhere that deciding pages is based on the Referer header and that you should set Referrer-Policy to send the full path to cross origin requests.

aroemers commented 3 years ago

Thanks for the possible solution @kaleidawave. I tried setting it using the meta tag, the script tag and the fetch API. Turns out, for Safari none of this will work as long as 'Prevent cross-site tracking' setting is enabled; it will never send the path.

Any other way forward?

kaleidawave commented 3 years ago

@aroemers Unfortunately I think that setting is specifically designed to prevent what you are trying to do πŸ˜‚. I had thought of patching the fetch request Referer header but it turns out it is on the forbidden list. The original changes may have fixed it but it breaks the caching solution they are using to add rate limiting. So unfortunately I don't think there is anyway to collect full analytics for those requests using counter.dev, because it is designed for the client. If you have access to the server you are hosting on thought you could collect similar using server logs. I think there are pretty popular analytic middleware for most servers.