matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0

[Bug]: Segment fails to match visits where URL CONTAINS "www.example.com" #21439

Open 9joshua opened 1 year ago

9joshua commented 1 year ago

What happened?

When creating a segment condition like "Page URL contains www.example.com", Matomo fails to include visits to https://www.example.com.

For the purpose of evaluating URLs, Matomo appears to omit the www subdomain.

If the segment condition is instead formulated as "Page URL contains https://www.example.com", visits to https://www.example.com are captured in the segment, along with visits to https://example.com.

What should happen?

When the www subdomain is included at the beginning of a segment Contains condition, the segment should evaluate the condition without omitting www from URLs. This already works for other subdomains, but not for www.

How can this be reproduced?

  1. Create a segment: Page URL contains www.example.com
  2. Navigate to https://www.example.com
  3. Note the visit does not appear in the segment

Matomo Version

Matomo 4

Matomo Patch or Minor Version

4.15

PHP Version

8.1

Server Operating System

NA

What browsers are you seeing the problem on?

Chrome

Computer Operating System

NA

Relevant log output

No response


bx80 commented 1 year ago

@9joshua URLs with a www prefix are treated as a special subdomain since in most circumstances traffic to www.example.org and example.org should be combined. This is causing the problematic behaviour with segments that include a www prefix.

Do you know if there is a valid use case for distinguishing traffic between www.example.org and example.org, or is the main issue that segments created with www are not behaving as expected?
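To make the reported behaviour concrete, here is a minimal Python sketch of one way it could arise. It is not Matomo's actual code: the names `KNOWN_PREFIXES`, `strip_prefix` and `segment_contains` are hypothetical, and the sketch assumes that a leading scheme plus optional `www.` is stripped from both the recorded URL and the segment value before the substring test.

```python
# Hypothetical model of the behaviour described in this issue, NOT Matomo's code.
# Assumption: a leading scheme (with optional "www.") is stripped from URLs.
KNOWN_PREFIXES = ("https://www.", "http://www.", "https://", "http://")


def strip_prefix(url: str) -> str:
    """Drop a known leading prefix so www and non-www hosts share one form."""
    for prefix in KNOWN_PREFIXES:
        if url.startswith(prefix):
            return url[len(prefix):]
    return url  # a bare "www.example.com" has no scheme, so it is left untouched


def segment_contains(page_url: str, needle: str) -> bool:
    """Substring test of the stripped segment value against the stripped URL."""
    return strip_prefix(needle) in strip_prefix(page_url)


# A bare "www." value survives stripping and cannot match the stripped URL.
print(segment_contains("https://www.example.com/", "www.example.com"))
# A value with a full prefix is stripped as well, so it matches both hosts.
print(segment_contains("https://example.com/", "https://www.example.com"))
```

Under these assumptions the model reproduces what the report describes: the bare `www.` value misses, the fully prefixed value matches both hosts, and other subdomains such as `blog.example.com` are unaffected.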

9joshua commented 1 year ago

@bx80 This is more of an issue that the result is not as expected. There are no use cases I can think of where this specific condition would be required.

This was confusing for a customer who understandably expected Page URL contains www.example.com to include visits to https://www.example.com

Stan-vw commented 1 year ago

@9joshua what's the ideal outcome here? I'm thinking we could consider adjusting the product, but otherwise it might be better to create an FAQ that explains the logic?

@bx80 if I read your comment correctly, we have some logic that gets rid of the www? Could you help me understand why we're doing this?

Thanks :)

9joshua commented 1 year ago

I don’t think an FAQ is a solution because this is unexpected behaviour. It would be hard to find the specific FAQ that relates to the issue, if the user thinks to search for it in the first place.

There could be a special condition where search terms beginning with www. trigger an assessment of both www.example.com and example.com. This would be similar to how searching for http://www.example.com currently works.
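A rough Python sketch of this special condition, under the assumption that URLs are compared in a form with the scheme and `www.` already removed. The function name `contains_with_www_fallback` and the `stored_url` parameter are hypothetical and only illustrate the idea.

```python
def contains_with_www_fallback(stored_url: str, needle: str) -> bool:
    """Check the needle as typed and, if it starts with 'www.', also without it.

    Hypothetical helper: `stored_url` stands for a URL whose scheme and
    leading 'www.' were already stripped when the visit was recorded.
    """
    candidates = [needle]
    if needle.startswith("www."):
        # Second candidate: the same value without the www subdomain.
        candidates.append(needle[len("www."):])
    return any(candidate in stored_url for candidate in candidates)


print(contains_with_www_fallback("example.com/pricing", "www.example.com"))
```

Because this is still a plain substring test, the fallback value `example.com` would also match other hosts that contain it, such as `blog.example.com`.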

bx80 commented 1 year ago

@Stan-vw Yes, we have some logic that merges www.example.org URLs with example.org URLs. This is because for the vast majority of websites the www subdomain and the base domain are synonymous. Visitors entering the site directly, and often referring sites too, use www.example.org and example.org interchangeably. Without this merging, all analytics data would be arbitrarily split between the two domains, which would serve no useful purpose.

Adding a special condition to check two different things could potentially have performance impacts when building the segment, though more investigation would be required to confirm this.

Another possible solution could be to detect segments created for www.example.org and simply remove the www before saving.
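The save-time alternative could look roughly like the Python sketch below. The `SegmentCondition` class, its field names and the `"pageUrl"` / `"contains"` values are assumptions made for illustration, not Matomo's real data model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SegmentCondition:
    """Hypothetical representation of a single segment condition."""
    dimension: str
    operator: str
    value: str


def normalize_before_save(cond: SegmentCondition) -> SegmentCondition:
    """Remove a leading 'www.' from Page URL 'contains' values before saving."""
    is_page_url_contains = cond.dimension == "pageUrl" and cond.operator == "contains"
    if is_page_url_contains and cond.value.lower().startswith("www."):
        return replace(cond, value=cond.value[len("www."):])
    return cond  # every other condition is stored unchanged


saved = normalize_before_save(SegmentCondition("pageUrl", "contains", "www.example.com"))
print(saved.value)
```

The rewrite happens once when the segment is saved, so no second comparison is needed when the segment is built. The trade-off is that the stored value no longer matches what the user typed.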

michalkleiner commented 1 year ago

How about a global setting per measurable that would treat www and non-www domains the same (could be the default) or separate? Then we can use this across all areas and users can configure the behaviour.
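As a sketch of what such a setting might control, the Python snippet below uses a hypothetical `merge_www` flag on a hypothetical `Measurable` class. Neither name comes from Matomo, and the default of `True` mirrors the current merging behaviour described above.

```python
from dataclasses import dataclass


@dataclass
class Measurable:
    """Hypothetical measurable with a per-site www handling option."""
    name: str
    merge_www: bool = True  # assumed default: treat www and non-www the same


def host_key(measurable: Measurable, host: str) -> str:
    """Return the host under which traffic is grouped for this measurable."""
    host = host.lower()
    if measurable.merge_www and host.startswith("www."):
        return host[len("www."):]
    return host


merged = Measurable("Main site")
separate = Measurable("Split site", merge_www=False)
print(host_key(merged, "www.example.com"), host_key(separate, "www.example.com"))
```

With a single flag like this, tracking, segments and Custom Reports could all consult the same normalisation rule instead of each handling www on its own.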

9joshua commented 4 months ago

Just FYI, this also affects Custom Reports. For example, filtering for "Page URL contains www.example.com" returns zero results even if the page URL was recorded with the www subdomain in standard reports.