matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0

User-ID: Unique Visitor is not recognized even with the same User-ID #15593

Closed: peterbo closed this issue 4 years ago.

peterbo commented 4 years ago

I'm setting a User-ID when a user visits a given site. On a certain action, I trigger a goal server-side with their User-ID as a parameter (and a token). The effect after the update from 3.12 to 3.13.2 is that the server-side action is not only stored in another visit (which would be OK), but Matomo also fails to recognize the web visitor with the same User-ID. For reference, the before/after screenshots:

Before (visitor recognized -> new visit, but returning visitor): [screenshot: userid-before1]

After (visitor not recognized -> visit is not returning -> new visit and new visitor): [screenshot: userid1]

Server-side call: https://example.com/piwik.php?token_auth=XXX&cdt=2019-08-07 18:56:10&idgoal=3&revenue=1234&idsite=X&rec=1&r=13454&uid=1234567890
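To make the request above concrete, here is a small JavaScript sketch that assembles the same server-side goal request. It only reuses the parameters from the call shown (token_auth, cdt, idgoal, revenue, idsite, rec, r, uid); the helper name buildGoalTrackingUrl and the numeric idsite are hypothetical placeholders. One detail worth noting: the cdt value contains a space and colons, so it should be URL-encoded, which URLSearchParams does automatically.

```javascript
// Hypothetical helper: append each tracking parameter to the base URL.
// URL.searchParams percent-encodes values, so the space and colons in
// `cdt` are escaped instead of being sent raw.
function buildGoalTrackingUrl(baseUrl, params) {
  const url = new URL(baseUrl);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const trackingUrl = buildGoalTrackingUrl("https://example.com/piwik.php", {
  token_auth: "XXX",          // placeholder, as in the original call
  cdt: "2019-08-07 18:56:10", // override of the request datetime
  idgoal: 3,
  revenue: 1234,
  idsite: 1,                  // hypothetical site ID (the original shows "X")
  rec: 1,
  r: 13454,
  uid: "1234567890",
});
console.log(trackingUrl);
```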

In config, trust_visitor_cookie is disabled.

The reason for that is this change: https://github.com/matomo-org/matomo/commit/ea5a14bdf8aa9608cdc2ab7d5c8236a5ff1eb3e2#diff-6700aaf1ce500fe51e284b9ec6f01b01

The change goes in the right direction, but now a User-ID is only assigned to the same visitor when the config_id also matches. This doesn't make sense, because the main use case is, for example, a user who logs into a website with different devices (GDPR aside; the User-ID is, for example, the customer ID). This user should be recognized as the same visitor (not necessarily the same visit, but at least the same visitor). @MichaelHeerklotz

Refs https://github.com/matomo-org/matomo/pull/14360

MichaelRoosz commented 4 years ago

This is intended. Since the referenced change, the UserID no longer influences the visitor ID or visit ID.

peterbo commented 4 years ago

Is this really what we want? In this case, the User-ID has no more added value than a named CustomDimension.

I think the primary use case is cross-device recognition of users. I know many instances where the User-ID is used, and it is always for this case. If you want to record a User-ID for a visit, decoupled from user recognition logic, one could simply use a custom dimension. A secondary, but equally important, use case is receiving conversions / other actions server-side from external sites / own internal systems to measure campaign success / other KPIs for a given unique visitor/user.

A forced visitor ID offers nowhere near the flexibility the User-ID did. And speaking of semantics: in my opinion, User-ID implies a single user who uses multiple devices or is tracked in different places. Therefore it must be the same visitor (in analytics terms).

Dimension definitions from my point of view:

The use case proposed in https://github.com/matomo-org/matomo/pull/13620, 'This is useful for example when using the third party cookie, and thus all Matomo sites use the same "global" visitorId for the same device, and some Matomo sites set a userid', is not practical anymore, because 3rd party cookies are not GDPR compliant and blocked by most new browsers anyway. AFAIK, the visitor_id is queried in connection with the site-id. In my opinion, the other use cases are far more important than this one. I don't quite follow the point of changing this feature toward edge cases (3rd party cookies, cross-site-ID tracking, or a user who has 10 different accounts to log into online gambling) or a case that is no longer relevant.

tsteur commented 4 years ago

ping @mattab

I'm not so much into this topic. I suppose it wouldn't help to do something like setVisitorId(hashedUserId.substr(0,16)) (pseudo code)?

I suppose in general the idea was maybe also that you can see what a user did before logging in and after as part of the same visit? But indeed, cross-device tracking is getting more complicated.

mattab commented 4 years ago

Thanks for the report @peterbo!

If you want to record a User-ID for a visit, decoupled from user recognition logic, one could simply use a custom dimension.

That's a good point. In general, generating separate visits on each device for the same user was intentional, but in retrospect I see what you mean: it has become more like a custom dimension and maybe not as valuable.

I'm not so much into this topic. I suppose it wouldn't help to do something like setVisitorId(hashedUserId.substr(0,16)) (pseudo code)?

Yes, this could help you, @peterbo, if you run this code in JavaScript and also generate the same Visitor ID on the server side. It might be the easier solution in your case. Another solution would be to get the Visitor ID from your visitors, store it in your DB for each visitor/user, and then set it again when tracking the conversion server-side.
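The second option (remember each user's Visitor ID and reuse it for the server-side conversion) could look roughly like the sketch below. The Map stands in for your database, and rememberVisitorId/conversionParams are hypothetical helper names. The assumptions are that the browser side reads the ID with the JS tracker's getVisitorId() and sends it to your backend, and that the Tracking API's cid parameter is used to force the visitor ID on the server-side request; check the Tracking API reference for whether cid requires token_auth in your version.

```javascript
// Hypothetical stand-in for "your DB": userId -> 16-char hex visitor ID.
const visitorIdByUser = new Map();

// Store the visitor ID reported by the web tracker for a logged-in user.
function rememberVisitorId(userId, visitorId) {
  if (!/^[0-9a-f]{16}$/i.test(visitorId)) {
    throw new Error(`not a 16-character hex visitor ID: ${visitorId}`);
  }
  visitorIdByUser.set(userId, visitorId.toLowerCase());
}

// Build Tracking API parameters for a server-side conversion, re-using the
// stored visitor ID (as `cid`) when one is known for this user.
function conversionParams(userId, idsite, idgoal, revenue) {
  const params = { idsite, idgoal, revenue, rec: 1, uid: userId };
  const visitorId = visitorIdByUser.get(userId);
  if (visitorId !== undefined) {
    params.cid = visitorId; // assumption: force the visitor ID via `cid`
  }
  return params;
}
```

If no ID was stored (for example, the user never visited the website), the request simply goes out without cid.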

I suppose in general the idea was maybe also that you can see what a user did before logging in and after as part of the same visit?

Yes, that was also the idea, and an advantage of changing the implementation...

What do you think?

Pending tasks/bugs

peterbo commented 4 years ago

Hey @mattab thanks for the feedback!!

Yes, this could help you, @peterbo, if you run this code in JavaScript and also generate the same Visitor ID on the server side. It might be the easier solution in your case. Another solution would be to get the Visitor ID from your visitors, store it in your DB for each visitor/user, and then set it again when tracking the conversion server-side.

Making it work again is not a problem. Unfortunately, it's not that easy, because we can't execute business logic on the external endpoints. But I can create a plugin that changes the recognition logic. That's not really the problem. Rather, a key feature changed and can no longer be used natively for these modern and emerging use cases (e.g. cross-device / cross-API).

I suppose in general the idea was maybe also that you can see what a user did before logging in and after as part of the same visit?

That's a valid point. However, this use case is adjacent to all the others and, from my understanding, can easily be achieved with simple Custom Dimensions. But even in this case, it doesn't make sense (at least to me) not to recognize the unique visitor again.

In the "after" screenshot, the visit is not marked as "Returning" when the same "user id" visits twice. Expected: the Visits Log shows a Returning icon and the API marks the visit as a returning visitor when it had a recent visit with the same User ID. Especially since, on your "After" screenshot, the 2nd visit was less than 30 min after the previous one, we would have expected it to show the returning visitor icon.

That's what I'd have expected as well. Then the feature would also work cross-device. Decoupling from the Visitor-ID is not a bad idea per se, but I feel that, at the moment, the feature is drifting between use cases and is not at all easy to understand for the average user. Perhaps it would be good to have a "default" behavior which can be configured toward a use case by advanced users?

mattab commented 4 years ago

@peterbo

Perhaps it would be good to have a "default" behavior which can be configured toward a use case by advanced users?

Feel free to create a separate issue with your thoughts for this :+1:

Still pending tasks/bugs as part of this issue

tsteur commented 4 years ago

@mattab what is the benefit of the current userId behaviour over a custom dimension? If there's no clear benefit, I would 100% vote to change the behaviour back to the original behaviour, with no flags controlling how things work.

peterbo commented 4 years ago

Expected: the Visits Log shows a Returning icon and the API marks the visit as a returning visitor when it had a recent visit with the same User ID

I doubt that Matomo Core would be able to handle that (a returning visitor flag for a different visitor ID). E.g. in the Visitor Log, a visit would be flagged as returning, but when you open the visitor profile you would only see one visit. The same would probably happen in visitor-based report archiving (flagged as a returning visitor but counted as two unique visitors), so returning-visitor reports would be distorted.

To really decouple the visitor ID from the user ID and add real value, some core modifications (archiving, Visitor Log, etc.) would probably be necessary. This would be part of a new ticket.

So at the moment, in my opinion, rolling back or creating a config setting for a default behaviour would be the best options. What do you guys think?

MichaelRoosz commented 4 years ago

Why revert it? If you want the old behaviour, just set the visitor id manually. And my first pull request actually had a setting to configure the userid behaviour per site, but you rejected it. Honestly, I really do not want to go back to applying patches for each Matomo update. I am thinking about forking and continuing the project under a new name. Currently userid is somewhat like a custom dimension, but one that automatically creates new visits. It's not something that can be replaced by simply using a CD.

MichaelRoosz commented 4 years ago

because 3rd party cookies are not GDPR compliant and blocked by most new browsers anyway. AFAIK, the visitor_id is queried in connection with the site-id.

Wrong. For example in my setup Matomo runs in its own subdomain matomo.domain.com and the matomo sites are other subdomains and paths on the same domain. In this setup the 3rd party feature works very well with all browsers and is very useful to connect the different matomo sites (over 50). So 3rd party cookies are in fact working very well (and I have also invested a lot of time fixing all the bugs in Matomo related to them).

MichaelRoosz commented 4 years ago

To really decouple the visitor ID from the user ID and add real value, some core modifications (archiving, Visitor Log, etc.) would probably be necessary. This would be part of a new ticket

Great, so go ahead and create that patch like I did instead of asking to have the work of others removed and break their setup because of your edge case.

peterbo commented 4 years ago

For example in my setup Matomo runs in its own subdomain matomo.domain.com and the matomo sites are other subdomains and paths on the same domain.

That's probably not a 3rd party cookie but a wildcard cookie that you set up with the scope *.example.org?

Currently userid is somewhat like a custom dimension, but one that automatically creates new visits. It's not something that can be replaced by simply using a CD

Generally, this would be easily possible by adding new_visit=1 once to a request that also includes a User-ID, e.g. with _paq.push(['appendToTrackingUrl', 'new_visit=1']), but I'd rather solve this for both use cases.
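A minimal sketch of that approach, assuming the standard asynchronous _paq command queue. The helper name trackLoginAsNewVisit is hypothetical; it takes the queue as a parameter (window._paq in a browser page) so it can be exercised outside a browser. The second appendToTrackingUrl call with an empty string resets the extra parameter, so only one request carries new_visit=1.

```javascript
// Queue commands so that only the page view carrying the User-ID starts a
// new visit; later requests are sent without new_visit=1.
function trackLoginAsNewVisit(paq, userId) {
  paq.push(["setUserId", userId]);
  paq.push(["appendToTrackingUrl", "new_visit=1"]); // force a new visit once
  paq.push(["trackPageView"]);
  paq.push(["appendToTrackingUrl", ""]); // reset for subsequent requests
}

// In a page: trackLoginAsNewVisit(window._paq, "1234567890");
```

As MichaelRoosz notes in his reply, this only helps where the official JS tracker is used; pixel or custom-API integrations would need the equivalent parameter added by hand.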

Great, so go ahead and create that patch like I did instead of asking to have the work of others removed and break their setup because of your edge case.

That's why we're here: to discuss options and added value, not to execute blindly. Hence, it would be great if you would contribute to the discussion of use cases and of how we could create a feature that works for different use cases without breaking 50% in a minor update.

MichaelRoosz commented 4 years ago

That's probably not a 3rd party cookie but a wildcard cookie that you set up with the scope *.example.org?

Technically yes, but it works using Matomo's "3rd Party Cookie" feature.

Generally, this would be easily possible by adding new_visit=1 once to a request that also includes a User-ID, e.g. with _paq.push(['appendToTrackingUrl', 'new_visit=1']), but I'd rather solve this for both use cases.

If one is using the official Matomo JS API, yes, but my 50+ Matomo sites are managed by many different teams, some using their own API, some using pixels, etc. It would be a big pain and take a lot of migration time to move to this new way to do it.

That's why we're here: to discuss options and added value, not to execute blindly. Hence, it would be great if you would contribute to the discussion of use cases and of how we could create a feature that works for different use cases without breaking 50% in a minor update.

Yes, that is why I am here.

Basically, having a per-site setting to switch between the two userid behaviors would be totally fine for me. Actually, my first pull request had such a setting and even defaulted to the old behavior. I just do not want to be forced to do it the old way.

peterbo commented 4 years ago

It would be a big pain and take a lot of migration time to move to this new way to do it.

Well, that's exactly the situation I (and probably others) find myself in now. I also maintain a lot of instances with around 10k sites. Only a few dozen of them use the User-ID feature, but all of those rely on recognition by User ID rather than visitor ID. So you can imagine the pain and work involved.

Great, so go ahead and create that patch like I did instead of asking to have the work of others removed and break their setup because of your edge case.

One more comment on that statement: this is not an edge case but the reason the User-ID feature was introduced in the first place. So, generally, it'd be good to keep it stable, especially within minor version updates. But that's something we all already agree on, so let's look ahead toward a resolution.

I'd be fine to make this a config setting - @tsteur @mattab what do you think about that?

mattab commented 4 years ago

We'll need to think more about it. Might take a while. @peterbo I'm not sure about a config setting. It would be better to find the optimal solution that fits most use cases. Maybe we can make (almost) everyone happy with a few tweaks to bring back the usefulness of User ID.

tsteur commented 4 years ago

@mattab

I'm trying to understand the thinking here. In your opinion, what is now the difference between userId and a custom dimension? And why was it changed?

It seems to me that 99% of users likely don't use 3rd party cookies, and it was made worse for them, but maybe I'm missing something.

@MichaelHeerklotz

Currently userid is somewhat like a custom dimension, but one that automatically creates new visits.

I'm actually not sure we're doing that currently, or are you saying it should? Really just trying to understand things here. I don't really understand yet why the current behaviour is better for 3rd party cookies and why the previous behaviour was not good. Could any of this behaviour maybe be achieved with a plugin?

mattab commented 4 years ago

Wrong. For example in my setup Matomo runs in its own subdomain matomo.domain.com and the matomo sites are other subdomains and paths on the same domain. In this setup the 3rd party feature works very well with all browsers and is very useful to connect the different matomo sites (over 50).

In that case, would you be able to use setCookieDomain and set the domain to .your-domain.com so the cookie is 1st party yet readable on all subdomains? Then I see that Peter suggests the same and you reply "Technically yes, but it works using Matomo's "3rd Party Cookie" feature.", which does not make sense to me. Why use a 3rd party cookie if a 1st party one would work? Probably 3rd party is only needed when you want to do cross-domain analysis, I suppose...

@mattab what is the benefit of the current userId behaviour over a custom dimension? If there's no clear benefit, I would 100% vote to change the behaviour back to the original behaviour, with no flags controlling how things work.

I guess the benefit is that a visit on mobile will appear separately from a visit on desktop. Before the change, the interactions across mobile and desktop visits were merged into one. Whether it's a benefit is not clear, however: as Peter (and a few other people, by email) points out, it's complex to update mobile apps and other SDKs to set the proper Visitor ID based on the web visit (or as a hash of the User ID), etc.

What would a "revert" look like?

Would reverting this be as simple as reverting this PR? https://github.com/matomo-org/matomo/commit/ea5a14bdf8aa9608cdc2ab7d5c8236a5ff1eb3e2

mattab commented 4 years ago

Could we maybe assign this to 3.13.4?

tsteur commented 4 years ago

@mattab there were also a few other follow-up PRs, including in the PHP SDK, etc. Not too many, I think.

I guess the benefit is that a visit on mobile will appear separately from a visit on desktop. Before the change, the interactions across mobile and desktop visits were merged into one.

I seriously thought that merging visits across devices into one visit (cross-device tracking) was the purpose of userId.

Re 3.13.4: it depends. If needed, it might have to go into 3.13.5.

mattab commented 4 years ago

I seriously thought that merging visits across devices into one visit (cross-device tracking) was the purpose of userId.

:+1:

MichaelRoosz commented 4 years ago

Wrong. For example in my setup Matomo runs in its own subdomain matomo.domain.com and the matomo sites are other subdomains and paths on the same domain. In this setup the 3rd party feature works very well with all browsers and is very useful to connect the different matomo sites (over 50).

In that case, would you be able to use setCookieDomain and set the domain to .your-domain.com so the cookie is 1st party yet readable on all subdomains? Then I see that Peter suggests the same and you reply "Technically yes, but it works using Matomo's "3rd Party Cookie" feature.", which does not make sense to me. Why use a 3rd party cookie if a 1st party one would work? Probably 3rd party is only needed when you want to do cross-domain analysis, I suppose...

Different sites use different first-party cookies, so how would that help? How would that cause different sites to use the same visitor id?

Note: I have deleted some comments I made after this post, because I went too far with them. However, I really would prefer a professional handling of this issue. We could revert the changes and add a setting for it afterwards if you want to fix the issue asap.

Reverting a change that took months to get merged and telling me to use "setCookieDomain" which does not help at all is, let us say... a bit harsh.

MichaelRoosz commented 4 years ago

In any case, we should keep the fix that avoids overwriting the global visitor id (_pk_uid) with the user id. Otherwise, if any site messes up the setUserId() call (for example, by giving every logged-out user the same id), it will break the whole Matomo setup for all sites.

MichaelRoosz commented 4 years ago

A compromise could be to generate the visitor id from the user id, but to have multiple visits for each device. What do you think? @mattab @tsteur @peterbo

However, this still creates the problem that it basically breaks any per-device tracking. How could one see what was done before / after logging in or out?

I really feel we should have a setting for this.

peterbo commented 4 years ago

Hi everyone, I'd like to revive the discussion on this topic, since a lot of instances cannot be updated at the moment. @tsteur, did you already decide how we should proceed with this issue?

tsteur commented 4 years ago

I don't really have any preference, as I'm not very familiar with the topic. But I'm wondering:

By default, I reckon more users would want the userId and visitor linked and use it for cross-device tracking. Other SDKs would need to implement a similar behaviour. Not sure if that would work, though. Alternatively, we could add a setting for the whole thing in the backend instead of in the tracking SDK (might make more sense).

MichaelRoosz commented 4 years ago

I agree with @tsteur .

If we change it in the trackers/SDKs, this will be more work and maybe a bit confusing for users, because the behavior will depend on the SDK version and not the Matomo version. In that case, I would like to configure the setting for the JS/matomo.js SDK from the Matomo backend so that my (dev) users do not have to update their webpages/JavaScript implementations. I would be willing to create a patch for that feature.

On the other hand, if we change it in the Matomo Backend/Core, it would be less work and easier to understand for the user. In that case I would have less/no work for additional patches.

For me both solutions are okay, as long as I can set the default behavior globally in Matomo Core. So I am happy in any case :)

peterbo commented 4 years ago

+1 for changing in core. Tracker behaviour should not be made more complex in my opinion.

I think that using the userID field without any added value over the Custom Dimensions is still a confusing approach, but restoring the original behavior and being able to activate the alternative functionality of the userId could be a good compromise.

tsteur commented 4 years ago

Sounds good to have an option in the backend.

mattab commented 4 years ago

FYI, to make our FAQ at https://matomo.org/faq/general/faq_21418/ accurate, I changed it from:

  • If a User ID is set, either via setUserId in your favorite SDK or via &uid= in the Tracking API, this User ID will be converted (hashed) into a Visitor ID hexadecimal string. The hashed User ID becomes the Visitor ID. We look first for visits where the log_visit.idvisitor matches this Visitor ID (User ID). If no visit is matched, we look for visits where the log_visit.config_id matches the visitor fingerprint.

to:

  • If a User ID is set, either via setUserId in your favorite SDK or via &uid= in the Tracking API, then we will look first for visits where the log_visit.idvisitor matches this Visitor ID. If no visit is matched, we look for visits where either the log_visit.user_id matches the User ID, or where log_visit.config_id matches the visitor fingerprint.

according to the code in: https://github.com/matomo-org/matomo/blob/3.13.5/core/Tracker/Model.php#L397-L414
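As a reading aid for the revised FAQ wording, the lookup order it describes can be sketched as below. This is a hypothetical in-memory simplification (the visit time window, the site filter and the actual SQL in Model.php are all omitted), and the next comment reports that observed behaviour does not match this description, so treat it as a model of the documented intent rather than of Matomo's code.

```javascript
// Hypothetical, simplified model of the lookup order in the revised FAQ.
// Each visit is { idvisitor, user_id, config_id }; the request carries
// { visitorId, userId, configId }.
function findExistingVisit(recentVisits, request) {
  // Step 1: prefer a visit whose idvisitor equals the request's visitor ID.
  const byVisitorId = recentVisits.find((v) => v.idvisitor === request.visitorId);
  if (byVisitorId) {
    return byVisitorId;
  }
  // Step 2: otherwise accept a visit matching the User ID (if one was sent)
  // or the config_id fingerprint.
  const fallback = recentVisits.find(
    (v) =>
      (request.userId !== undefined && v.user_id === request.userId) ||
      v.config_id === request.configId
  );
  return fallback || null; // null means "create a new visit"
}
```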

peterbo commented 4 years ago

@mattab The documentation does not reflect the current behavior:

If no visit is matched, we look for visits where either the log_visit.user_id matches the User ID, or where log_visit.config_id matches the visitor fingerprint

That's not the case: A new visit is created and gets the same User-ID as the other visitor, exactly like a CustomDimension:

[screenshot: webview-b]

Both visits have the same User-ID and are not returning visitors (-> no visitorId or configId recognition). This use case is an Android app with a webview, which was held together by the forced User-ID before the changes.

Is this being worked on? If not, I'll create a PR.

mariusk commented 4 years ago

I just want to add support for @peterbo's arguments here. I've also had a long discussion with Matomo support on this, and currently UserID tracking simply isn't working the way most people expect. We have an SPA where we set the UserID explicitly after logging in. When we request reports on time spent in our app in the Matomo web interface, with UserID selected as the first dimension, it simply does not work. The report is mostly identical to the same report using VisitorID (which, at least currently, we do not set). And worse, as far as I understand, getting reports grouped by UserID as I've described is not even possible with how it currently works. But maybe somebody here has better ideas; if so, I'm all ears (I guess if we control the VisitorID client-side we could do something, but that seems counterintuitive; it feels like this is exactly what UserID should be used for).

tsteur commented 4 years ago

@mattab are we maybe changing this in 3.X too? The user ID seems like quite a broken feature now.

fcandi commented 4 years ago

@mattab The documentation does not reflect the current behavior: Both visits have the same User-ID and are not returning visitors (-> no visitorId or configId recognition). This use case is an Android app with a webview, which was held together by the forced User-ID before the changes.

I have the same problem. I have a brand-new installation of Matomo. I switched cookies off completely and want to measure returning visitors with the device fingerprint and UserId only.

When I click on the "Visits/UserIds" tab, I can see some users logging in several times, because the "visits" column shows "2" instead of "1". When I click "Visits/Overview", total visits and unique visitors are exactly the same, i.e. unique visitors are not detected.

For a new Matomo user, this looks like a bug, not a feature ;)

fcandi commented 4 years ago

One more note: I think detecting unique visitors is maybe the most important thing for making all statistics meaningful, especially for measuring the success of campaigns and goals. This is why I thought: OK, I don't use cookies, but I'll put as much information as possible into the tracking code to help Matomo detect unique visitors. That is why I included the uid.

Question to @tsteur: you wrote an example of a temporary fix: setVisitorId(hashedUserId.substr(0,16))

Is there a best practice how to calculate a visitor ID including the userId?

Thx,

Andreas

tsteur commented 4 years ago

@fcandi I haven't tested it, but setVisitorId(sha1(userId).substr(0,16)) should basically work, as this is what core used to do. In JavaScript you won't have sha1 available unless you add it, AFAIK, so I reckon any other kind of hashing will work just as well as long as you end up with a 16-character hex value. If you can use sha1(), it basically only means that it will end up generating the same visitorId as it did a few months ago, and therefore the same user would get the same visitorId as before.
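Since the backend is under your control, the hash can be computed server-side. Below is a minimal Node.js sketch of the derivation described above (the first 16 hex characters of sha1(userId)), assuming the User ID is a string encoded as UTF-8; the function name visitorIdFromUserId is hypothetical. The result would then be handed to the tracker alongside setUserId.

```javascript
const crypto = require("crypto");

// Hypothetical helper: derive a 16-character lowercase hex visitor ID from a
// User ID by taking the first 16 hex characters of its SHA-1 digest.
function visitorIdFromUserId(userId) {
  return crypto
    .createHash("sha1")
    .update(String(userId), "utf8")
    .digest("hex")
    .substring(0, 16);
}

// Example: render this value into the page next to the User ID.
console.log(visitorIdFromUserId("1234567890"));
```

Because both are lowercase hex digests of the same bytes, PHP's substr(sha1($userId), 0, 16) should yield the same string for the same UTF-8 User ID, so web and server-side requests can agree on the visitor ID.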

fcandi commented 4 years ago

If you can use sha1(), it basically only means that it will end up generating the same visitorId as it did a few months ago, and therefore the same user would get the same visitorId as before.

Thx @tsteur for your explanation. I control the backend, so it would be simple to generate the sha1 on the server while loading the user. But I have one more question, for my understanding:

When I send UserId and VisitorId in the tracking code for logged-in users, what happens to new users who register during their visit? Does Matomo still show the time before and after registration as one visit? This is important for measuring the conversion rate.

tsteur commented 4 years ago

Does Matomo still show the time before and after registration as one visit?

Yes I would say so.

mariusk commented 4 years ago

@fcandi I haven't tested it, but setVisitorId(sha1(userId).substr(0,16)) should basically work, as this is what core used to do. In JavaScript you won't have sha1 available unless you add it, AFAIK, so I reckon any other kind of hashing will work just as well as long as you end up with a 16-character hex value. If you can use sha1(), it basically only means that it will end up generating the same visitorId as it did a few months ago, and therefore the same user would get the same visitorId as before.

Any idea how the call to setVisitorId would look? I tried this:

          window._paq.push(['setUserId', email]);
          window._paq.push(['setVisitorId', sha1(email).substr(0, 16)]);

This did not work; it throws an exception saying the setVisitorId method was not found in the JavaScript client (I included the sha1 function via a library, so that's not the problem).

tsteur commented 4 years ago

Sorry, I did not realise this method is only available in development mode (in tests). I was actually planning on using this method myself, so I created this PR: https://github.com/matomo-org/matomo/pull/16042

It will be available in the next release. If you need the file earlier you could patch your tracker file: https://raw.githubusercontent.com/matomo-org/matomo/exposesetvisitorid/matomo.js

mariusk commented 4 years ago

Thanks for the clarification. We're currently on a paid hosting plan, but this bug and a couple of other issues (indicating we need to upgrade our plan to change even simple settings) are making me strongly consider self-hosting instead.

tsteur commented 4 years ago

Just BTW, on our Cloud you won't need to wait for the next Matomo release, which might be a month or two away; there you can expect this to be deployed by the end of next week at the latest.

mariusk commented 4 years ago

Thanks, useful information, fingers crossed.

tsteur commented 4 years ago

@mariusk just wanted to let you know that you can expect this change to become active on Monday.

mariusk commented 4 years ago

Great, thanks. I'm assuming you mean the code to modify the visitorId should then work. I will try to reactivate it after Monday, or as soon as I get notice that it's live.

tsteur commented 4 years ago

Yes, the JS tracker that allows you to set the visitorId.

tsteur commented 4 years ago

Just FYI, we deployed this yesterday, @mariusk. Let me know if it's not working for you and I can follow up.

mariusk commented 4 years ago

@tsteur Thanks. I've re-deployed my updated user tracking code and this time around at least it works (doesn't crash the client). Fingers crossed!

mariusk commented 4 years ago

@tsteur Another possible improvement related to this would be to avoid throwing an error when setVisitorId or a similar method doesn't exist. Today we're getting hit by people who still have the old Matomo client from the CDN together with our new release, which calls setVisitorId directly as discussed. Since the "function call" is indirect (pushing to a list), I'm not sure wrapping the setVisitorId call in a try/catch would solve that issue, but feel free to enlighten me if you think it should.
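One way to avoid that exception is to push a function instead of a command array: the JS tracker invokes functions pushed onto _paq with `this` bound to the tracker, so the code can check whether setVisitorId exists before calling it. A sketch under that assumption; the helper name setVisitorIdIfSupported is hypothetical, and the queue is passed in so it can run outside a browser.

```javascript
// Push a function (not ["setVisitorId", id]) so the check runs against the
// tracker that is actually loaded. An older matomo.js without setVisitorId
// skips the call instead of throwing.
function setVisitorIdIfSupported(paq, visitorId) {
  paq.push([
    function () {
      if (typeof this.setVisitorId === "function") {
        this.setVisitorId(visitorId);
      }
    },
  ]);
}

// In a page: setVisitorIdIfSupported(window._paq, visitorIdFromServer);
```

The trade-off is that visitors on an outdated tracker simply do not get the forced visitor ID until their cached matomo.js is refreshed.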

peterbo commented 4 years ago

@mariusk Hey Marius, it'd be great if you could clarify this topic via the forum / a support ticket, because this is no longer related to the ticket and quite a lot of people are getting notified of new messages here.

mariusk commented 4 years ago

@peterbo Should be fine. Anybody following this ticket and attempting the same workaround will get hit by the Matomo client throwing an error about the missing function, as reported (and confirmed) earlier, but only until everybody gets the updated Matomo client. I'll leave it for now; fingers crossed, it should all be good within a short time.