matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0

Offline tracking in JavaScript API #9939

Closed PCSun1987 closed 3 years ago

PCSun1987 commented 8 years ago

Currently it's not possible to send tracking events while offline.

One idea would be to extend the Piwik tracking API, especially for JavaScript, with an optional parameter carrying the event timestamp. Inside JS you could then keep events locally and, once connected to the internet, send them all with the time each event actually happened.

For basic tracking, something similar could probably be done.

https://forum.piwik.org/t/does-piwik-work-even-your-offline/7295/9
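
A minimal sketch of the idea above, assuming events are kept in memory together with the time they happened and are later turned into Tracking API query strings. The `cdt` (custom datetime) parameter, the event parameters `e_c`/`e_a`, and the bulk payload shape `{"requests": [...]}` come from the Tracking HTTP API; `queueEvent`, `toBulkPayload`, and the site ID are hypothetical names used only for illustration.

```javascript
// Hypothetical helpers: queue events with the time they happened, then build
// a bulk Tracking API payload once the device is back online.
const pending = [];

function queueEvent(category, action, happenedAt = new Date()) {
  pending.push({ category, action, happenedAt });
}

function toBulkPayload(idSite, events) {
  const requests = events.map((ev) => {
    const params = new URLSearchParams({
      idsite: String(idSite),
      rec: '1',
      e_c: ev.category,
      e_a: ev.action,
      // cdt: custom datetime as a UNIX timestamp in seconds (UTC)
      cdt: String(Math.floor(ev.happenedAt.getTime() / 1000)),
    });
    return '?' + params.toString();
  });
  return JSON.stringify({ requests });
}
```

The resulting string would be sent as the POST body to the tracker endpoint. Whether the server accepts old timestamps without a `token_auth` depends on its configuration.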

hpvd commented 8 years ago

+1, this would be a good step on the way to a universal tracker (Piwik 3.0). It's not only interesting for apps which temporarily have no connection to the web, but also a good way to make manual input of "real world" events possible after they have happened, and to give them the correct place in the timeline.

tsteur commented 8 years ago

This would be a nice feature indeed

PCSun1987 commented 8 years ago

So... when do we plan to have this functionality? E.g. when would 3.0 be released?

tsteur commented 8 years ago

Piwik 3.0 would be in about a year but this feature is not planned yet. Pull request or suggestions on how to implement it are always welcome :+1:

ghost commented 7 years ago

Just looking for such a solution. This is tremendously important for mobile apps.

In the JS tracking client, I have noticed a method called retryMissedPluginCalls() and the array missedPluginTrackerCalls. It could be interesting to hook it in some way in order to intercept the calls to the server in an offline state. Then, upon "online" event, we would call and retry missed calls.

Your thoughts, @tsteur and @mattab ?

tsteur commented 7 years ago

retryMissedPluginCalls() is actually a bit different here. Plugins can extend the Piwik JS tracker and there may be cases where either Piwik is loaded first, or the Plugin. If Piwik was loaded first and tries to apply all _paq.push calls, it cannot call the methods for the plugins yet as they are not yet loaded. Therefore once the plugin is loaded they try to call all missed plugin calls again.

Offline tracking is super important nowadays for mobile apps, progressive web apps, .... If someone wanted to work on it I'm happy to give some support. I think it needs to be worked out what the best place is to save requests that couldn't be sent because the user is offline (eg localstorage, ...) and then we need to detect whether user is offline and when user goes online again. Some browsers have an API for that.
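
A rough sketch of the two pieces mentioned above: somewhere to keep requests that could not be sent, and a way to notice when the user comes back online. It assumes the browser `online` event and `navigator.onLine` as the detection mechanism; `createOfflineBuffer`, `send`, and the in-memory `storage` array are hypothetical stand-ins (real code would probably persist to localStorage or similar).

```javascript
// Sketch: buffer requests while offline and flush them on the "online" event.
// `target` is any EventTarget (in a browser, pass `window`); `isOnline` would
// typically be `() => navigator.onLine`. `send` is a hypothetical transport.
function createOfflineBuffer(target, isOnline, send, storage = []) {
  function flush() {
    while (storage.length > 0) {
      send(storage.shift()); // oldest request first
    }
  }
  target.addEventListener('online', flush);
  return {
    track(request) {
      if (isOnline()) {
        send(request);
      } else {
        storage.push(request);
      }
    },
    pendingCount: () => storage.length,
  };
}
```

In a browser this could be wired up as `createOfflineBuffer(window, () => navigator.onLine, send)`. Note that `navigator.onLine` only reports whether there is a network connection, not whether the tracking server is reachable.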

There might be one problem when tracking the requests later. I think by default Piwik lets you only track requests up to 4 hours in the past without needing an authentication token. The time is customizable AFAIK but it might be something to take into account that requests older than 4 hours may have to be invalidated.

ghost commented 7 years ago

Thank you for your prompt reply.

I need to find some solution for this rather promptly, whether using Piwik or something else. Ideally the former though, due to its ability to easily deal with hybrid applications.

retryMissedPluginCalls()

OK, I see. No problem.

I think it needs to be worked out what the best place is to save requests that couldn't be sent

Well, I don't think that this is so important. It should be left open to the developer to decide what to plug into the system. Everyone might have different preferences.

then we need to detect whether user is offline and when user goes online again

Ditto. Developer should supply this to make it easy for Piwik. Piwik should only supply the mechanism for giving up the "send" and for having the ability to submit it later. This is what I need to find out now.

What I mean by this is that there should be methods to call in Piwik that let it know that "now it is necessary to stop sending tracking data and save it instead" and "now resume sending and send what has been stored".

By the way, the "stop sending" and "resume sending" functionality is already working now.

So that part is simple.

Even calling the JS tracking client (or loading it locally) is easy to solve upon detection of online/offline events. I already have this part solved.

Thus the only remaining thing is "how to store the data, so that Piwik would know when those tracking event happened, so that it would be possible to reconstruct the past sequence correctly upon delayed sending".

I need your or Matthieu's input on this, as you understand the existing code base and its functionality. (Thank you in advance.)

I think by default Piwik lets you only track requests up to 4 hours in the past without needing an authentication token.

I don't understand this one. What kind of a token? When you are submitting old data, you are still submitting it with the session that is currently active, aren't you?

The point is that the device might be offline for one week or a month. This means that we just need to keep storing the tracking data with original timestamps and simply submit it in a sequence, when the device comes online.

This means that we won't have such data available for analysis immediately, but only eventually – yet it is still important and better than not having it at all.

The question is to what extent the current system could support this scenario without major code changes. Is it possible to hook some existing sub-systems?

ghost commented 7 years ago

Updated the previous comment.

ghost commented 7 years ago

@tsteur This is actually an interesting notion that you mentioned... about the plugins for JS tracking client.

Is it possible to write such an "offline tracking" solution as a plugin?

If yes, I could look into this ASAP, if I am given guidance on how such plugins are written. Thanks.

tsteur commented 7 years ago

With token I mean you need an authentication token in this case. It is actually hard coded that when you want to track a request that is older than 4 hours you need to authenticate see https://github.com/piwik/piwik/blob/2.17.0/core/Tracker/Request.php#L467-L474 . This is for some security reasons eg you could otherwise track into any Piwik instance data in the past etc. This token can be disabled though here https://github.com/piwik/piwik/blob/3.x-dev/config/global.ini.php#L696

I feel like we could maybe add an API to the tracker like setUserOffline() in which we overwrite the internal method sendRequest to add such requests to an array instead. A developer could eg pass some kind of storage class for us to add the request like {addRequest: function (request) {}} and the developer could this way decide what to do with it.

When user becomes online, the developer could call eg a method tracker.setUserOnline(storedRequests) and we (Piwik tracker) would try to re-send these requests in bulk. However, there is this 4 hour problem as described currently.

It could probably be written as a plugin, but this API is not yet official and is undocumented, and we would for sure need to add some methods to the tracker. Adding those methods to the tracker could be done quickly though. I'll show you a rough idea in a bit without thinking too much about it.
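
To make the proposed calls concrete, here is an illustrative stand-in. It is not the real Piwik tracker: `createTracker` and `transport` are hypothetical, and only the method names `setUserOffline` and `setUserOnline` and the "storage object with a push method" idea are taken from the comment above.

```javascript
// Illustration only: a stand-in tracker, not the real Piwik tracker.
// `transport` is a hypothetical function that actually sends requests.
function createTracker(transport) {
  let offlineStorage = null; // non-null while the user is offline

  function sendRequest(request) {
    if (offlineStorage) {
      offlineStorage.push(request); // hand over to developer-supplied storage
    } else {
      transport([request]);
    }
  }

  return {
    trackEvent(category, action) {
      sendRequest('e_c=' + encodeURIComponent(category) +
                  '&e_a=' + encodeURIComponent(action));
    },
    // Any object with a push(request) method works; a plain array is the default.
    setUserOffline(storage = []) {
      offlineStorage = storage;
    },
    // Resume normal sending and replay what the developer stored, in bulk.
    setUserOnline(storedRequests = []) {
      offlineStorage = null;
      if (storedRequests.length > 0) {
        transport(storedRequests);
      }
    },
  };
}
```

The design choice here is that the tracker never decides where requests are stored or when the user is online; the developer supplies both, as suggested earlier in the thread.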

tsteur commented 7 years ago

This could be rough idea: https://github.com/piwik/piwik/compare/2.x-dev...offlinetracker?expand=1

Developer would do

tracker.setUserOffline({push: function (request) {
    // eg localstorage.addItem(request);
}});
tracker.setUserOnline(localstorage.getItems());

tsteur commented 7 years ago

Eventually Piwik would ideally detect offline status itself and store it somewhere.

The biggest problem remains the Piwik backend re the 4 hours in past only

ghost commented 7 years ago

Thank you. The code looks reasonable to me. I would just change the initialisation of the configOfflineStorage to an object instead of an array: configOfflineStorage = {}; on the following line: https://github.com/piwik/piwik/compare/2.x-dev...offlinetracker?expand=1#diff-1279d666063b65e6d6777f902d11574fR3085

tsteur commented 7 years ago

I made it an array because an array has a push method out of the box. This way it will be easy for us to add tests for it. A developer would for now set a custom offline storage that is an object with a push method.

I renamed the user term to visitor as Piwik usually uses the term. Do you think you could work with something like this? Maybe some background would be good as well. Are you developing a mobile app eg via phonegap? mobile web app?

@mattab do you have any thoughts on this?

ghost commented 7 years ago

array, push method

OK, I understand your point.

I renamed the user term to visitor as Piwik usually uses the term.

Makes sense.

Do you think you could work with something like this?

Absolutely. Looks very good to me. Simple and effective.

Maybe some background would be good as well. Are you developing a mobile app eg via phonegap? mobile web app?

Cordova/Tizen + plain JS + HTML + CSS. Multiplatform (Android, Amazon, AmigoOS, Blackberry10, iOS, Tizen, Windows).

The biggest problem remains the Piwik backend re the 4 hours in past only

I have just cloned the Piwik repository and I am going to look into the reasoning for this limitation...

ghost commented 7 years ago

@tsteur

I have just found this: https://github.com/piwik/piwik/blob/3.x-dev/core/Tracker.php#L256-L260

It looks like the bulk submission automatically bypasses the authentication.

Am I right in the assumption that it applies to our case?

If yes, the whole problem will have been solved tonight. ;-)

ghost commented 7 years ago

If not, what is the "bulk request" then?

ghost commented 7 years ago

Oops! I have noticed only now that those lines are within the setTestEnvironment() function.

ghost commented 7 years ago

However, I have found this:

; Whether Bulk tracking requests to the Tracking API requires the token_auth to be set.
bulk_requests_require_authentication = 0

https://github.com/piwik/piwik/blob/3.x-dev/config/global.ini.php#L684-L685

Could this fit our needs?

ghost commented 7 years ago

@tsteur Could it be that you have wrong time on your computer? Or something like that? Have a look here: https://github.com/piwik/piwik/compare/2.x-dev...offlinetracker?expand=1 Your today's commits during our communication on Nov 4, 2016 appear to be made on Nov 2, 2016, i.e. two days ago(!). That's odd.

ghost commented 7 years ago

On top of that, you are changing the version of JS Tracking Client that lacks some code related to configIdPageView, which is present in Piwik 2.17.0.

Thus you have effectively overwritten its declaration on the line 3084.

ghost commented 7 years ago

Also, the semicolon at the end of the line 3084 should be a comma, as the declaration of variables continues on the next line: https://github.com/piwik/piwik/compare/2.x-dev...offlinetracker?expand=1#diff-1279d666063b65e6d6777f902d11574fR3084

tsteur commented 7 years ago

Yeah the time in my virtual machine sometimes gets wrong :)

The config you mentioned only applies to bulk requests in general. Not to the recording records in past. For this, tracking_requests_require_authentication would need to be set to "0", see https://github.com/piwik/piwik/blob/3.x-dev/config/global.ini.php#L696

Regarding idpageview I think you are currently looking at Piwik 3 (not released yet), the change was made on Piwik 2 (branch 2.x-dev).

ghost commented 7 years ago

Regarding idpageview I think you are currently looking at Piwik 3 (not released yet), the change was made on Piwik 2 (branch 2.x-dev).

Yes, I realised this. So it looks like 3.0-dev wasn't updated to the latest 2.17 or 2.x-dev.

The config you mentioned only applies to bulk requests in general. Not to the recording records in past.

Oh, that's a pity... :-\

mattab commented 7 years ago

This could be rough idea: https://github.com/piwik/piwik/compare/2.x-dev...offlinetracker?expand=1

Developer would do

tracker.setUserOffline({push: function (request) {
    // eg localstorage.addItem(request);
}});
tracker.setUserOnline(localstorage.getItems());

This looks great @tsteur ! I'd vote for inclusion in Piwik 3 as a rather powerful new feature once tested & documented. Will help tons of people and make Piwik more resilient!

ghost commented 7 years ago

Hello Matthieu, @mattab

Thank you for your input into this topic.

What are your thoughts about the 4 hour limit of the backend authentication?

tsteur commented 7 years ago

The problem is indeed that it is not really useful with the 4 hour limit. And I think even good documentation might not help much here. We would definitely need to mention it in the docs. On top of that, we could set a timestamp with each request and by default only replay tracking requests of eg the last 3 hours (because clocks are not always right, we should play it safe). Then people could maybe have an option to ignore that 3-4 hour limit.

The scary part is, when a date is set more than 4 hours back, Piwik simply uses the current date instead of not tracking that request at all which is quite dangerous and can lead to wrong tracking data. So it is a big problem
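
Given that fallback to the current date, one option is to filter on the client before replaying. A small sketch, assuming each stored request carries the time it was made: the `{ timestamp }` shape and the name `partitionByAge` are hypothetical, and the 3-hour default only mirrors the safety margin suggested above.

```javascript
// Sketch: split stored requests into those still inside the allowed window
// and those that are too old. `timestamp` is in seconds since the epoch.
function partitionByAge(requests, nowSeconds, maxAgeSeconds = 3 * 60 * 60) {
  const replayable = [];
  const expired = [];
  for (const request of requests) {
    const age = nowSeconds - request.timestamp;
    (age <= maxAgeSeconds ? replayable : expired).push(request);
  }
  return { replayable, expired };
}
```

Only the `replayable` list would be sent. What to do with `expired` (discard, or send with authentication) is left open here, as it is in the discussion.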

ghost commented 7 years ago

Well, in some way this needs to be resolved.

We can't talk or think just in hours, when discussing offline tracking. The device can be offline for a day, a week or a month, yet the app and Piwik JS Tracking Client need to be able to collect data before eventually submitting it over Internet.

Therefore the data to be processed can be rather old.

tsteur commented 7 years ago

I'm not sure but the next problem might be even that Piwik won't re-archive / re-generate / update reports when there are visitors recorded for eg 2 or 3 days in the past. So even when the visitors are recorded in the past successfully, they might not be visible in the reports. So right now it might actually not make sense to merge it as Piwik is just not ready yet for offline. @mattab or would the reports be re-archived when we record visits in the past and they have been finished already?

mattab commented 7 years ago

@mattab or would the reports be re-archived when we record visits in the past and they have been finished already?

yes, it should be implemented already: when tracking in the past, the old reports should be marked as "needs to be invalidated" and the reports should be invalidated on the next archiving run. Done in Tracker API via rememberToInvalidateArchivedReportsLater in: https://github.com/piwik/piwik/blob/2.x-dev/core/Tracker/Visit.php#L587-L589

and even should be tested in here: https://github.com/piwik/piwik/blob/2.x-dev/tests/PHPUnit/Integration/Tracker/VisitTest.php#L370-L410

tsteur commented 7 years ago

See comments above, tracking offline data would still not really be useable.

mattab commented 7 years ago

fyi: when the feature "custom request datetime" was launched we made the feature require token_auth for all datetimes - then in #6407 #6110 it was changed to allow recent requests (up to 4 hours old) to be tracked without it.

-> Would you say it would be good enough to track data for the last 24 hours?

as you mention, created issue for "skip request with invalid token_auth" #10890

Edit: some code has been written!

Thomas wrote a bit of unfinished code here which may be useful https://github.com/piwik/piwik/commit/59de0ad1c7637c320c3d31050acbd16c16805892

ghost commented 7 years ago

From the technical point of view, the past should be of arbitrary length.

juliusstoerrle commented 5 years ago

Would like to bump this up! Is there still work required on the server side? Was any work already done on the JS client besides https://github.com/matomo-org/matomo/commit/59de0ad1c7637c320c3d31050acbd16c16805892?

tsteur commented 5 years ago

There shouldn't be any work required server side anymore.

BaronBonet commented 5 years ago

Would also like to bump this up. What work needs to be done on the JS end? I could take a crack at it.

tsteur commented 5 years ago

I suppose you would need to store the requests eg in a local storage or so and replay them at a later time when the internet connection is back. Not sure if much else needs to be done.

bdurrer commented 5 years ago

bugsnag.com (a bug/log tracking solution) does exactly this to report problems which occurred when offline. This is a must-have feature for PWAs, but I guess we could easily implement it ourselves using localStorage and the Tracking API. It would be nice if the JS client already had these capabilities.

tsteur commented 5 years ago

Totally agreed @bdurrer it's a must have nowadays. @mattab could maybe schedule it for Matomo 4?

@bdurrer or if anyone else could contribute to this we're happy to help.

saifeer commented 4 years ago

@tsteur, @mattab, IMHO offline storage should be added using service workers (SW). This is the most logical way forward for adding any offline capability. What should be done is:

Also, the SW should forward all requests irrespective of the time passed since caching the event to the server (in the correct sequence of course). The server can then decide if it wants to accept the request and process it or reject the request and discard the data. This way, in the future, if the server code was changed to allow events older than 4h, it would be transparent to the client.

Lastly, the SW approach means that the JS client itself is completely oblivious to the SW and works regardless of connectivity. The SW is API agnostic as it only saves request without manipulation and this way the offline capability is completely decoupled from the JS client and server.

It's worth noting that GA also uses a similar mechanism to add offline capability to their framework.
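
The queue-and-replay logic described above could look roughly like the following. This is a sketch, not the code of any existing service worker: `forwardOrQueue` and `replayQueue` are hypothetical names, `fetchFn` stands in for the global `fetch()`, and `queue` stands in for persistent storage such as IndexedDB. Both are passed in so the logic stays independent of the environment.

```javascript
// Sketch of the queue-and-replay logic a service worker could use.
// `fetchFn` stands in for fetch(); `queue` for persistent storage.
async function forwardOrQueue(url, fetchFn, queue) {
  try {
    return await fetchFn(url);
  } catch (networkError) {
    // Save the request untouched, with the time it was originally made.
    queue.push({ url, queuedAt: Date.now() });
    return null;
  }
}

// Replay in original order; stop at the first failure so order is preserved.
async function replayQueue(fetchFn, queue) {
  while (queue.length > 0) {
    try {
      await fetchFn(queue[0].url);
    } catch (networkError) {
      return false; // still offline, keep the remaining entries
    }
    queue.shift();
  }
  return true;
}
```

In a real service worker, `forwardOrQueue` would be called from a `fetch` event listener via `event.respondWith(...)`. Because `fetch()` only rejects on network failure, a request that the server answers with an error status is still removed from the queue, which matches the idea that the server decides whether to accept old data.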

tsteur commented 4 years ago

Thanks @saifeer very much appreciated 👍

Maqsyo commented 4 years ago

Hope this feature will exist someday.

mattab commented 4 years ago

When we implement this, how will we deal with the fact that currently Tracking API requests are only allowed up to 24 hours in the past? Refs the setting in config.ini.php under [General] (default is 1 day):

[General]
tracking_requests_require_authentication_when_custom_timestamp_newer_than = 86400;

Will it be "acceptable" to drop requests made more than 1 day ago by default, and explain to users that they can increase the setting in the config? Or would we make this a general UI or per-site setting, or some other solution?

tsteur commented 4 years ago

Yes, the idea is that this is acceptable and it's always possible to change it.

bdurrer commented 4 years ago

Learning from others, you'll want to introduce a maximum number of held-back messages so it does not flood local storage or post big payloads.
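
One way to cap the number of held-back messages is a bounded queue. A minimal sketch: `createBoundedQueue` is a hypothetical helper, and dropping the oldest request first is an assumption (dropping the newest would be an equally valid policy).

```javascript
// Sketch: a storage object with a push() method that keeps at most
// `maxItems` requests, discarding the oldest ones first.
function createBoundedQueue(maxItems) {
  const items = [];
  return {
    push(request) {
      items.push(request);
      if (items.length > maxItems) {
        items.shift(); // drop the oldest request
      }
    },
    // Remove and return everything currently held, oldest first.
    drain() {
      return items.splice(0, items.length);
    },
    get length() {
      return items.length;
    },
  };
}
```

Because it exposes `push`, an object like this could serve as the developer-supplied offline storage discussed earlier in the thread.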

tsteur commented 4 years ago

BTW did we maybe also want a new dimension whether a request was executed online vs offline? I reckon might be separate feature but to be seen. Might create issue for it later.

tsteur commented 4 years ago

FYI started some early concept using service worker in https://github.com/matomo-org/matomo/pull/15970/files

Looks like this could work even for bulk requests and if send beacon is used.

It hasn't been tested much though: I haven't tweaked the code much and haven't tested it in any browser but Chromium. Development is at quite an early stage.

If someone's proficient in service workers and IndexedDB feel free to leave some comments.

The goal will be to cache the actual JS tracker file, and put all tracking requests in a queue should the user be offline.

tsteur commented 4 years ago

@PCSun1987 @juliusstoerrle @bdurrer @saifeer @Maqsyo

Would anyone be able to have a look at https://github.com/matomo-org/matomo/pull/15970 or even give it a test? I don't have any service workers in use, so I'm not sure if there's something that would cause issues with other service workers etc. I have only done some basic testing so far, but if at all possible it would be great to test it.

sgiehl commented 3 years ago

@tsteur Anything left here for Matomo 4?

tsteur commented 3 years ago

@sgiehl it's not finished (maybe not even really working yet). We'll try to get some feedback and try to get people to test it so we can see if/how it works and what adjustments it needs, etc.

We'll hopefully have another beta soon and then get some feedback.

Anyone seeing this issue: the feature should be included in Matomo 4 beta 3 and newer. It would be great if you could give it a try and comment here on whether it worked, what issues you ran into, or what was not clear.