openfisca / legislation-explorer

Explore legislation formulas and parameters.
https://legislation.demo.openfisca.org
GNU Affero General Public License v3.0

Visits not properly tracked in Piwik #163

Closed bonjourmauko closed 6 years ago

bonjourmauko commented 6 years ago

Relates to openfisca/openfisca-core#685

Currently, page visits are not properly tracked in Piwik.

We should ensure that we track/group:

[Screenshot, 2018-07-09 at 22:18:34]

MattiSG commented 6 years ago

I don't understand what you would like your stats to look like. Are you sure your instance is properly set up and that the problem does lie with the Legislation Explorer itself? 🙂

bonjourmauko commented 6 years ago

Hi @MattiSG,

I'd like them to look like, for example, https://fr.openfisca.org/legislation/cmu_base_ressources.

Are you sure your instance is properly set up

No, good question. I'll investigate and close if it is not the case.

Anna-Livia commented 6 years ago

This is what I gathered from my recent Piwik investigations.

  1. The Piwik instance (id = 4 in our case) is supposed to represent one website, or one app. If the "action URL" it receives is not a full URL, it puts that app's URL (fr.openfisca.org) in front of it (theory to be confirmed):
     /parameters --> fr.openfisca.org/parameters
     http://fr.openfisca.org/api/v21/parameters --> http://fr.openfisca.org/api/v21/parameters

  2. Our Piwik trackers are situated inside the apps and do not have, as far as I can tell, any knowledge of the full URL used to access them. The Piwik tracker knows the legislation explorer is serving http://localhost:2031/variable/age but doesn't know the nginx configuration that routes https://fr.openfisca.org/legislation/variable/age to it. The same thing goes for the API.

Solutions: Here are the solutions I can envision. Please tell me if you know of more options; of course, we could mix and match:

A. Have one piwik instance per app.

We would have one id for each app: one for the international website, one for the French website, one for the legislation explorer and one for each instance of the API.

How to do this: ask data.gouv to give us more instances.

I don't know if you can have cross-id reporting, but in the case of the API, having one id per running instance would make it difficult to have an overview of the API usage.

B. Hard-code our URLs in the Piwik tracker

This would be "easy" to do in the API, and maybe in the websites. I don't know how to implement this in the legislation explorer, where we use a Piwik React library.

How to do this for the API: split the action URL and keep only the path, then append that path to a base URL passed on launch (openfisca-serve --serving-url http://fr.openfisca.org/api/V21 ?). The whole URL could then be sent to the Piwik server and, hopefully (this has not been tested), it would save the whole path.

This would make the API URLs appear under one api directory in the Piwik dashboard. From there you would be able to access all the versions (v21, v22, ...) and, in each of these directories, find the specific URLs. I don't think you would be able to link the specific URLs across the version directories (to be determined), meaning I don't think you could say that v21/parameters is the same page as v22/parameters.
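A minimal sketch of the URL rewriting described in option B, in Python since the API is a Python app. The function name `public_action_url` is hypothetical, and the serving URL is assumed to come from a launch option such as the `--serving-url` flag floated above, which does not exist yet. It only builds the URL; whether Piwik stores it as hoped remains untested.

```python
from urllib.parse import urlsplit


def public_action_url(serving_url: str, action_url: str) -> str:
    """Re-root the path of `action_url` under the public `serving_url`.

    `serving_url` is the externally visible base (hypothetically passed
    on launch); `action_url` is what the app sees locally.
    """
    parts = urlsplit(action_url)
    base = serving_url.rstrip("/")
    # Keep only the path (and query string, if any) of the local URL.
    path = parts.path if parts.path.startswith("/") else "/" + parts.path
    url = base + path
    if parts.query:
        url += "?" + parts.query
    return url


print(public_action_url("http://fr.openfisca.org/api/v21",
                        "http://localhost:5000/parameters"))
```

The local host and port are dropped entirely, so the same code would work whatever nginx routes to the app, as long as the base URL given at launch matches the public route.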

C. Use events categories and actions

This is, IMHO, the option that seems to have the most potential. However, again, I am not sure how to implement this in the legislation explorer.

In this option, we can track not only pages but also customised "events", which are organised in event categories and event actions (and, optionally, an event name and an event value). This allows us, for example, to send an e_c with the country package name and version and an e_a with the path. (Other schemas are possible; this one fits our API version needs best.)

With events, we can analyze the number of calls to api/v21/parameters vs api/v22/parameters and also get all calls to /parameters (with v21 AND v22).
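To make the event schema concrete, here is a sketch that builds a tracking request for the Piwik HTTP tracking API, where `idsite`, `rec`, `e_c` and `e_a` are the standard query parameters. The function name, the endpoint `https://piwik.example.org/piwik.php` and the `package@version` category format are assumptions made for illustration; the request is only built, not sent.

```python
from urllib.parse import urlencode


def event_tracking_url(endpoint: str, site_id: int,
                       package: str, version: str, path: str) -> str:
    """Build a tracking URL carrying one event.

    Category (e_c) = country package name and version,
    action (e_a)   = path that was called.
    """
    params = {
        "idsite": site_id,  # Piwik site id (4 in our case)
        "rec": 1,           # required for the request to be recorded
        "e_c": f"{package}@{version}",
        "e_a": path,
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint; replace with the real Piwik server.
print(event_tracking_url("https://piwik.example.org/piwik.php",
                         4, "openfisca-france", "21", "/parameters"))
```

Because the version lives in the category and the path in the action, the dashboard could group by action to see all calls to /parameters, or by category to compare versions.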

My opinion

  1. Clarify the ask: I am not sure what you are trying to measure and why we want to measure it.
  2. Let's try to get more ids (at least for the websites and the legislation explorer) so we can have separate tracking. Maybe we should host our own Piwik?
  3. Let's use events to track actions across diverging URLs.
bonjourmauko commented 6 years ago

Our Piwik trackers are situated inside the apps and do not have, as far as I can tell, any knowledge of the full URL used to access them. The Piwik tracker knows the legislation explorer is serving http://localhost:2031/variable/age but doesn't know the nginx configuration that routes https://fr.openfisca.org/legislation/variable/age

AFAIK, that is not passed by nginx. There's a BASENAME value that is passed when starting the server, and it is available on both the server side and the client side:

process.env.BASENAME

Same thing goes for the API

For the API, it is actually passed by nginx. In that case, we can ask nginx to give us the proper value (not tested):

location ~(/api/v21)(.*)$ {
  proxy_set_header Host $host$1$2;
}

Have one Piwik instance per app.

As long as we can keep the visitor ID across sites: https://matomo.org/faq/how-to/faq_23654/

Use events categories and actions

I don't have a clear view of the difference between Pages and Events in Piwik. I've manually tracked events before with other software like Mixpanel when :

I am not sure what you are trying to measure and why we want to measure it.

We want pages/events/outlinks properly tracked so we can ~build funnels and cohorts, and improve retention (increase received pull requests)~ make informed decisions with them.

Otherwise, current data is hard to impossible to exploit.

Let's try to get more ids (at least for the websites and the legislation explorer) so we can have separate tracking.

Yes, if we can somehow keep visitor IDs across websites and have a unified dashboard.

Maybe we should host our own piwik ?

Cloud is 7,50 € a month, why not (doing the maths, I wouldn't have hosted before having at least 5 million pageviews a month).

Let's use events to track actions across diverging URLs

I think not tracking the proper values is a bug, and adding events is a feature. IMHO they're complementary, not mutually exclusive.