Ideate about tracking

kantord commented 4 years ago

This issue is only about designing/debating a solution. For the actual implementation, new issues can be created.

In order to improve the application, some behavior tracking might be useful. This would include things like:

acquisition sources
time spent on site
number of visitors (search engines excluded)
activity on the site (events)

The following requirements should be met:

For users who are not logged in, cookies should not be used and the tracking should be anonymous
Any privacy-protection tool or browser setting that is useful for the users should be supported
For logged in users, any data recording that is not necessary should be optional, but cookies can obviously not be avoided in this case
The project doesn't have any stable funding right now, and I also want to minimize any costs, so a solution that would cost (a lot of) money will probably be discarded
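
A minimal sketch of how the first three requirements could combine into one decision, assuming a hypothetical per-account opt-in setting and treating the browser's `DNT: 1` header as the privacy signal to honor. The `Visitor` fields and the mode names are illustrative and not part of the project:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visitor:
    logged_in: bool      # has an account session
    opted_in: bool       # hypothetical "share usage data" account setting
    do_not_track: bool   # browser sent "DNT: 1" or a similar privacy signal


def tracking_mode(visitor: Visitor) -> str:
    """Return "none", "anonymous" or "account" for this visitor."""
    if visitor.do_not_track:
        return "none"        # a privacy signal always wins
    if not visitor.logged_in:
        return "anonymous"   # no cookies, no identifier
    # Logged-in users are only recorded if they opted in.
    return "account" if visitor.opted_in else "none"
```

Keeping the rule in a single function would make it easy to audit what is recorded for whom.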

We might also create a completely separate deployment of a version that has all the tracking features removed from the build. This might be achievable with a build tool; maybe there's a webpack plugin that lets you mock files/dependencies with something that doesn't actually do anything. This might be overkill though, since this isn't exactly the most privacy-sensitive app ever made :smile_cat:
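
The "mock it with something that does nothing" idea is essentially a no-op implementation of the same interface. Here is a rough sketch of the pattern in Python for brevity; the class names and the `privacy_build` flag are hypothetical, and in a webpack build the swap would happen at bundle time (for example through a module alias) rather than through a runtime flag:

```python
class Tracker:
    """Collects events in memory; a real one would send them somewhere."""

    def __init__(self):
        self.events = []  # list of event names

    def track(self, event):
        self.events.append(event)


class NoopTracker(Tracker):
    """Same interface as Tracker, but every call is discarded."""

    def track(self, event):
        pass


def make_tracker(privacy_build):
    # In this sketch the flag stands in for a build-time decision.
    return NoopTracker() if privacy_build else Tracker()
```

The rest of the app calls `track(...)` the same way in both builds, so no call sites need to change.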

It might also be nice if some aggregated/anonymized data could be released publicly, so that data analysts could analyze it.

decentral1se commented 4 years ago

Hi there :wave:

Could you please further define what "In order to improve the application" means?

I admit that, given the general spirit of the project that I've seen so far, it is a bit strange to see an open issue about tracking users of the application. That doesn't speak much to values of user freedom to me personally.

kantord commented 4 years ago

> Could you please further define what "In order to improve the application" means?

There are many ways the application could benefit from or even needs feedback from its users.

The simplest and probably most important example is alternative solutions. Since the courses have translation exercises, each sentence might have many different correct translations. In many cases, it's impossible for course authors to think of all the possibilities, so there has to be some feedback on what the most common unaccepted solutions are, so that it can be brought to the course authors' attention.
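
As a rough illustration of that feedback loop, here is a sketch that counts the most common rejected answers per sentence. The `(sentence_id, answer, accepted)` record shape and the normalization rule are assumptions made for the example, not the project's data model:

```python
from collections import Counter, defaultdict


def normalize(answer):
    """Lowercase and collapse whitespace so trivial variants count together."""
    return " ".join(answer.lower().split())


def top_rejected(submissions, limit=3):
    """submissions: iterable of (sentence_id, answer, accepted) tuples.

    Returns {sentence_id: [(normalized_answer, count), ...]} with the most
    frequent rejected answers first.
    """
    rejected = defaultdict(Counter)
    for sentence_id, answer, accepted in submissions:
        if not accepted:
            rejected[sentence_id][normalize(answer)] += 1
    return {sid: counts.most_common(limit) for sid, counts in rejected.items()}
```

Note that nothing in this aggregation needs to know who submitted an answer, only which sentence it was for.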

Now there's no reason we can't make this tracking optional.

First of all, I would implement this separately from other data. For saving course progress, PouchDB is used. That DB can work offline or be optionally synced between multiple devices through a server. Such things are not required for tracking IMO, as it's OK to lose some data, so tracking doesn't need to work offline. This can be extended to not tracking any users who are not logged in (i.e. not syncing their DB). Therefore, by being offline, you'd already be completely safe from being tracked.

Second of all, tracking would be optional even for users who are logged in, although their course progress has to be saved on a server in that case, so there's no way around that. (A self-hosted server remains an option, or maybe an architecture where the user can use their own devices to sync, but that would require some network work most users wouldn't want to do themselves: https://pouchdb.com/guides/replication.html#live%E2%80%93replication)

Third of all, there's a way to create separate privacy-focused builds that not only don't track but also don't make any network requests at all, which I think is as private as it can realistically get? :thinking:

That being said, I don't have anything against my activity on the app being anonymously tracked in ways that are relevant to improving the course content, so I would personally have no reason to use the privacy options, but someone else might.

decentral1se commented 4 years ago

Thanks for sharing thoughts.

> There are many ways the application could benefit from or even needs feedback from its users.

Well, you only mentioned one? I appreciate the technical details but I guess I am looking for the social justification on this ticket.

> The simplest and probably most important example is alternative solutions. Since the courses have translation exercises, each sentence might have many different correct translations. In many cases, it's impossible for course authors to think of all the possibilities, so there has to be some feedback on what the most common unaccepted solutions are, so that it can be brought to the course authors' attention.

This can be done on a volunteer basis. If users want to share, they can click on some button to share it back.

I am not against tracking per se, it has uses, but in the case of 1. "i choose to send this information back to the mothership" vs. 2. "i have enabled a feature flag that enables some tracking", I would choose 1, which keeps everyone honest and each use case specific. Otherwise it is so opaque to the user! Tracking in most forms is a slippery slope.

kantord commented 4 years ago

That's a fair point. I think the "volunteer basis" idea, where the user actively chooses to send that information, can definitely be implemented, and it can even have UX benefits.

So even if other optional forms of tracking are implemented, it can remain an option to disable automatic tracking but send such information on a one-off, volunteer basis.
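
A small sketch of what the one-off, volunteer flow could look like, assuming the user is shown the exact payload before confirming. The function names, the payload fields, and the `outbox` list standing in for a network call are all made up for illustration:

```python
import json


def build_report(sentence_id, user_answer):
    """Return the exact JSON that would be sent if the user confirms."""
    payload = {"sentence_id": sentence_id, "answer": user_answer}
    return json.dumps(payload, sort_keys=True)


def submit_if_confirmed(report, confirmed, outbox):
    """Queue the report only after an explicit, per-report confirmation."""
    if confirmed:
        outbox.append(report)
    return confirmed
```

Because the report is built before the confirmation step, the UI can display it verbatim, which addresses the opacity concern raised above.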

nfultz commented 4 years ago

I think it's helpful to distinguish between web-tracking data collection (dropping a third-party cookie on the client, comparing organic vs paid traffic, dwell times and click-through rates, etc.) and the necessary data collection for education (e.g. quiz results, task completion). I strongly recommend using a synthetic ID (UUID would be fine) as a user ID for all the education data, and not their email address or other PII.
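
For illustration, a minimal sketch of the synthetic-ID idea using Python's standard `uuid` module. The record fields are hypothetical; the point is only that the education data is keyed by a random ID and carries no email or other PII:

```python
import uuid


def new_learner_id():
    """Random version-4 UUID; it encodes nothing about the person."""
    return str(uuid.uuid4())


def education_record(learner_id, item, correct):
    """Quiz result keyed by the synthetic ID only; no email or name fields."""
    return {"learner_id": learner_id, "item": item, "correct": correct}
```

Any mapping between an account and its learner ID would need to live separately from the education data for this to help.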

Course developers would benefit from having access to item-level data so they could tweak vocab items that are ambiguous, for example. It would also be critical to have such data for training models for adaptive quizzes or other personalized learning. It seems reasonable enough to hook these features up to a checkbox, though, to allow users to opt out if all they want to do is click through flash cards.
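
A sketch of the kind of item-level summary that could flag ambiguous vocab items while respecting the opt-out checkbox. The `(learner_id, item, correct)` record shape and the `opted_out` set are assumptions for the example:

```python
from collections import defaultdict


def item_error_rates(results, opted_out=frozenset()):
    """results: iterable of (learner_id, item, correct) tuples.

    Returns {item: error_rate}, skipping learners who opted out.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for learner_id, item, correct in results:
        if learner_id in opted_out:
            continue  # respect the opt-out checkbox
        totals[item] += 1
        if not correct:
            errors[item] += 1
    return {item: errors[item] / totals[item] for item in totals}
```

Items with an unusually high error rate would be candidates for a course author to review.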

For web-tracking, I'd rather just not do it. I've done enough of that in my day job and would not be interested in working on it for an open source project in my free time.

kantord commented 4 years ago

For now I'm closing this issue: there are a lot of higher-priority issues, and there would be no capacity to implement this or analyze the results anyway.

If you have some great ideas, feel free to re-open though.

Roshanjossey commented 3 years ago

I'm getting help from Gnome.org to get access to a hosted version of https://matomo.org for my personal project.

Matomo has a Docker image that you can self-host if that's a possibility.

garrison commented 3 years ago

We at https://wikiotics.org have been using https://plausible.io for our analytics lately. If it would help, we might be able to add you to our account and thus have the Wikiotics Foundation cover the cost as part of our charitable mission. Would you be interested? Our public dashboard is at https://plausible.io/wikiotics.org, if an example is helpful.

kantord / LibreLingo

Ideate about tracking #83