welpo / tabi

A modern Zola theme with search, multilingual support, optional JavaScript, a perfect Lighthouse score, and a focus on accessibility.
https://welpo.github.io/tabi/
MIT License

Privacy-preserving web analytics #172

Closed. sandman closed this issue 1 year ago.

sandman commented 1 year ago

Feature Request

Summary

Enable (ideally free) privacy-preserving, self-hosted web analytics for Tabi sites.

Motivation

Web analytics helps you understand site visit patterns and is useful for SEO, tailoring content, etc.

Detailed Description

Integrate Plausible as a first option within Tabi. Implementation details are TBD.

Additional Context

N.A.

donovanglover commented 1 year ago

Umami is also an option. I've used it for a number of years.

sandman commented 1 year ago

Yes, multiple options can be integrated and left as a choice for the developer. Here is a fairly recent comparison of choices.

welpo commented 1 year ago

Thanks for the input @sandman and @donovanglover!

I was thinking of adding a simple config option that could be set to the actual line these services tell you to add, like:

extra_script = '<script async src="http://localhost:3000/script.js" data-website-id="68416073-3e4a-4f2b-ae5a-787ea205902a"></script>'

This would immediately add support for all (or most?) of these services, but the Content Security Policy (CSP) would also need to be modified to allow the connection.
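To make the CSP point concrete: with a policy such as `script-src 'self'`, the analytics origin has to be added to `script-src` (so the script can load) and to `connect-src` (which governs the `fetch`, XHR, and `sendBeacon` requests a tracker may send). Below is a minimal Python sketch using the self-hosted Umami URL from the example above. The helper names `origin_of` and `allow_origin` are hypothetical, this is not how tabi builds its CSP, and it simplifies one rule: a directive missing from the policy is added as `'self'` plus the new origin, whereas a browser would otherwise fall back to `default-src`.

```python
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Reduce a script URL to its origin: scheme://host[:port]."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def allow_origin(policy: str, url: str,
                 directives=("script-src", "connect-src")) -> str:
    """Return `policy` with the origin of `url` allowed in each directive.

    Hypothetical helper. Simplification: a directive absent from the policy
    is added as 'self' plus the origin (a real browser would otherwise fall
    back to default-src for it).
    """
    origin = origin_of(url)
    # Parse "name src src; name src" into an insertion-ordered dict.
    parsed = {}
    for chunk in policy.split(";"):
        tokens = chunk.split()
        if tokens:
            parsed[tokens[0]] = tokens[1:]
    for name in directives:
        sources = parsed.setdefault(name, ["'self'"])
        if origin not in sources:
            sources.append(origin)
    # Serialize back to the "name src src; name src" form.
    return "; ".join(" ".join([name, *sources]) for name, sources in parsed.items())
```

For example, `allow_origin("default-src 'self'; script-src 'self'", "http://localhost:3000/script.js")` yields `default-src 'self'; script-src 'self' http://localhost:3000; connect-src 'self' http://localhost:3000`. Working with origins rather than full script URLs keeps the policy valid for both the script file and the tracking endpoint on the same host.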

I'll think about adding native Umami and Plausible support so it's more straightforward.

Jieiku commented 1 year ago

I like Matomo because it supports either parsing nginx logs or using JavaScript to collect the data. Not everyone has access to nginx logs, but if you do, it is a great way to do some basic analytics. Parsing logs instead of loading JavaScript has two benefits: it keeps your website faster, and it cannot be blocked by a browser extension.

uBlock Origin, for example, blocks the majority of JavaScript-based analytics trackers.

Some of the other tools may also support parsing logs, I have not looked at them yet.

https://github.com/matomo-org/matomo-log-analytics (I was able to do this in a self-hosted environment using only the free versions, though it does mean hosting your own Matomo instance as well.)

I have actually set this up before. In nginx it looks like this (I used remote syslog because at the time Matomo ran in a different container from the actual website):

/etc/nginx/sites-available/mydomain.conf

    access_log syslog:server=192.168.0.24:50505,facility=local0 matomo;
    error_log  syslog:server=192.168.0.24:50505,facility=local0;

/etc/nginx/nginx.conf

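http {
        # escape=json (nginx 1.11.8+) writes quotes and backslashes in values as \" and \\
        # instead of nginx's default \xXX form, which would not be valid JSON.
        log_format  matomo  escape=json
                            '{"ip": "$remote_addr",'
                            '"host": "$host",'
                            '"path": "$request_uri",'
                            '"status": "$status",'
                            '"referrer": "$http_referer",'
                            '"user_agent": "$http_user_agent",'
                            '"length": $bytes_sent,'
                            '"generation_time_milli": $request_time,'
                            '"date": "$time_iso8601"}';

        # ... rest of the http block ...
}
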
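Because this `log_format` writes one JSON object per request, the log lines can be inspected directly before (or instead of) importing them into Matomo. A minimal Python sketch, assuming the lines have been collected into a file by whatever receives the remote syslog messages; the function names are hypothetical and this is not part of Matomo's importer. It skips anything before the first `{` (such as a syslog header) and any line that is not valid JSON, which can happen with nginx's default escaping, where a `"` inside a value is written as `\x22`.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def parse_matomo_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per log line written with the `matomo` log_format."""
    for line in lines:
        start = line.find("{")  # drop any syslog header before the JSON
        if start == -1:
            continue
        try:
            yield json.loads(line[start:])
        except json.JSONDecodeError:
            # e.g. a value containing \x22 from nginx's default escaping
            continue


def top_paths(records: Iterable[dict], n: int = 10) -> list[tuple[str, int]]:
    """Count 2xx requests per path, most visited first."""
    counts = Counter(
        r["path"]
        for r in records
        if "path" in r and str(r.get("status", "")).startswith("2")
    )
    return counts.most_common(n)
```

Usage: `with open(log_path, encoding="utf-8") as f: print(top_paths(parse_matomo_lines(f)))`, where `log_path` is wherever your syslog receiver writes. Reading the file lazily through a generator keeps memory flat even for large logs.
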
welpo commented 1 year ago

Thanks for all the valuable input, @sandman, @donovanglover, and @Jieiku!

I've just created the branch feat/analytics that adds initial support for Plausible, Umami, and GoatCounter.

I've only tested non-self-hosted Umami. If any of you would be willing to test the Plausible or GoatCounter integrations—or a self-hosted setup—that would be very much appreciated. The Content Security Policy (CSP) has also been updated to accommodate these services.

To set them up, check out the config.toml comments:

[extra.analytics]
# Specify which analytics service you want to use.
# Supported options: ["goatcounter", "umami", "plausible"]
service = "umami"

# Unique identifier for tracking.
# For GoatCounter, this is the code you choose during signup.
# For Umami, this is the website ID.
# For Plausible, this is the domain name (e.g. "example.com").
# Note: Leave this field empty if you're self-hosting.
id = "yourID"

# Optional: Specify the URL for self-hosted analytics instances.
# For GoatCounter: Base URL like "https://stats.example.com"
# For Umami: Base URL like "https://umami.example.com"
# For Plausible: Base URL like "https://plausible.example.com"
# Leave this field empty if you're using the service's default hosting.
self_hosted_url = ""
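For a sense of what these three settings have to produce, here is a minimal Python sketch that turns `service`, `id`, and `self_hosted_url` into the kind of `<script>` embed each service documents. It is not tabi's implementation (a Zola theme renders this markup from its templates); the function name `analytics_snippet`, the default hosted script URLs, and the self-hosted script paths are assumptions to check against each service's current documentation. The sketch also simplifies by loading every script with `async`.

```python
from html import escape

# Assumed default (non-self-hosted) script locations; verify before use.
HOSTED = {
    "goatcounter": "https://gc.zgo.at/count.js",
    "umami": "https://cloud.umami.is/script.js",
    "plausible": "https://plausible.io/js/script.js",
}


def analytics_snippet(service: str, site_id: str = "", self_hosted_url: str = "") -> str:
    """Build a <script> tag from hypothetical [extra.analytics] values."""
    base = self_hosted_url.rstrip("/")
    if service == "goatcounter":
        # GoatCounter reads its counting endpoint from data-goatcounter.
        endpoint = f"{base}/count" if base else f"https://{site_id}.goatcounter.com/count"
        src = f"{base}/count.js" if base else HOSTED["goatcounter"]
        attrs = {"data-goatcounter": endpoint, "src": src}
    elif service == "umami":
        src = f"{base}/script.js" if base else HOSTED["umami"]
        attrs = {"src": src, "data-website-id": site_id}
    elif service == "plausible":
        src = f"{base}/js/script.js" if base else HOSTED["plausible"]
        attrs = {"src": src, "data-domain": site_id}
    else:
        raise ValueError(f"unsupported analytics service: {service!r}")
    # Escape attribute values so quotes or ampersands cannot break the tag.
    rendered = " ".join(f'{key}="{escape(value)}"' for key, value in attrs.items())
    return f"<script async {rendered}></script>"
```

For example, `analytics_snippet("plausible", "example.com")` returns `<script async src="https://plausible.io/js/script.js" data-domain="example.com"></script>`. Each service's script origin would also need to be allowed in the CSP for the tag to load.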

Looking forward to your feedback.

welpo commented 1 year ago

I've merged the changes in #193. If anyone encounters a problem with analytics, please open a new issue.

Thanks!