[Open] LarryZhang opened this issue 6 years ago
@LarryZhang I have added a cron job in my Docker image of Matomo if you are interested : https://github.com/crazy-max/docker-matomo
You don't need cron in Docker containers. Just use cron or systemd timers on your host system to execute something like docker exec -u www-data some-piwik php console.
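As a sketch of that host-side approach (the container name, log path, and minute value below are all assumptions; adjust to your deployment):

```shell
# Host crontab fragment (e.g. via `crontab -e` as root).
# Runs Matomo archiving every hour at minute 17, as the container's www-data user.
17 * * * * docker exec -u www-data some-piwik php /var/www/html/console core:archive >>/var/log/matomo-archive.log 2>&1
```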
To follow Docker best practices, it would be best to have a sidecar container handle cron-like events: https://docs.microsoft.com/en-us/azure/architecture/patterns/sidecar.
This is the main reason why I've been maintaining my own Matomo images; the recommended approaches here don't work in a setup like Docker Swarm.
Alternative would be to run a sidecar that hits the URL endpoint at regular intervals to perform archiving. That would work, but Matomo's own website recommends using the URL endpoint only as a last resort:
If possible, we highly recommend that you run a cron or scheduled task. However, on some shared hosting, or on particular server configurations, running a cron or scheduled task may not be easy or possible.
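For completeness, the URL-based fallback the quote refers to is just an HTTP request to Matomo's archive endpoint, which any external timer can trigger (the hostname and token below are placeholders):

```shell
# One archiving run via the web endpoint (the documented last-resort method).
# Replace the URL and token_auth value with your own.
curl -fsS "https://matomo.example.com/misc/cron/archive.php?token_auth=YOUR_TOKEN_AUTH"
```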
@sagebind Here is a swarm compose if you want to take a look: https://github.com/crazy-max/docker-matomo/blob/master/examples/swarm/docker-compose.yml The cron solution uses swarm-cronjob.
I agree with @Silvenga regarding the side-car pattern, but I am also interested in swarm implications.
@sagebind is it not possible to use swarm placement constraints to ensure that the cron sidecar runs next to the matomo container?
AFAIK the cron sidecar needs to mount the volume where the PHP code is, and also needs network access to the database. I assume that we are talking here about access to that PHP volume, right? Matomo already recommends using a volume for the PHP code.
I would find it helpful if this image came with cron installed (but not running), so that the same image can be used for the app (like now) and for a cron sidecar: the PHP version is the same, etc. An alternative entrypoint can then be used (cron -f) to run cron in the foreground.
At least this fixes the problem for deployments on a docker host, and maybe swarm if placement constraints can make the containers run side by side.
I would find it helpful if this image came with cron installed
BusyBox, which is already included in the Alpine variant, ships with a cron daemon.
To this day, the only way in Swarm to pin two containers to the same node is to pin both to a specific node, which is unacceptable for high availability.
The Kubernetes solution is to put both containers in the same pod.
Hi, I am currently running the Apache version. What do I have to do to add cron support? Adding this did not work:
cron:
  image: matomo:fpm
  links:
    - db
  volumes:
    - ./config:/var/www/html/config
  entrypoint: |
    bash -c 'bash -s <<EOF
    trap "break;exit" SIGHUP SIGINT SIGTERM
    while /bin/true; do
      su -s "/bin/bash" -c "/usr/local/bin/php /var/www/html/console core:archive" www-data
      sleep 3600
    done
    EOF'
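A likely reason the snippet above fails (my reading of the official image's entrypoint, so treat it as an assumption): overriding entrypoint skips the image's own startup script, which normally copies the Matomo source into /var/www/html, so the console script never exists in the container. A sketch that shares the app's code volume instead (the volume name matomo is hypothetical):

```yaml
cron:
  image: matomo:fpm
  volumes:
    - matomo:/var/www/html   # same named volume as the app container
  depends_on:
    - db
    - app
  entrypoint: sh -c 'while true; do php /var/www/html/console core:archive; sleep 3600; done'
```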
It would be great if the image just shipped crond and an appropriate configuration; that way you could run crond from the same image in a separate container. This is how many other images do it (e.g. nextcloud).
Having cron baked into the image is an anti-pattern for docker. Having every node in a swarm or ECS cluster trying to run the archive task every 5 minutes would be a nightmare.
The suggestion was to have the crond binary and config in there, but not actually running it. That way you can use the same container with different commands for the cronjob and the actual application
The Alpine variant contains BusyBox with the crond applet. You can mount a cron file into /var/spool/cron/crontabs and start a container with busybox crond -f.
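Put together, a compose service for that could look like the following sketch (the service name, volume name, and crontab filename are assumptions):

```yaml
cron:
  image: matomo:fpm-alpine
  # keep the image's entrypoint (it populates /var/www/html), only swap the command
  command: busybox crond -f -L /dev/stdout
  volumes:
    - matomo:/var/www/html
    - ./crontab:/var/spool/cron/crontabs/www-data:ro
```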
Thought this might be worth sharing:
app:
  image: matomo:fpm-alpine
  restart: always
  networks:
    - db
    - app
  volumes:
    - app:/var/www/html
  tmpfs:
    - /var/www/html/tmp:uid=82,gid=82
  environment:
    MATOMO_DATABASE_HOST: db
    MATOMO_DATABASE_ADAPTER: mysql
    MATOMO_DATABASE_TABLES_PREFIX: piwik_
    MATOMO_DATABASE_USERNAME: ${MYSQL_USER}
    MATOMO_DATABASE_PASSWORD: ${MYSQL_PASSWORD}
    MATOMO_DATABASE_DBNAME: ${MYSQL_DATABASE}
  depends_on:
    - db

cron:
  image: matomo:fpm-alpine
  restart: always
  networks:
    - db
  volumes:
    - app:/var/www/html:ro
  tmpfs:
    - /var/www/html/tmp:uid=82,gid=82
  depends_on:
    - app
  entrypoint: "sh -c 'while true; do php console core:archive --url=https://your.url.here/; sleep 600; done'"
As you can see, a second container is started sharing the same resources (mount, db), but with a different entrypoint. The actual cron command in the container simply runs once every 10 minutes (hence sleep 600). Because it shares the same mount (which also holds the configs), no ENV vars have to be provided; the config is read from disk (hence depends_on: [app]). This seems to me a proper Docker approach: you use the same image the application runs in, so no alternative image has to be created or modified. By overriding the entrypoint there is still only one deployment (app) controlling the web application; cron only updates DB entries, as it doesn't even have write permissions to /var/www/html (mounted ro).
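One caveat with the sleep 600 pattern: the interval is measured from the end of each archive run, so start times slowly drift by the run duration. If wall-clock-aligned runs matter, the sleep can be computed instead; a sketch (the helper name is my own):

```shell
#!/bin/sh
# seconds_until_next prints the number of seconds from `now` (epoch seconds)
# until the next multiple of `interval` seconds.
seconds_until_next() {
  now=$1
  interval=$2
  echo $(( interval - now % interval ))
}

# Drift-free variant of the loop above:
# while true; do
#   sleep "$(seconds_until_next "$(date +%s)" 600)"
#   php console core:archive --url=https://your.url.here/
# done
```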
This practice is also applied in the nextcloud docker-compose example.
I'd like to second what @rule88 said.
In Kubernetes, Heroku, or other "container only" environments, users are forced to either maintain their own images (bad for security) or use a hack like this (opaque, difficult to maintain).
Both can be avoided by including a minimal shell script in the image which can be set as the entrypoint. In practice, all the end user would have to do to properly set up cron is add a service like this:
cron:
  image: matomo:apache
  volumes:
    - matomo:/var/www/html
  entrypoint: /cron.sh
I added a service:
matomo-fpm-cron:
  image: matomo:fpm-alpine
  restart: always
  network_mode: host
  entrypoint: sh -c 'echo "running..." && busybox crond -L /dev/stdout -f && echo "stop"'
  volumes:
    - matomo:/var/www/html
    - ./crontab:/var/spool/cron/crontabs/root:ro
  environment:
    MATOMO_DATABASE_HOST: ${MATOMO_DATABASE_HOST}
    MATOMO_DATABASE_ADAPTER: mysql
    MATOMO_DATABASE_TABLES_PREFIX: matomo_
    MATOMO_DATABASE_USERNAME: ${MATOMO_DATABASE_USERNAME}
    MATOMO_DATABASE_PASSWORD: ${MATOMO_DATABASE_PASSWORD}
    MATOMO_DATABASE_DBNAME: ${MATOMO_DATABASE_DBNAME}
Here is the content of my ./crontab file:

*/20 * * * * /usr/local/bin/php /var/www/html/console scheduled-tasks:run
Logs:
running...
crond: crond (busybox 1.33.1) started, log level 8
crond: USER root pid 8 cmd /usr/local/bin/php /var/www/html/console scheduled-tasks:run
INFO [2021-08-24 16:40:00] 8 Starting Scheduled tasks...
INFO [2021-08-24 16:40:00] 8 done
Scheduled Tasks executed
My repo, which includes the cron approach with a while-with-sleep loop: https://github.com/rule88/matamo/
I tried to use a cron job inside the container, but it does not work, so I have to use a separate container to run the cron job. But it seems the tmpfs does not work either, and I don't know why the volume must be made read-only for the cron container, so I just made a simple container like below and it works.
version: "3"
services:
  matomodb:
    image: mariadb
    command: --max-allowed-packet=64MB
    restart: always
    volumes:
      - matomodb:/var/lib/mysql
    environment:
      - MARIADB_ROOT_PASSWORD=password
    env_file:
      - ./db.env
  app:
    image: matomo
    restart: always
    volumes:
      - matomo:/var/www/html
    environment:
      - MATOMO_DATABASE_HOST=matomodb
    env_file:
      - ./db.env
    ports:
      - 3780:80
  cron:
    image: matomo
    restart: always
    volumes:
      - matomo:/var/www/html
    depends_on:
      - app
    entrypoint: "sh -c 'while true; do php console core:archive --url=https://mat.yourdomain.com/; sleep 3600; done'"
volumes:
  matomodb:
  matomo:
My solution for this is to use ofelia. In my setup I have an extra container in my Docker Compose file which periodically runs the task as the www-data user in the app container.
docker-compose.yml:
...
ofelia:
  image: mcuadros/ofelia:v0.3.6
  restart: always
  depends_on:
    - app
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro
    - ./ofelia/config.ini:/etc/ofelia/config.ini
...
ofelia/config.ini:
[job-exec "job-archive-reports"]
schedule = @every 1h
container = matomo-app-1
command = /usr/local/bin/php /var/www/html/console core:archive --url=http://your.domain.com
user = www-data
tty = true
In case it helps anyone else, I made a very simple derived cron image so that I can have the schedule in environment variables and don't need to mount the crontab. As we use the Alpine-based image it can simply use the busybox crond. (In addition, I added a wait loop not to run cronjobs on an instance not yet set up.)
Dockerfile:
# start from Matomo official image
FROM matomo:4-fpm-alpine
# add cron entrypoint
ADD entrypoint.cron.sh /
ENTRYPOINT [ "/entrypoint.cron.sh" ]
CMD [ "crond", "-f", "-d6" ]
entrypoint.cron.sh:
#!/bin/sh
# timeout - configured in .env or defaults
[ -n "$WAIT_STEP" ] || WAIT_STEP=5 # seconds
[ -n "$WAIT_MAX" ] || WAIT_MAX=60 # x WAIT_STEP seconds
echo -n "Creating crontab for www-data..."
crontab -u root -r
crontab -u www-data - << EOF
${CRON_ARCHIVE} /var/www/html/console core:archive
${CRON_TASKS} /var/www/html/console scheduled-tasks:run
EOF
echo " done."
# wait for matomo instance to be configured
echo -n "Checking if initialised..."
WAIT_COUNT=0
while [ ! -e config/config.ini.php ]; do
WAIT_COUNT=$((WAIT_COUNT+1))
if [ $WAIT_COUNT -gt $WAIT_MAX ]; then
echo " timeout."
exit 1
fi
echo -n .
sleep $WAIT_STEP
done
echo " up."
exec "$@"
In docker-compose.yaml:
cron:
  build: cron
  depends_on:
    - app
  volumes:
    - matomo:/var/www/html
  environment:
    - PHP_MEMORY_LIMIT=2048M
    - CRON_ARCHIVE=*/30 * * * *
    - CRON_TASKS=*/5 * * * *
  init: true
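Since the schedules come in via environment variables, a small guard in the entrypoint can catch malformed values before crontab silently misbehaves. This is my own sketch, not part of the image:

```shell
#!/bin/sh
# valid_schedule succeeds only if its argument has the five whitespace-
# separated fields a crontab schedule needs. It only counts fields; it is
# not a full cron-syntax validator.
valid_schedule() {
  set -f           # disable globbing so '*' fields are not expanded
  # shellcheck disable=SC2086
  set -- $1        # split the schedule on whitespace
  set +f
  [ "$#" -eq 5 ]
}

# e.g. in the entrypoint:
# valid_schedule "$CRON_ARCHIVE" || { echo "bad CRON_ARCHIVE" >&2; exit 1; }
```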
Is --url= the main Matomo web site, or one of the sites that you track?
@plittlefield --url points to your matomo instance.
Thank you!
Note to self: remember to set entrypoint.cron.sh to chmod 775 (else a permission error will be thrown)...
Problem: I didn't use a volume, but just a simple bind mount. I thought that part caused a permission problem, so I moved my Matomo instance to a volume instead.
But it's still throwing: The directory "/var/www/html/tmp/cache/tracker/" does not exist and could not be created.
Docker-compose for the cron-part:
cron:
  build: cron
  depends_on:
    - app
  volumes:
    - app:/var/www/html
  environment:
    - PHP_MEMORY_LIMIT=2048M
    - CRON_ARCHIVE=*/30 * * * *
    - CRON_TASKS=*/5 * * * *
  init: true
Full log:
Creating crontab for www-data... done.
Checking if initialised... up.
crond: crond (busybox 1.35.0) started, log level 6
crond: USER www-data pid 10 cmd /var/www/html/console core:archive
The directory "/var/www/html/tmp/cache/tracker/" does not exist and could not be created.
ERROR [2022-12-21 10:19:00] 10 Uncaught exception: /var/www/html/vendor/matomo/doctrine-cache-fork/lib/Doctrine/Common/Cache/FileCache.php(84): The directory "/var/www/html/tmp/cache/tracker/" does not exist and could not be created. [Query: , CLI mode: 1]
[InvalidArgumentException]
The directory "/var/www/html/tmp/cache/tracker/" does not exist and could not be created.
If I add ls -alht inside the entrypoint script, most files are listed with xfs:xfs ownership, but they are found:
> docker logs -f matomo-cron-1
Creating crontab for www-data...total 396K
drwxr-xr-x 7 xfs xfs 4.0K Dec 21 11:04 tmp
drwxr-xr-x 13 xfs xfs 4.0K Dec 21 11:03 .
drwxr-xr-x 3 xfs xfs 4.0K Dec 21 11:03 config
...
drwxr-xr-x 7 xfs xfs 4.0K Dec 21 11:03 misc
drwxr-xr-x 68 xfs xfs 4.0K Dec 21 11:03 plugins
-rw-r--r-- 1 xfs xfs 578 Dec 8 16:58 DIObject.php
-rwxr-xr-x 1 xfs xfs 753 Dec 8 16:58 console
-rw-r--r-- 1 xfs xfs 64.3K Dec 8 16:58 matomo.js
...
Part of result is removed...
If I bash into the Matomo app container, I can see that SELinux is in use, or at least that's what I think, given the dot at the end of the permission bits (that's also the case if I spin up a fresh Matomo container using a fresh volume).
root@bca07dfdde3d:/var/www/html# ls -alht
total 396K
drwxr-xr-x. 13 www-data www-data 4.0K Dec 7 23:45 .
drwxr-xr-x. 3 www-data www-data 4.0K Dec 7 23:45 config
...
drwxr-xr-x. 2 www-data www-data 4.0K Dec 7 23:45 tmp
drwxr-xr-x. 22 www-data www-data 4.0K Dec 7 23:45 vendor
-rw-r--r--. 1 www-data www-data 770 Dec 7 23:45 robots.txt
...
-rw-r--r--. 1 www-data www-data 6.3K Dec 7 23:45 offline-service-worker.js
-rw-r--r--. 1 www-data www-data 65K Dec 7 23:45 piwik.js
-rw-r--r--. 1 www-data www-data 2.7K Dec 7 23:45 piwik.php
drwxr-xr-x. 1 root root 4.0K Dec 6 05:16 ..
So, how have you fixed that part, yourself?
Do you have any suggestions on how to fix this, @ctueck ? :)
@exetico so it seems the cron container can't create that directory because all files are owned by xfs, while the cron script runs as www-data. Do you have any idea why the file ownership differs between the two containers?
In our case, directory listings are identical:
colin@appsrv:/opt/matomo$ docker-compose exec app ls -la /var/www/html
total 396
drwxr-xr-x 13 www-data www-data 4096 Oct 5 12:49 .
drwxr-xr-x 1 root root 4096 Oct 6 23:21 ..
... ...
drwxr-xr-x 10 www-data www-data 4096 Oct 5 12:49 tmp
drwxr-xr-x 22 www-data www-data 4096 Oct 5 12:49 vendor
colin@appsrv:/opt/matomo$ docker-compose exec cron ls -la /var/www/html
total 396
drwxr-xr-x 13 www-data www-data 4096 Oct 5 12:49 .
drwxr-xr-x 1 root root 4096 Aug 9 21:20 ..
... ...
drwxr-xr-x 10 www-data www-data 4096 Oct 5 12:49 tmp
drwxr-xr-x 22 www-data www-data 4096 Oct 5 12:49 vendor
@ctueck I'm not really sure. Here's a fresh Matomo instance on another system, executed in docker (Host system is Manjaro).
~/D/T/TestMatom docker-compose up -d 2022-12-21T12:24:12 UTC
Building cron
Step 1/4 : FROM matomo:4-fpm-alpine
---> 341d01ee881b
Step 2/4 : ADD entrypoint.cron.sh /
---> e70050d26bc7
Step 3/4 : ENTRYPOINT [ "/entrypoint.cron.sh" ]
---> Running in 32c8808a8bf0
Removing intermediate container 32c8808a8bf0
---> f5aadc2c6b8d
Step 4/4 : CMD [ "crond", "-f", "-d6" ]
---> Running in 5ed9d6a3c8c6
Removing intermediate container 5ed9d6a3c8c6
---> 7243c521bb47
Successfully built 7243c521bb47
Successfully tagged testmatom_cron:latest
WARNING: Image for service cron was built because it did not already exist. To rebuild this image you must use `docker-compose build` or `docker-compose up --build`.
testmatom_app_1 is up-to-date
testmatom_db_1 is up-to-date
Creating testmatom_cron_1 ... done
~/D/T/TestMatom docker logs -f testmatom_cron_1 3792ms 2022-12-21T12:24:20 UTC
Creating crontab for www-data...total 400K
drwxr-xr-x 13 xfs xfs 4.0K Dec 7 23:45 .
...
drwxr-xr-x 2 xfs xfs 4.0K Dec 7 23:45 tests
drwxr-xr-x 2 xfs xfs 4.0K Dec 7 23:45 tmp
...
-rw-r--r-- 1 xfs xfs 5.8K Dec 7 23:45 README.md
-rw-r--r-- 1 xfs xfs 1.8K Dec 7 23:45 SECURITY.md
...
-rw-r--r-- 1 xfs xfs 2.6K Dec 7 23:45 piwik.php
drwxr-xr-x 1 root root 4.0K Nov 30 21:39 ..
done.
Checking if initialised... up.
crond: crond (busybox 1.35.0) started, log level 6
crond: USER www-data pid 11 cmd /var/www/html/console core:archive
crond: USER www-data pid 12 cmd /var/www/html/console scheduled-tasks:run
The directory "/var/www/html/tmp/cache/tracker/" does not exist and could not be created.
The directory "/var/www/html/tmp/cache/tracker/" does not exist and could not be created.
ERROR [2022-12-21 12:40:00] 12 Uncaught exception: /var/www/html/vendor/matomo/doctrine-cache-fork/lib/Doctrine/Common/Cache/FileCache.php(84): The directory "/var/www/html/tmp/cache/tracker/" does not exist and could not be created. [Query: , CLI mode: 1]
ERROR [2022-12-21 12:40:00] 11 Uncaught exception: /var/www/html/vendor/matomo/doctrine-cache-fork/lib/Doctrine/Common/Cache/FileCache.php(84): The directory "/var/www/html/tmp/cache/tracker/" does not exist and could not be created. [Query: , CLI mode: 1]
[InvalidArgumentException]
The directory "/var/www/html/tmp/cache/tracker/" does not exist and could not be created.
[InvalidArgumentException]
The directory "/var/www/html/tmp/cache/tracker/" does not exist and could not be created.
It's the same result. I'm on image: matomo (:latest) ...
Also, just to clear any doubts, here's the result of a direct ls -la. I'm not sure why I didn't think of that while testing.
~/D/T/TestMatom docker exec -it testmatom_cron_1 ls -la /var/www/html 2022-12-21T12:41:02 UTC
total 400
drwxr-xr-x 13 xfs xfs 4096 Dec 21 12:33 .
drwxr-xr-x 1 root root 4096 Nov 30 21:39 ..
-rw-r--r-- 1 xfs xfs 101854 Dec 7 23:45 CHANGELOG.md
-rw-r--r-- 1 xfs xfs 929 Dec 7 23:45 CONTRIBUTING.md
...
drwxr-xr-x 2 xfs xfs 4096 Dec 7 23:45 tests
drwxr-xr-x 9 xfs xfs 4096 Dec 21 12:34 tmp
drwxr-xr-x 22 xfs xfs 4096 Dec 21 12:33 vendor
Test 2 files:
~/D/T/TestMatom ls -alht 2022-12-21T13:31:46 UTC
total 28K
-rw-r--r-- 1 exetico exetico 914 Dec 21 13:39 docker-compose.yaml
drwxr-xr-x 4 exetico exetico 4,0K Dec 21 13:38 ./
drwxr-xr-x 6 999 adm 4,0K Dec 21 13:32 matomo-db/
-rw-r--r-- 1 exetico exetico 95 Dec 21 13:32 .env
drwxr-xr-x 2 exetico exetico 4,0K Dec 21 13:20 cron/
drwxr-xr-x 19 exetico exetico 4,0K Dec 21 13:18 ../
~/D/T/TestMatom ls -alht cron 2022-12-21T13:31:48 UTC
total 16K
drwxr-xr-x 4 exetico exetico 4,0K Dec 21 13:38 ../
drwxr-xr-x 2 exetico exetico 4,0K Dec 21 13:20 ./
-rwxrwxr-x 1 exetico exetico 721 Dec 21 13:20 entrypoint.cron.sh*
-rw-r--r-- 1 exetico exetico 176 Dec 21 13:20 Dockerfile
~/D/T/TestMatom
It's the same result. I'm on image: matomo (:latest) ...
I tried and couldn't reproduce it, but I think now I see why: this is a different image variant than the cron container is based on.
Yours would be the default image, while we run the Alpine-based variant of the PHP-FPM image, see FROM matomo:4-fpm-alpine in the Dockerfile. I realise I didn't quote our complete docker-compose.yml, but the app container has image: matomo:4-fpm-alpine consistently.
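A plausible root cause for the xfs owner, based on the images' stock passwd files rather than anything verified in this thread: www-data is uid 33 on the Debian-based images but uid 82 on Alpine, and uid 33 on Alpine belongs to the xfs user. Files created on the shared volume by a Debian-based app container therefore show up as xfs inside an Alpine-based cron container. A quick way to check (service names are assumptions):

```shell
# Compare the uid behind www-data in both containers:
docker compose exec app id www-data
docker compose exec cron id www-data
# See which user uid 33 maps to inside the Alpine-based container:
docker compose exec cron getent passwd 33
```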
This leads to two important points:
1. The app and the cron containers should use the same image variant, so that file ownership and the PHP setup match.
2. The default (latest) images are Debian- or Ubuntu-based I think, so in that case you might also need to adapt the Dockerfile to actually install a suitable crond.

If you're using K8s and you want to run the cron job you may use this example.
In this YAML file, my pod mounts a volume at /var/www/html and my Matomo site is https://example.com/matomo. The security context is set for user id 33 (www-data).
apiVersion: batch/v1
kind: CronJob
metadata:
  name: matomo-cron
spec:
  schedule: "5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          securityContext:
            runAsUser: 33
          containers:
            - name: matomo-cron
              image: matomo:latest
              imagePullPolicy: IfNotPresent
              command:
                - php
                - /var/www/html/console
                - core:archive
                - --url=https://example.com/matomo
              volumeMounts:
                - mountPath: /var/www/html
                  name: matomov
          restartPolicy: OnFailure
          volumes:
            - name: matomov
              persistentVolumeClaim:
                claimName: matomo-claim
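If archiving runs can take longer than the schedule interval, it may be worth adding a concurrency policy to the CronJob above so runs don't overlap; these are standard batch/v1 CronJob fields (the values are suggestions):

```yaml
# Additional fields under the CronJob's top-level spec:
spec:
  schedule: "5 * * * *"
  concurrencyPolicy: Forbid        # skip a run if the previous one is still going
  startingDeadlineSeconds: 300     # give a missed run 5 minutes to start
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
```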
Hi @ctueck,
After the Matomo instance has shown that it's stable, I've now secured a proper backup. Therefore I also wanted to revisit this.
I've moved to the fpm-alpine image instead, and everything works as expected. Thank you for answering my question :smile:! I'm sorry that I originally missed the fact that you used the smaller Alpine image.
It's quite bad practice, but for the sake of simplicity I've built a crontab on my Docker host (Unraid), which runs every hour at minute 41: `41 * * * * docker exec -d matomo /bin/bash /opt/bitnami/matomo/misc/archive-script-cron.sh`
The script is also quite simple:
/opt/bitnami/php/bin/php /opt/bitnami/matomo/console core:archive --url=https://matomo.myurl.com/ >> /opt/bitnami/matomo/misc/archive.log
(Historically I've been using the bitnami container - but it should be quite straight forward for the official container)
@doncicuto Thanks for sharing your configuration for K8s. I guess you also have another pod that runs the matomo deployment at the same time as this cron job can run. I wonder how you don't have an issue with the underlying persistent volume, that needs to be used by both pods at the same time. I am having issues with not being able to mount the volume when it is already mounted(using Azure Disks). Would you care to share your set up for the persistent volume and the main matomo pod?
Thanks in advance!
Hi @leokolezhuk, I used this awesome post https://kalfeher.com/goodbye-google-hello-matomo/ to inspire my configuration, have a look it may help you.
@doncicuto Thanks for sharing your configuration for K8s. I guess you also have another pod that runs the matomo deployment at the same time as this cron job can run. I wonder how you don't have an issue with the underlying persistent volume, that needs to be used by both pods at the same time. I am having issues with not being able to mount the volume when it is already mounted(using Azure Disks). Would you care to share your set up for the persistent volume and the main matomo pod?
Thanks in advance!
Use NFS (NFS file shares in Azure Files)
I agree with the sentiment that there should be an included script to run the docker container in "archiver mode" so the normal configuration for the archiver is already included in the docker image and can be called via an entrypoint.
In Kubernetes you can configure the matomo pod shareProcessNamespace=true
and a sidecar container to run the jobs.
The sidecar command would look like this (in Terraform):
command = [
  "bash",
  "-c",
  <<-EOF
    # requires share_process_namespace=true
    while /bin/true; do
      echo "sleeping..."
      sleep 60
      PID=$(pgrep apache2 | head -1)
      cd /proc/$PID/root/var/www/html
      su www-data -s /bin/bash -c "./console scheduled-tasks:run"
      su www-data -s /bin/bash -c "./console tagmanager:regenerate-released-containers"
    done
  EOF
]
My complete Terraform example is here https://github.com/mingfang/terraform-k8s-modules/blob/master/modules/matomo/main.tf#L83
So in a default, unmodified Matomo Docker container setup, I have to reload the page in order to trigger scheduled tasks?
Better to add the cron job.