xirixiz / dsmr-reader-docker

DSMR Reader in Docker.
https://hub.docker.com/r/xirixiz/dsmr-reader-docker

Constant write by dsmr_remote_datalogger #299

Closed: dofl closed this issue 1 year ago

dofl commented 1 year ago

Support guidelines

I've found an issue and checked that ...

Description

I'm seeing a constant 10-20 KB/s data write by the 's6-supervise dsmr_remote_datalogger' process and I cannot find out why.

Expected behaviour

With debug options enabled I'm seeing the correct behavior in the logs of DSMR: it's waiting for 10 seconds until a new telegram is received. I cannot find out why this process is writing constantly.

Actual behaviour

A constant stream of data is being written. I cannot find out which files are involved; following the pipe does not lead to any useful file path. It's not the database, and it's not the postgres process:

> ls -l /proc/14087/fd
> total 0
> lr-x------ 1 root root 64 Oct 12 16:22 0 -> /dev/null
> l-wx------ 1 root root 64 Oct 12 16:22 1 -> 'pipe:[13751701]'
> l-wx------ 1 root root 64 Oct 12 16:22 2 -> 'pipe:[13751702]'
> l-wx------ 1 root root 64 Oct 12 16:22 3 -> /run/s6/services/dsmr_remote_datalogger/supervise/lock
> lr-x------ 1 root root 64 Oct 12 16:22 4 -> /run/s6/services/dsmr_remote_datalogger/supervise/control
> l-wx------ 1 root root 64 Oct 12 16:22 5 -> /run/s6/services/dsmr_remote_datalogger/supervise/control
> lrwx------ 1 root root 64 Oct 12 16:22 6 -> 'anon_inode:[signalfd]'

Steps to reproduce

Run the DSMR Docker container, then run 'sudo htop'.

(screenshot: htop showing the constant writes)

Docker info

ash-4.4# docker info
Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 16
  Running: 15
  Paused: 0
  Stopped: 1
 Images: 17
 Server Version: 20.10.3
 Storage Driver: btrfs
  Build Version: Btrfs v4.0
  Library Version: 101
 Logging Driver: db
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs db fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 3fa00912415f3e9c6f82dd72119179d599efd13b
 runc version: 31cc25f16f5eba4d0f53e35374532873744f4b31
 init version: ed96d00 (expected: de40ad0)
 Security Options:
  apparmor
 Kernel Version: 4.4.180+
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 9.576GiB
 Name: NAS
 ID: W4E7:KCGL:NVRA:CN4U:IFXV:HFBR:K6AR:AG43:CSHJ:IXZA:UM7A:V6W5
 Docker Root Dir: /volume1/@docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No kernel memory TCP limit support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
WARNING: No blkio weight support
WARNING: No blkio weight_device support
WARNING: No blkio throttle.read_bps_device support
WARNING: No blkio throttle.write_bps_device support
WARNING: No blkio throttle.read_iops_device support
WARNING: No blkio throttle.write_iops_device support

Version

Docker compose

services:
  dsmr:
    cap_add:
    - NET_ADMIN
    container_name: dsmr
    depends_on:
      dsmrdb:
        condition: service_started
    devices:
    - /dev/ttyUSB0:/dev/ttyUSB0
    environment:
      DGID: '1000'
      DJANGO_TIME_ZONE: Europe/Amsterdam
      DUID: '1000'
      TZ: Europe/Amsterdam
      VACUUM_DB_ON_STARTUP: "true"
      VIRTUAL_HOST: localhost
    healthcheck:
      interval: 5m
      retries: 10
      test:
      - CMD
      - curl
      - -Lsf
      - http://127.0.0.1/about
      - -o
      - /dev/null
      - -w
      - HTTP_%{http_code}
      timeout: 5s
    image: xirixiz/dsmr-reader-docker:latest
    links:
    - dsmrdb
    logging:
      driver: db
    ports:
    - published: 7778
      target: 80
    - published: 7779
      target: 443
    restart: always
    volumes:
    - /volume1/docker/DSMR/dsmr_backups/backups:/opt/backups:rw
  dsmrdb:
    container_name: dsmrdb
    environment:
      PG_TZ: Europe/Amsterdam
      POSTGRES_DB: dsmrreader
      POSTGRES_PASSWORD: dsmrreader
      POSTGRES_USER: dsmrreader
      TZ: Europe/Amsterdam
    image: postgres:13-alpine
    logging:
      driver: none
    restart: always
    volumes:
    - /volume1/docker/Ramdisk/DSMR/dsmrdb:/var/lib/postgresql/data:rw
version: '3'
volumes:
  dsmrdb: {}
  dsmrdb_backups: {}

Container logs

2022-10-12 16:39:20,683 DEBUG    mixins       run_loop                         79 | dsmr_datalogger.management.commands.dsmr_datalogger: Sleeping 10.0s
2022-10-12 16:39:30,693 INFO     dsmr_datalogger_api_client read_telegram                    31 | [2022-10-12 16:39:30.693784] Opening connection "/dev/ttyUSB0" using options: {'baudrate': 115200, 'bytesize': 8, 'parity': 'N', 'stopbits': 1, 'xonxoff': 1, 'rtscts': 0}
2022-10-12 16:39:31,029 DEBUG    dsmr_datalogger_api_client read_telegram                    58 | [2022-10-12 16:39:31.029196] Read 1 Byte(s)
2022-10-12 16:39:31,363 DEBUG    dsmr_datalogger_api_client read_telegram                    58 | [2022-10-12 16:39:31.363204] Read 945 Byte(s)
2022-10-12 16:39:31,370 DEBUG    datalogger   telegram_to_reading              81 | Received telegram:
<telegram data>
2022-10-12 16:39:31,401 DEBUG    messages     queue_message                    47 | MQTT: Queued message for dsmr/reading/electricity_currently_delivered: 4.713
2022-10-12 16:39:31,405 DEBUG    messages     queue_message                    47 | MQTT: Queued message for dsmr/reading/electricity_currently_returned: 0.000
2022-10-12 16:39:31,411 DEBUG    mixins       run_loop                         79 | dsmr_datalogger.management.commands.dsmr_datalogger: Sleeping 10.0s

Additional info

I was curious if other docker users are seeing the same symptoms as I don't recall seeing it before. I don't know if it's a docker or DSMR issue, but I hope someone has a clue where I can find out what's the issue.

xirixiz commented 1 year ago

Hi, very interesting indeed. Not sure how the process handles incoming data, but maybe @dennissiemensma has some more details about this.

dennissiemensma commented 1 year ago

You may want to look into the open files and system calls used by the process.

If the following packages are available in the container, try:

lsof -p 14087

strace -v -f -s 1024 -p 14087

Replacing the 14087 with whatever process ID you're looking into.

dofl commented 1 year ago

The Synology NAS is missing some of the packages, but /proc/fd is always available:

> ls -l /proc/14087/fd
> total 0
> lr-x------ 1 root root 64 Oct 12 16:22 0 -> /dev/null
> l-wx------ 1 root root 64 Oct 12 16:22 1 -> 'pipe:[13751701]'
> l-wx------ 1 root root 64 Oct 12 16:22 2 -> 'pipe:[13751702]'
> l-wx------ 1 root root 64 Oct 12 16:22 3 -> /run/s6/services/dsmr_remote_datalogger/supervise/lock
> lr-x------ 1 root root 64 Oct 12 16:22 4 -> /run/s6/services/dsmr_remote_datalogger/supervise/control
> l-wx------ 1 root root 64 Oct 12 16:22 5 -> /run/s6/services/dsmr_remote_datalogger/supervise/control
> lrwx------ 1 root root 64 Oct 12 16:22 6 -> 'anon_inode:[signalfd]'

It's bound to a pipe that gives no further information; I wasn't able to find any files attached to this pipe. Is anyone else seeing the same behavior?
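When /proc is all you have, you can usually resolve the other end of a pipe by scanning every process's fd directory for the same inode. A minimal sketch (the inode 13751701 is taken from the listing above; run as root to be able to inspect all processes):

```shell
#!/bin/sh
# find_pipe_peers: list every process holding a pipe with the given inode.
# Scans /proc/<pid>/fd; needs root (or the same uid) to read other
# processes' fd directories.
find_pipe_peers() {
    inode="$1"
    for fd in /proc/[0-9]*/fd/*; do
        # skip fds we are not allowed to inspect (or that vanished mid-scan)
        target=$(readlink "$fd" 2>/dev/null) || continue
        if [ "$target" = "pipe:[$inode]" ]; then
            pid=${fd#/proc/}
            pid=${pid%%/*}
            printf '%s %s\n' "$pid" "$(cat "/proc/$pid/comm" 2>/dev/null)"
        fi
    done | sort -un
}

find_pipe_peers "${1:-13751701}"
```

Both ends of the pipe show up in the output, so this reveals which process is consuming whatever s6-supervise writes.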

dennissiemensma commented 1 year ago

I'm not using the containerized version. Did you try installing lsof and strace within the container? They might be available for install.

If so, I'm sure it will give you more insight regarding whatever the process is doing.

xirixiz commented 1 year ago

Yeah, you can add packages to the running container, but I believe dofl is talking about the host system (Synology). Example to install a package in a running container:

docker exec -ti dsmr bash
apk add strace

I'm quite possibly seeing the same behavior on my NAS. However, this is not really an area I'm familiar with. Is this perhaps also normal behavior in a non-Docker setup?

On the Synology host you can maybe run:

sudo synogear install

Then strace may be available.

dennissiemensma commented 1 year ago

I'd start by checking the process in the container. Maybe try iotop as well.

When I run iotop, filtering active write processes, it's almost constantly idle. Only postgres seems to write to the disk every once in a while, which makes sense.
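When iotop isn't available (as on many NAS systems), the kernel's per-process I/O counters in /proc/<pid>/io give roughly the same information. A rough sketch, assuming a Linux host; the `write_rate` helper is mine, and 14087 is just the supervise PID from earlier in this thread:

```shell
#!/bin/sh
# write_rate: approximate bytes/s written by a process, by sampling the
# wchar counter (bytes passed to write()) in /proc/<pid>/io twice.
write_rate() {
    pid="$1"
    interval="${2:-2}"
    w1=$(awk '/^wchar:/ {print $2}' "/proc/$pid/io")
    sleep "$interval"
    w2=$(awk '/^wchar:/ {print $2}' "/proc/$pid/io")
    echo $(( (w2 - w1) / interval ))
}

# demo on this shell's own pid; pass the supervise PID (e.g. 14087) instead
pid="${1:-$$}"
echo "pid $pid writes about $(write_rate "$pid" 1) bytes/s"
```

Note that wchar counts write() calls, which is what shows up as a constant stream here; iotop's "DISK WRITE" column uses the write_bytes counter from the same file.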

xirixiz commented 1 year ago

I just noticed it's about the remote datalogger, not the datalogger. I'll investigate this further tonight. I suspect the init process causes the issue: it shouldn't start the remote datalogger if you're not using it (i.e. the variable isn't defined).

xirixiz commented 1 year ago

I believe I've fixed it. Please validate with the new image.

xirixiz commented 1 year ago

The Docker image uses s6-overlay to start processes based on defined variables. By default the DSMR remote datalogger variable isn't set, so the service shouldn't start. However, because it's defined as a service, s6 doesn't see the process running and runs the start script over and over again. The fix I found is to add a sleep infinity step when the variable hasn't been defined. s6 then considers the service started properly, while under the hood it doesn't do anything anymore, so it also stops restarting it over and over.

if [[ -n "${SOME_VAR}" ]]; then
    <START SOME PROCESS>
else
    sleep infinity
fi

So what happens now is that the service goes to "sleep". I found this solution in one of the linuxserver Docker images.
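For reference, the pattern above looks roughly like this as an s6-overlay run script. This is only a sketch: the variable name DSMR_REMOTE_DATALOGGER_HOST and the start command are placeholders, not the image's actual names.

```shell
#!/command/with-contenv bash
# Hypothetical run script for an optional s6 service.
# DSMR_REMOTE_DATALOGGER_HOST and the start command are placeholders.
if [[ -n "${DSMR_REMOTE_DATALOGGER_HOST:-}" ]]; then
    # variable set: hand the process over to s6 supervision
    exec /usr/local/bin/remote-datalogger
else
    # variable unset: park the service so s6 sees it as "up"
    # instead of restarting the script in a tight loop
    exec sleep infinity
fi
```

Using exec matters here: it replaces the run script with the long-running process, so s6-supervise tracks the right PID instead of a wrapper shell.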

dofl commented 1 year ago

I just did a pull on the new image and the constant write is gone. Seems like it was the S6 overlay indeed. Nice find!

xirixiz commented 1 year ago

It was a good collaboration. Nice find by you as well!

shostakovits commented 1 month ago

I believe the constant write problem is back. I run the container on my Unraid machine, and when I do 'iotop -c' the output is as follows:

(screenshot: iotop showing three s6-supervise processes writing)

As far as I can tell, the writing comes from the three dsmr services.

The writing, three times 11 kB/s, is constant. Is it possible to fix it again?

Thank you in advance.