kamu-data / kamu-node

Back-end implementation of the Open Data Fabric protocol
https://kamu.dev

Demo env: Derivative dataset updates are not working #108

Closed sergiimk closed 4 months ago

sergiimk commented 4 months ago

Steps to reproduce:

  1. Create simple root and derivative datasets (manifests included below)
  2. Push some data into root
  3. Click "Update data" on derivative dataset
  4. Observe update failing after ~30 seconds

Based on the logs of the pod and of the engine containers started inside it, the suspected cause is the podman networking mode, since the following line appears in stderr when the engine container is started:

Port mappings have been discarded as one of the Host, Container, Pod, and None network modes are in use

Test manifests:

kind: DatasetSnapshot
version: 1
content:
  name: gps
  kind: Root
  metadata:
    - kind: AddPushSource
      sourceName: default
      read:
        kind: NdJson
        schema:
          - long DOUBLE
          - lat DOUBLE
      merge:
        kind: Append

---
kind: DatasetSnapshot
version: 1
content:
  name: gps-avg
  kind: Derivative
  metadata:
    - kind: SetTransform
      inputs:
        - datasetRef: gps
      transform:
        kind: Sql
        engine: datafusion
        query: |
          select
            event_time,
            long,
            lat
          from gps

Data to push:

{"t": "2020-01-01T00:01:00Z", "long": -123.12, "lat": 49.28}

sergiimk commented 4 months ago

Command that starts the container:

{
  "v": 0,
  "name": "kamu-api-server",
  "msg": "[INIT_ENGINE - EVENT] Spawning container",
  "level": 30,
  "hostname": "kamu-api-server-5ccc84ff58-5bphs",
  "pid": 7,
  "time": "2024-07-23T20:39:26.207991659Z",
  "target": "container_runtime::container",
  "line": 294,
  "file": "/cargo/git/checkouts/kamu-cli-2db697a901e9a060/5a85211/src/utils/container-runtime/src/container.rs",
  "image": "ghcr.io/kamu-data/engine-datafusion:0.7.2",
  "container_name": "kamu-engine-ftRV5FcCkc",
  "cmd": "Command { std: \"podman\" \"run\" \"--rm\" \"--name=kamu-engine-ftRV5FcCkc\" \"-p\" \"2884\" \"-v\" \"/tmp/.tmpWHPLM0/run/transform-ftRV5FcCkc/out:/opt/engine/out\" \"-v\" \"/tmp/.tmpWHPLM0/run/transform-ftRV5FcCkc/in/f16206b356ec30d359f534c9f92f2d6168b323b7556ffad2cfb67842b633c81ab8469:/opt/engine/in/f16206b356ec30d359f534c9f92f2d6168b323b7556ffad2cfb67842b633c81ab8469:ro\" \"-v\" \"/tmp/.tmpWHPLM0/run/transform-ftRV5FcCkc/in/f1620a1e4aea5303008aab7c26dc15d4df705c749fe0d87fd8e0ff60c81585e48e90e:/opt/engine/in/f1620a1e4aea5303008aab7c26dc15d4df705c749fe0d87fd8e0ff60c81585e48e90e:ro\" \"-v\" \"/tmp/.tmpWHPLM0/run/transform-ftRV5FcCkc/in/f162016258c21775f6ef753e41ec0beae0fe38c52613971029813dd395247b1aae1e5:/opt/engine/in/f162016258c21775f6ef753e41ec0beae0fe38c52613971029813dd395247b1aae1e5:ro\" \"ghcr.io/kamu-data/engine-datafusion:0.7.2\", kill_on_drop: false }",
  "operation_id": "ftRV5FcCkc",
  "dataset_ref": "sergiimk/gps-avg"
}

Output of the podman inspect command on the engine container within the api-server pod:

[
    {
        "Id": "2abea78b112f49f970d8df9f9cfc198d71c15479bf94d246276a65af05e911f0",
        "Created": "2024-07-23T20:44:43.31762932Z",
        "Path": "/sbin/tini",
        "Args": [
            "--",
            "/opt/engine/bin/kamu-engine-datafusion"
        ],
        "State": {
            "OciVersion": "1.0.2-dev",
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 31283,
            "ConmonPid": 31280,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2024-07-23T20:44:43.470026774Z",
            "FinishedAt": "0001-01-01T00:00:00Z",
            "Healthcheck": {
                "Status": "",
                "FailingStreak": 0,
                "Log": null
            }
        },
        "Image": "d498436833ded96594d0440a33938d722d1186a437f8c397605d3f2944bfc16f",
        "ImageName": "ghcr.io/kamu-data/engine-datafusion:0.7.2",
        "Rootfs": "",
        "Pod": "",
        "ResolvConfPath": "/run/containers/storage/overlay-containers/2abea78b112f49f970d8df9f9cfc198d71c15479bf94d246276a65af05e911f0/userdata/resolv.conf",
        "HostnamePath": "/run/containers/storage/overlay-containers/2abea78b112f49f970d8df9f9cfc198d71c15479bf94d246276a65af05e911f0/userdata/hostname",
        "HostsPath": "/run/containers/storage/overlay-containers/2abea78b112f49f970d8df9f9cfc198d71c15479bf94d246276a65af05e911f0/userdata/hosts",
        "StaticDir": "/var/lib/containers/storage/overlay-containers/2abea78b112f49f970d8df9f9cfc198d71c15479bf94d246276a65af05e911f0/userdata",
        "OCIConfigPath": "/var/lib/containers/storage/overlay-containers/2abea78b112f49f970d8df9f9cfc198d71c15479bf94d246276a65af05e911f0/userdata/config.json",
        "OCIRuntime": "crun",
        "ConmonPidFile": "/run/containers/storage/overlay-containers/2abea78b112f49f970d8df9f9cfc198d71c15479bf94d246276a65af05e911f0/userdata/conmon.pid",
        "PidFile": "/run/containers/storage/overlay-containers/2abea78b112f49f970d8df9f9cfc198d71c15479bf94d246276a65af05e911f0/userdata/pidfile",
        "Name": "kamu-engine-h8QgWxhrjQ",
        "RestartCount": 0,
        "Driver": "overlay",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "",
        "EffectiveCaps": [
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FOWNER",
            "CAP_FSETID",
            "CAP_KILL",
            "CAP_NET_BIND_SERVICE",
            "CAP_SETFCAP",
            "CAP_SETGID",
            "CAP_SETPCAP",
            "CAP_SETUID",
            "CAP_SYS_CHROOT"
        ],
        "BoundingCaps": [
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FOWNER",
            "CAP_FSETID",
            "CAP_KILL",
            "CAP_NET_BIND_SERVICE",
            "CAP_SETFCAP",
            "CAP_SETGID",
            "CAP_SETPCAP",
            "CAP_SETUID",
            "CAP_SYS_CHROOT"
        ],
        "ExecIDs": [],
        "GraphDriver": {
            "Name": "overlay",
            "Data": {
                "LowerDir": "/var/lib/containers/storage/overlay/13b1c7ce120fe8c912bfd5a2200f1006d71fd2278a0cebbd7b5f9556ed8150dc/diff:/var/lib/containers/storage/overlay/64686ae027c0cef534404c7c7ea204b5028e0f96a7947715a3bffe76208beb5f/diff:/var/lib/containers/storage/overlay/d4fc045c9e3a848011de66f34b81f052d4f2c15a17bb196d637e526349601820/diff",
                "MergedDir": "/var/lib/containers/storage/overlay/584510517e96641c9d5e6bff1d64dd42e52925a0fb11a313c7f0705b5931bb01/merged",
                "UpperDir": "/var/lib/containers/storage/overlay/584510517e96641c9d5e6bff1d64dd42e52925a0fb11a313c7f0705b5931bb01/diff",
                "WorkDir": "/var/lib/containers/storage/overlay/584510517e96641c9d5e6bff1d64dd42e52925a0fb11a313c7f0705b5931bb01/work"
            }
        },
        "Mounts": [
            {
                "Type": "bind",
                "Source": "/tmp/.tmpWHPLM0/run/transform-h8QgWxhrjQ/in/f16206b356ec30d359f534c9f92f2d6168b323b7556ffad2cfb67842b633c81ab8469",
                "Destination": "/opt/engine/in/f16206b356ec30d359f534c9f92f2d6168b323b7556ffad2cfb67842b633c81ab8469",
                "Driver": "",
                "Mode": "",
                "Options": [
                    "rbind"
                ],
                "RW": false,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/tmp/.tmpWHPLM0/run/transform-h8QgWxhrjQ/in/f1620a1e4aea5303008aab7c26dc15d4df705c749fe0d87fd8e0ff60c81585e48e90e",
                "Destination": "/opt/engine/in/f1620a1e4aea5303008aab7c26dc15d4df705c749fe0d87fd8e0ff60c81585e48e90e",
                "Driver": "",
                "Mode": "",
                "Options": [
                    "rbind"
                ],
                "RW": false,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/tmp/.tmpWHPLM0/run/transform-h8QgWxhrjQ/in/f162016258c21775f6ef753e41ec0beae0fe38c52613971029813dd395247b1aae1e5",
                "Destination": "/opt/engine/in/f162016258c21775f6ef753e41ec0beae0fe38c52613971029813dd395247b1aae1e5",
                "Driver": "",
                "Mode": "",
                "Options": [
                    "rbind"
                ],
                "RW": false,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/tmp/.tmpWHPLM0/run/transform-h8QgWxhrjQ/out",
                "Destination": "/opt/engine/out",
                "Driver": "",
                "Mode": "",
                "Options": [
                    "rbind"
                ],
                "RW": true,
                "Propagation": "rprivate"
            }
        ],
        "Dependencies": [],
        "NetworkSettings": {
            "EndpointID": "",
            "Gateway": "",
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "MacAddress": "",
            "Bridge": "",
            "SandboxID": "",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": ""
        },
        "ExitCommand": [
            "/usr/bin/podman",
            "--root",
            "/var/lib/containers/storage",
            "--runroot",
            "/run/containers/storage",
            "--log-level",
            "warning",
            "--cgroup-manager",
            "cgroupfs",
            "--tmpdir",
            "/run/libpod",
            "--runtime",
            "crun",
            "--storage-driver",
            "overlay",
            "--storage-opt",
            "overlay.imagestore=/var/lib/containers/shared",
            "--storage-opt",
            "overlay.ignore_chown_errors=false",
            "--storage-opt",
            "overlay.mount_program=/usr/bin/fuse-overlayfs",
            "--storage-opt",
            "overlay.mountopt=nodev,fsync=0",
            "--events-backend",
            "file",
            "container",
            "cleanup",
            "--rm",
            "2abea78b112f49f970d8df9f9cfc198d71c15479bf94d246276a65af05e911f0"
        ],
        "Namespace": "",
        "IsInfra": false,
        "Config": {
            "Hostname": "2abea78b112f",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "TERM=xterm",
                "container=podman",
                "RUST_BACKTRACE=1",
                "RUST_LOG=debug",
                "HOME=/root",
                "HOSTNAME=kamu-api-server-5ccc84ff58-5bphs"
            ],
            "Cmd": null,
            "Image": "ghcr.io/kamu-data/engine-datafusion:0.7.2",
            "Volumes": null,
            "WorkingDir": "/",
            "Entrypoint": "/sbin/tini -- /opt/engine/bin/kamu-engine-datafusion",
            "OnBuild": null,
            "Labels": {
                "org.opencontainers.image.created": "2024-02-09T04:14:31.596Z",
                "org.opencontainers.image.description": "ODF engine based on Apache Arrow DataFusion",
                "org.opencontainers.image.licenses": "Apache-2.0",
                "org.opencontainers.image.revision": "83eec7073a453a0da8b94e4f3616a12dad324829",
                "org.opencontainers.image.source": "https://github.com/kamu-data/kamu-engine-datafusion",
                "org.opencontainers.image.title": "kamu-engine-datafusion",
                "org.opencontainers.image.url": "https://github.com/kamu-data/kamu-engine-datafusion",
                "org.opencontainers.image.vendor": "Kamu Data Inc.",
                "org.opencontainers.image.version": "0"
            },
            "Annotations": {
                "io.container.manager": "libpod",
                "io.kubernetes.cri-o.Created": "2024-07-23T20:44:43.31762932Z",
                "io.kubernetes.cri-o.TTY": "false",
                "io.podman.annotations.autoremove": "TRUE",
                "io.podman.annotations.init": "FALSE",
                "io.podman.annotations.privileged": "FALSE",
                "io.podman.annotations.publish-all": "FALSE",
                "org.opencontainers.image.stopSignal": "15"
            },
            "StopSignal": 15,
            "CreateCommand": [
                "podman",
                "run",
                "--rm",
                "--name=kamu-engine-h8QgWxhrjQ",
                "-p",
                "2884",
                "-v",
                "/tmp/.tmpWHPLM0/run/transform-h8QgWxhrjQ/out:/opt/engine/out",
                "-v",
                "/tmp/.tmpWHPLM0/run/transform-h8QgWxhrjQ/in/f16206b356ec30d359f534c9f92f2d6168b323b7556ffad2cfb67842b633c81ab8469:/opt/engine/in/f16206b356ec30d359f534c9f92f2d6168b323b7556ffad2cfb67842b633c81ab8469:ro",
                "-v",
                "/tmp/.tmpWHPLM0/run/transform-h8QgWxhrjQ/in/f1620a1e4aea5303008aab7c26dc15d4df705c749fe0d87fd8e0ff60c81585e48e90e:/opt/engine/in/f1620a1e4aea5303008aab7c26dc15d4df705c749fe0d87fd8e0ff60c81585e48e90e:ro",
                "-v",
                "/tmp/.tmpWHPLM0/run/transform-h8QgWxhrjQ/in/f162016258c21775f6ef753e41ec0beae0fe38c52613971029813dd395247b1aae1e5:/opt/engine/in/f162016258c21775f6ef753e41ec0beae0fe38c52613971029813dd395247b1aae1e5:ro",
                "ghcr.io/kamu-data/engine-datafusion:0.7.2"
            ],
            "Umask": "0022",
            "Timeout": 0,
            "StopTimeout": 10
        },
        "HostConfig": {
            "Binds": [
                "/tmp/.tmpWHPLM0/run/transform-h8QgWxhrjQ/in/f16206b356ec30d359f534c9f92f2d6168b323b7556ffad2cfb67842b633c81ab8469:/opt/engine/in/f16206b356ec30d359f534c9f92f2d6168b323b7556ffad2cfb67842b633c81ab8469:ro,rprivate,rbind",
                "/tmp/.tmpWHPLM0/run/transform-h8QgWxhrjQ/in/f1620a1e4aea5303008aab7c26dc15d4df705c749fe0d87fd8e0ff60c81585e48e90e:/opt/engine/in/f1620a1e4aea5303008aab7c26dc15d4df705c749fe0d87fd8e0ff60c81585e48e90e:ro,rprivate,rbind",
                "/tmp/.tmpWHPLM0/run/transform-h8QgWxhrjQ/in/f162016258c21775f6ef753e41ec0beae0fe38c52613971029813dd395247b1aae1e5:/opt/engine/in/f162016258c21775f6ef753e41ec0beae0fe38c52613971029813dd395247b1aae1e5:ro,rprivate,rbind",
                "/tmp/.tmpWHPLM0/run/transform-h8QgWxhrjQ/out:/opt/engine/out:rw,rprivate,rbind"
            ],
            "CgroupManager": "cgroupfs",
            "CgroupMode": "host",
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "k8s-file",
                "Config": null,
                "Path": "/var/lib/containers/storage/overlay-containers/2abea78b112f49f970d8df9f9cfc198d71c15479bf94d246276a65af05e911f0/userdata/ctr.log",
                "Tag": "",
                "Size": "0B"
            },
            "NetworkMode": "host",
            "PortBindings": {},
            "RestartPolicy": {
                "Name": "",
                "MaximumRetryCount": 0
            },
            "AutoRemove": true,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": [],
            "CapDrop": [
                "CAP_AUDIT_WRITE",
                "CAP_MKNOD",
                "CAP_NET_RAW"
            ],
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": [],
            "GroupAdd": [],
            "IpcMode": "host",
            "Cgroup": "",
            "Cgroups": "disabled",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "private",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": [],
            "Tmpfs": {},
            "UTSMode": "host",
            "UsernsMode": "",
            "ShmSize": 65536000,
            "Runtime": "oci",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": [],
            "DiskQuota": 0,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": 0,
            "OomKillDisable": false,
            "PidsLimit": 0,
            "Ulimits": [
                {
                    "Name": "RLIMIT_NOFILE",
                    "Soft": 1048576,
                    "Hard": 1048576
                },
                {
                    "Name": "RLIMIT_NPROC",
                    "Soft": 4194304,
                    "Hard": 4194304
                }
            ],
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0,
            "CgroupConf": null
        }
    }
]

We can see "NetworkMode": "host", so the engine's port should be opened directly on the pod's localhost interface.
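One way to check that claim from inside the pod is to probe the engine port on localhost. The following is only a minimal sketch, not part of kamu: the function name `engine_port_reachable` is hypothetical, and the port to probe (2884, taken from the `-p 2884` argument in the logs above) is assumed to be the port the engine listens on.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Returns true if a TCP connection to 127.0.0.1:<port> succeeds within
/// `timeout`. Hypothetical helper for diagnosing this issue, not kamu code.
/// Note: `connect_timeout` rejects a zero timeout, which is reported here
/// as "not reachable".
fn engine_port_reachable(port: u16, timeout: Duration) -> bool {
    let addr = SocketAddr::from(([127, 0, 0, 1], port));
    TcpStream::connect_timeout(&addr, timeout).is_ok()
}
```

Calling `engine_port_reachable(2884, Duration::from_secs(1))` while the engine container is running would indicate whether the port is in fact bound on the pod's localhost, as host network mode implies.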

Result of kamu config list:

engine:
  runtime: podman
  networkNs: host

This shows that we expect podman to operate in host mode when using kamu-cli.

We should confirm that kamu-api-server is also configured to use the host network namespace.

sergiimk commented 4 months ago

90% sure that the cause is this configuration:

    b.add_value(container_runtime::ContainerRuntimeConfig {
        runtime: container_runtime::ContainerRuntimeType::Podman,
        network_ns: container_runtime::NetworkNamespaceType::Private,
    });

It causes a mismatch between what the server expects (private namespace) and the actual mode (host namespace).
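To illustrate why such a mismatch can break the connection, here is a simplified sketch under stated assumptions. It is not the actual `container_runtime` code: the `NetworkNs` enum and `engine_host_port` function are hypothetical, and the sketch assumes that in private mode the server looks up the host port podman published for the container port, while in host mode it connects to the container port directly. Since podman discards port mappings in host mode (as the stderr line quoted earlier says), a server expecting private mode would find no published port.

```rust
/// Hypothetical stand-in for the two network namespace modes discussed above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum NetworkNs {
    Private,
    Host,
}

/// Picks the localhost port on which a client should reach the engine.
///
/// * `expected` is the mode the server believes the container runs in.
/// * `container_port` is the port the engine listens on inside the container.
/// * `mapped_host_port` is the host port the runtime published for it, if any
///   (assumed to be `None` when port mappings were discarded).
fn engine_host_port(
    expected: NetworkNs,
    container_port: u16,
    mapped_host_port: Option<u16>,
) -> Result<u16, String> {
    match expected {
        // Host mode: the engine binds directly on the host's interfaces.
        NetworkNs::Host => Ok(container_port),
        // Private mode: only the published host port is reachable.
        NetworkNs::Private => mapped_host_port.ok_or_else(|| {
            format!("no host port published for container port {container_port}")
        }),
    }
}
```

With these assumptions, `engine_host_port(NetworkNs::Private, 2884, None)` returns an error, which corresponds to the mismatched configuration, whereas `engine_host_port(NetworkNs::Host, 2884, None)` returns `Ok(2884)`.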

The big question is how this worked before...

sergiimk commented 4 months ago

The above change was deployed, and after configuring networkNs: host the problem is resolved.