thingsdb / ThingsDB

Node - The ThingsDB Core
https://thingsdb.io
GNU General Public License v3.0

Crashes when running in Kubernetes #376

Closed rickmoonex closed 4 months ago

rickmoonex commented 4 months ago

Describe the bug I'm trying to move my docker deployment of ThingsDB to Kubernetes. I have modified the StatefulSet documented under the GKE documentation. But the container becomes trapped in a crash loop with no useable logs. Even after stripping it down to a minimal Pod deployment it experiences the same issue.

To Reproduce Steps to reproduce the behavior:

  1. Deploy the following StatefulSet to K8s:
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: thingsdb
      labels:
        app: thingsdb
    spec:
      selector:
        matchLabels:
          app: thingsdb
      serviceName: thingsdb
      replicas: 1
      updateStrategy:
        type: RollingUpdate
      podManagementPolicy: Parallel
      template:
        metadata:
          labels:
            app: thingsdb
        spec:
          terminationGracePeriodSeconds: 90
          dnsConfig:
            searches:
            - thingsdb.default.svc.cluster.local
          tolerations:  # wait 10 minutes as synchronizing might take some time
          - key: "node.kubernetes.io/not-ready"
            operator: "Exists"
            effect: "NoExecute"
            tolerationSeconds: 600
          - key: "node.kubernetes.io/unreachable"
            operator: "Exists"
            effect: "NoExecute"
            tolerationSeconds: 600
          containers:
          - name: thingsdb
            image: ghcr.io/thingsdb/node:latest # Latest version at the time of writing
            imagePullPolicy: Always
            args: ["--deploy"]  # Tells ThingsDB it will be deployed in Kubernetes
            env:
            - name: THINGSDB_HTTP_STATUS_PORT
              value: "8080"
            - name: THINGSDB_HTTP_API_PORT
              value: "9210"
            - name: THINGSDB_STORAGE_PATH
              value: /mnt/thingsdb/
            - name: THINGSDB_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            ports:
            - name: status
              containerPort: 8080
            - name: client
              containerPort: 9200
            - name: http
              containerPort: 9210
            - name: node
              containerPort: 9220
            volumeMounts:
            - name: data
              mountPath: /mnt/thingsdb
            resources:
              requests:
                memory: 512Mi
            livenessProbe:
              httpGet:
                path: /healthy
                port: 8080
              initialDelaySeconds: 60
              periodSeconds: 20
              timeoutSeconds: 5
            readinessProbe:
              httpGet:
                path: /ready
                port: 8080
              initialDelaySeconds: 10
              periodSeconds: 10
              timeoutSeconds: 3
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
  2. Deploy the 'minimal' Pod deployment to K8s:
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: thingsdb
      labels:
        app: thingsdb
    spec:
      containers:
      - name: thingsdb
        image: ghcr.io/thingsdb/node:latest
        args:
          - "--init"
          - "--log-level debug"
        ports:
          - containerPort: 9200

Expected behavior The ThingsDB container should start and run as expected. Or at least show some debugging information.

Screenshots

kubectl get pods
NAME                READY   STATUS             RESTARTS        AGE
thingsdb            0/1     CrashLoopBackOff   6 (3m54s ago)   9m32s
thingsdb-sample-0   0/1     CrashLoopBackOff   11 (10s ago)    31m
kubectl logs thingsdb
   _____ _   _             ____  _____
  |_   _| |_|_|___ ___ ___|    \| __  |
    | | |   | |   | . |_ -|  |  | __ -|
    |_| |_|_|_|_|_|_  |___|____/|_____|   version: 1.6.0
                  |___|
EOF

Machine/OS:

joente commented 4 months ago

I'm not sure what is wrong with the StatefulSet. However, the 'minimal' pod has a small mistake in the --log-level argument: spaces are not allowed, so debug must be given as a separate argument.

---
apiVersion: v1
kind: Pod
metadata:
  name: thingsdb
  labels:
    app: thingsdb
spec:
  containers:
    - name: thingsdb
      image: ghcr.io/thingsdb/node:latest
      args: 
        - "--init"
        - "--log-level"
        - "debug"
      ports:
        - containerPort: 9200

I would however expect a different log: something like "unrecognized argument....".
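
To illustrate the difference: Kubernetes hands each element of args to the container as one argument, without any shell word splitting. The sketch below uses a hypothetical helper, count_args, purely to show how many arguments the program receives in each form.

```shell
#!/bin/sh
# count_args is a hypothetical helper (not part of ThingsDB or kubectl):
# it prints how many separate arguments it received.
count_args() { echo "$#"; }

# One list element containing a space stays a single argument.
count_args "--init" "--log-level debug"

# Flag and value as separate elements arrive as two arguments.
count_args "--init" "--log-level" "debug"
```

The first call reports 2 arguments and the second reports 3, which is why the flag and its value need their own list entries.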

If it still doesn't work after fixing the 'minimal' pod, can you verify the output of the kubectl describe pod <pod_name> command?

rickmoonex commented 4 months ago

Thanks for the quick response. I've updated the args for the deployment but I'm still running into the same issue.

Here's the output from the kubectl describe command:

kubectl describe pod thingsdb
Name:             thingsdb
Namespace:        default
Priority:         0
Service Account:  default
Node:             node01/192.168.88.5
Start Time:       Mon, 22 Apr 2024 16:36:37 +0200
Labels:           app=thingsdb
Annotations:      <none>
Status:           Running
IP:               10.244.0.83
IPs:
  IP:  10.244.0.83
Containers:
  thingsdb:
    Container ID:  containerd://16ce8c55d159af73cc68aa9b2aa1b2cc798bfa58a3eab219dfe3d89a1825f8b9
    Image:         ghcr.io/thingsdb/node:latest
    Image ID:      ghcr.io/thingsdb/node@sha256:01aa77d067ffce69887f83d9b1ac129bb4bd5abadcf06d1277a412da36084e0e
    Port:          9200/TCP
    Host Port:     0/TCP
    Args:
      --init
      --log-level
      debug
    State:          Terminated
      Reason:       Error
      Exit Code:    132
      Started:      Mon, 22 Apr 2024 16:37:18 +0200
      Finished:     Mon, 22 Apr 2024 16:37:19 +0200
    Last State:     Terminated
      Reason:       Error
      Exit Code:    132
      Started:      Mon, 22 Apr 2024 16:36:54 +0200
      Finished:     Mon, 22 Apr 2024 16:36:55 +0200
    Ready:          False
    Restart Count:  3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-plfgr (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       False
  ContainersReady             False
  PodScheduled                True
Volumes:
  kube-api-access-plfgr:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  52s                default-scheduler  Successfully assigned default/thingsdb to node01
  Normal   Pulled     51s                kubelet            Successfully pulled image "ghcr.io/thingsdb/node:latest" in 389ms (389ms including waiting)
  Normal   Pulled     49s                kubelet            Successfully pulled image "ghcr.io/thingsdb/node:latest" in 794ms (794ms including waiting)
  Normal   Pulled     35s                kubelet            Successfully pulled image "ghcr.io/thingsdb/node:latest" in 485ms (485ms including waiting)
  Normal   Pulling    11s (x4 over 52s)  kubelet            Pulling image "ghcr.io/thingsdb/node:latest"
  Normal   Created    11s (x4 over 51s)  kubelet            Created container thingsdb
  Normal   Started    11s (x4 over 51s)  kubelet            Started container thingsdb
  Normal   Pulled     11s                kubelet            Successfully pulled image "ghcr.io/thingsdb/node:latest" in 506ms (506ms including waiting)
  Warning  BackOff    10s (x5 over 48s)  kubelet            Back-off restarting failed container thingsdb in pod thingsdb_default(2cc92bb4-00c0-4459-9e68-99c861e0c3cb)
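
A note on the exit code in the output above: by the usual convention, an exit code above 128 means the process was terminated by a signal, and the signal number is the exit code minus 128. The sketch below decodes 132 that way; signal_from_exit is a hypothetical helper name used only for this illustration.

```shell
#!/bin/sh
# signal_from_exit is a hypothetical helper: it maps an exit code of the
# form 128 + N to the signal number N, and prints 0 otherwise.
signal_from_exit() {
    code="$1"
    if [ "$code" -gt 128 ]; then
        echo $((code - 128))
    else
        echo 0
    fi
}

sig=$(signal_from_exit 132)
echo "signal number: $sig"
# Ask the shell for the signal's name; on Linux signal 4 is SIGILL.
kill -l "$sig"
```

Signal 4 is SIGILL (illegal instruction) on Linux, which points at the CPU rejecting an instruction rather than at a configuration problem.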
rickmoonex commented 4 months ago

I tried running the container locally with the following command:

docker run \
    --name thingsdb \
    -d \
    -p 9200:9200 \
    ghcr.io/thingsdb/node --init

It gives me an error that I'm running on a wrong platform. This is understandable as I'm running on Apple Silicon. But the container does start, experiencing the same behaviour as the container on K8s (which is running on Linux/AMD64). ThingsDB 'logo' appears, then crashes.

I further tried troubleshooting the issue on a Windows AMD64 machine. I installed minikube and the Pod deployment ran fine. So there is something wrong with my environment that makes ThingsDB crash. But I can't figure out what it is...

rickmoonex commented 4 months ago

Another quick update.

I compiled ThingsDB from source on my Apple Silicon machine. Ran like a dream. So maybe it's an idea to make an ARM64 container available. I can create a fork and start working on that.

joente commented 4 months ago

@rickmoonex , I'd be happy to send you an email with a pre-release copy of the ThingsDB book. Just confirm if you'd like it sent to the email address associated with your GitHub account.

rickmoonex commented 4 months ago

@joente That would be great! The email associated with my GitHub account is fine. Thanks!

joente commented 4 months ago

@joente That would be great! The email associated with my GitHub account is fine. Thanks!

I've just sent you an email with the pre-release copy of the ThingsDB book!

rickmoonex commented 4 months ago

Great, thank you!

I've created PR #377 for the ARM container. I'll investigate the Kubernetes issue further.

rickmoonex commented 4 months ago

I have managed to do some more debugging.

I have deployed the container as follows:

---
apiVersion: v1
kind: Pod
metadata:
  name: thingsdb
  labels:
    app: thingsdb
spec:
  containers:
  - name: thingsdb
    image: ghcr.io/thingsdb/node:latest
    command: ["sh", "-c"]
    args: ["while true; do echo 'yo' && sleep 5; done;"]
    ports:
    - containerPort: 9200

I then manually ran ThingsDB and it gave the following error:

/usr/local/bin # thingsdb --version
   _____ _   _             ____  _____
  |_   _| |_|_|___ ___ ___|    \| __  |
    | | |   | |   | . |_ -|  |  | __ -|
    |_| |_|_|_|_|_|_  |___|____/|_____|   version: 1.6.0
                  |___|

Illegal instruction (core dumped)

Further debugged this with GDB and came across the following:

/usr/local/bin # gdb thingsdb
GNU gdb (GDB) 14.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-alpine-linux-musl".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from thingsdb...
(gdb) run
Starting program: /usr/local/bin/thingsdb
   _____ _   _             ____  _____
  |_   _| |_|_|___ ___ ___|    \| __  |
    | | |   | |   | . |_ -|  |  | __ -|
    |_| |_|_|_|_|_|_  |___|____/|_____|   version: 1.6.0
                  |___|

Program received signal SIGILL, Illegal instruction.
0x0000555555597a00 in ti_create ()
(gdb) bt full
#0  0x0000555555597a00 in ti_create ()
No symbol table info available.
#1  0x0000555555593c68 in main ()
No symbol table info available.
(gdb)

Now I'm no C wizard, so I don't know why these instructions are not available on my CPU. The node that this pod is running on has an Intel Celeron N5105.

joente commented 4 months ago

v1.6.1-alpha1 has been built (with an ARM64 image included):

docker pull ghcr.io/thingsdb/node:arm64-v1.6.1-alpha1

@rickmoonex , can you try this image?

rickmoonex commented 4 months ago

@joente The image works great on my Mac.

But I still have the issue on my K8s node. (Just for clarity, that machine is not an ARM machine).

I have done some more debugging and added it above.

joente commented 4 months ago

@rickmoonex , A debug build might help to troubleshoot the problem.

To create a debug build of ThingsDB from the source code, run the following command:

./debug-build.sh

This build prioritizes debugging information over optimization. It uses the -O0 flag to disable optimizations and the -g3 flag to generate debugging symbols.

rickmoonex commented 4 months ago

@joente, I did some more debugging and came across some strange behaviour.

I did a kubectl cp to copy the source files into the ghcr.io/thingsdb/node:latest container. There I installed the build dependencies apk add gcc make cmake libuv-dev musl-dev pcre2-dev yajl-dev curl-dev util-linux-dev linux-headers and ran the ./debug-build.sh script. The binary that resulted from that build actually ran without issues...

I then changed cmake -DCMAKE_BUILD_TYPE=Release . to cmake -DCMAKE_BUILD_TYPE=Debug . in docker/Dockerfile. Then ran the build locally on my Mac with docker build --platform linux/amd64 --file docker/Dockerfile -t rickmoonen/thingsdb-test:dev --push .. That container also ran without any issues.....

I then modified the Dockerfile and deploy workflow in the main branch of my fork to create a debug build. And manually ran the deploy workflow. The docker image that came from this workflow actually did crash with the Illegal instruction error. I debugged it with gdb and it gave me the following:

/usr/local/bin # gdb ./thingsdb
GNU gdb (GDB) 14.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-alpine-linux-musl".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./thingsdb...
(gdb) run
Starting program: /usr/local/bin/thingsdb
   _____ _   _             ____  _____
  |_   _| |_|_|___ ___ ___|    \| __  |
    | | |   | |   | . |_ -|  |  | __ -|
    |_| |_|_|_|_|_|_  |___|____/|_____|   version: 1.6.1-alpha0+debug
                  |___|

Program received signal SIGILL, Illegal instruction.
ti_counters_reset () at /tmp/thingsdb/src/ti/counters.c:47
warning: 47 /tmp/thingsdb/src/ti/counters.c: No such file or directory
(gdb) bt full
#0  ti_counters_reset () at /tmp/thingsdb/src/ti/counters.c:47
No locals.
#1  0x0000555555622f3a in ti_counters_create () at /tmp/thingsdb/src/ti/counters.c:18
No locals.
#2  0x00005555555aa5b9 in ti_create () at /tmp/thingsdb/src/ti.c:101
No locals.
#3  0x00005555555a2c99 in main (argc=1, argv=0x7fffffffe648) at /tmp/thingsdb/main.c:95
        seed = -855361564
        fd = 3
        rc = 0
(gdb)

So these findings are hinting that there is something wrong with the way the container is built. And not so much with ThingsDB itself.

joente commented 4 months ago

Looking at where it fails, it might be something related to the atomic counters. I've created a branch natomic where this is disabled (Using NATOMIC). @rickmoonex , are you able to test this branch?

Note: this is not ideal, as the affected counter updates are no longer thread safe. However, the counters are not critical, so even if an update goes wrong (which is already unlikely), the only result is incorrect counter info.

rickmoonex commented 4 months ago

Once again running into the same issue. If I build the container locally it runs great, if I let GitHub Actions build it it breaks.

Here is the backtrace, same error as last time:

/usr/local/bin # gdb thingsdb
GNU gdb (GDB) 14.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-alpine-linux-musl".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from thingsdb...
(gdb) run
Starting program: /usr/local/bin/thingsdb
   _____ _   _             ____  _____
  |_   _| |_|_|___ ___ ___|    \| __  |
    | | |   | |   | . |_ -|  |  | __ -|
    |_| |_|_|_|_|_|_  |___|____/|_____|   version: 1.6.1-alpha1+debug
                  |___|

Program received signal SIGILL, Illegal instruction.
ti_counters_reset () at /tmp/thingsdb/src/ti/counters.c:47
warning: 47 /tmp/thingsdb/src/ti/counters.c: No such file or directory
(gdb) bt full
#0  ti_counters_reset () at /tmp/thingsdb/src/ti/counters.c:47
No locals.
#1  0x0000555555622f43 in ti_counters_create () at /tmp/thingsdb/src/ti/counters.c:18
No locals.
#2  0x00005555555aa5b9 in ti_create () at /tmp/thingsdb/src/ti.c:101
No locals.
#3  0x00005555555a2c99 in main (argc=1, argv=0x7fffffffe648) at /tmp/thingsdb/main.c:95
        seed = -1041532550
        fd = 3
        rc = 0
(gdb)
joente commented 4 months ago

@rickmoonex , can you provide the output of kubectl version ?

rickmoonex commented 4 months ago

@joente, here you go:

kubectl version
Client Version: v1.29.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.3
joente commented 4 months ago

Sorry, can you provide the full output? kubectl version --output=json ?

rickmoonex commented 4 months ago
{
  "clientVersion": {
    "major": "1",
    "minor": "29",
    "gitVersion": "v1.29.2",
    "gitCommit": "4b8e819355d791d96b7e9d9efe4cbafae2311c88",
    "gitTreeState": "clean",
    "buildDate": "2024-02-14T10:32:39Z",
    "goVersion": "go1.21.7",
    "compiler": "gc",
    "platform": "darwin/arm64"
  },
  "kustomizeVersion": "v5.0.4-0.20230601165947-6ce0bf390ce3",
  "serverVersion": {
    "major": "1",
    "minor": "29",
    "gitVersion": "v1.29.3",
    "gitCommit": "6813625b7cd706db5bc7388921be03071e1a492d",
    "gitTreeState": "clean",
    "buildDate": "2024-03-14T23:58:36Z",
    "goVersion": "go1.21.8",
    "compiler": "gc",
    "platform": "linux/amd64"
  }
}
riklempens commented 4 months ago

If I build the container locally it runs great, if I let GitHub Actions build it it breaks.

On which platform did you build the container successfully? Was this a native AMD64 host?

Or is your linux/amd64 platform emulated on your M1 device? The kubectl version --output=json seems to suggest so.

rickmoonex commented 4 months ago

@riklempens The actual architecture I built the container on is darwin/arm64 (so natively on my mac). I do however explicitly tell docker buildx to build for a linux/amd64 platform. See the docker build command I used:

docker build --platform linux/amd64 --file docker/Dockerfile -t rickmoonen/thingsdb-test:dev --push .

Note: On my machine I have docker build set up as an alias for docker buildx build, so keep that in mind when reading the command.

I just tried building the container on a native AMD64 machine, and it also built and ran totally fine. I'm really scratching my head on what the issue could be.

rickmoonex commented 4 months ago

The kubectl version --output=json seems to suggest so.

The serverVersion section here is describing my homelab K8s cluster. Which is running on some low power Intel Celeron N5105 machines running Ubuntu 23.04.

riklempens commented 4 months ago

So to summarize:

rickmoonex commented 4 months ago

So to summarize:

  • Building and running on the AMD64 host locally works. ✔
  • Building on an ARM system specifying the linux/amd64 platform works, and this image runs on the AMD64 node. ✔
  • The image built using GitHub Actions fails to run on the AMD64 host. 𐄂

Correct!

joente commented 4 months ago

I'm not sure how to continue with this issue. If I could reproduce the problem, solving it would be easier. It seems to be related to older hardware, and compiling ThingsDB instead of relying on the pre-built images seems to work. @rickmoonex , do you have any suggestions on how to continue?

rickmoonex commented 4 months ago

I have once again done some more troubleshooting. Here is the summary:

I first tried to edit the GitHub actions pipeline to run a normal docker build command instead of buildx. Still ran into the same issue. Tried editing the docker version on the GitHub workflow to the same version that I'm running locally. Still no luck.

I then wanted to find out whether the issue lay with the pre-built GitHub Actions supplied by Docker. So I ran the pipeline locally using act. The pipeline ran fine and the resulting image worked on my K8s cluster.

This leads me to believe that there is an underlying problem with the GitHub runners that are used for the pipelines. Maybe switching ubuntu version will help. I will further try troubleshooting this tomorrow. I'll keep you updated.

rickmoonex commented 4 months ago

I'm really at a dead end here. I did some research on the hardware that I'm running on and found this in the datasheet:

All models support: MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, Enhanced Intel SpeedStep Technology (EIST), Intel 64, XD bit (an NX bit implementation), Intel VT-x, Intel VT-d, AES-NI, Intel SHA Extensions, Intel SGX, SMAP/SMEP

So I don't think the problem is the hardware and any unavailable instructions.

I did a docker build of the image. One locally and one on GH Actions, same command, same Dockerfile, same Docker version. I then ran these containers and extracted the binaries. Looking at those binaries I can already see that their file sizes are different, the one that's built locally is slightly larger (10kB).

I tried switching to different versions of Alpine for the container, but this yielded no results. I also did some extensive Googling to see whether this is a known problem, but couldn't find anything...

rickmoonex commented 4 months ago

Alright I have some good news, found and solved one problem. But then another showed up haha.

It turns out that the Intel processor I'm using does not support AVX instructions. Neither do my Mac and the other machine I was building on; that's why those containers ran fine, while the one built on GitHub did not.
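
For anyone wanting to check their own node: on Linux x86 the supported instruction-set extensions are listed on the flags line of /proc/cpuinfo, where the relevant names are avx, avx2, bmi1 and bmi2. The sketch below reports which of those are absent. missing_flags is a hypothetical helper, and the script assumes a Linux host; on other architectures the flags line does not exist and every flag is reported as missing.

```shell
#!/bin/sh
# missing_flags is a hypothetical helper: given a space-separated flags
# string and a list of wanted flags, it prints each wanted flag that is absent.
missing_flags() {
    flags=" $1 "
    shift
    for want in "$@"; do
        case "$flags" in
            *" $want "*) ;;              # flag is present
            *) printf '%s\n' "$want" ;;  # flag is absent
        esac
    done
}

if [ -r /proc/cpuinfo ]; then
    # Use the flags line of the first CPU only.
    cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
    missing_flags "$cpu_flags" avx avx2 bmi1 bmi2
fi
```

Any name printed here is an extension the compiler must not emit instructions for if the binary is to run on that machine.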

So I disabled AVX in the compiler with the following:

set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mno-avx")

That solved that problem.
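
A rough way to compare two builds for this kind of difference is to disassemble each binary and count the lines that reference the 256-bit AVX registers. This is only a heuristic sketch: count_ymm_lines is a hypothetical helper, the binary paths in the usage comment are placeholders, and it assumes objdump's default register naming (ymm0 and so on). It does not detect 128-bit AVX or BMI instructions.

```shell
#!/bin/sh
# count_ymm_lines is a hypothetical helper: it reads disassembly text on
# stdin and prints the number of lines that mention a ymm register.
count_ymm_lines() {
    # grep exits with status 1 when nothing matches; still print the 0.
    grep -c 'ymm[0-9]' || true
}

# Usage (placeholder paths for the two extracted binaries):
#   objdump -d ./thingsdb-local  | count_ymm_lines
#   objdump -d ./thingsdb-github | count_ymm_lines
```

A nonzero count for one binary and zero for the other would suggest the two builds were compiled for different instruction sets.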

Now ThingsDB still gives an illegal instruction error, but way later in the program. See logging below:

(gdb) run
Starting program: /usr/local/bin/thingsdb
   _____ _   _             ____  _____
  |_   _| |_|_|___ ___ ___|    \| __  |
    | | |   | |   | . |_ -|  |  | __ -|
    |_| |_|_|_|_|_|_  |___|____/|_____|   version: 1.6.1-alpha0+debug
                  |___|

[I 2024-04-26 10:14:46] running on: linux/amd64
[W 2024-04-26 10:14:46] path is successfully locked but a lock file existed which indicates that the process was not closed correctly last time (/data/)
[D 2024-04-26 10:14:46] found node id `0` in file: `/data/.node`
[W 2024-04-26 10:14:46] store path not found: `/data/store/`
[I 2024-04-26 10:14:46] start listening for HTTP status requests on TCP port 8080
[D 2024-04-26 10:14:46] known committed on all nodes: `change:0`
[D 2024-04-26 10:14:46] known stored on all nodes: `change:0`
[D 2024-04-26 10:14:46] loading archive files from `/data/archive/`
[I 2024-04-26 10:14:46] changing status from SYNCHRONIZING to READY
[I 2024-04-26 10:14:46] start listening for node connections on TCP port 9220
[I 2024-04-26 10:14:46] start listening for client connections on TCP port 9200
[I 2024-04-26 10:14:46] start listening for HTTP API requests on TCP port 9210
[I 2024-04-26 10:14:46] start listening for WebSocket connections on TCP port 9270

Program received signal SIGILL, Illegal instruction.
0x00005555558e8b5c in lwsl_timestamp (level=16, p=0x555555ab1040 <buf> "", len=256) at /tmp/thingsdb/libwebsockets/lib/core/logs.c:228
warning: 228    /tmp/thingsdb/libwebsockets/lib/core/logs.c: No such file or directory
(gdb) bt full
#0  0x00005555558e8b5c in lwsl_timestamp (level=16, p=0x555555ab1040 <buf> "", len=256) at /tmp/thingsdb/libwebsockets/lib/core/logs.c:228
        o_now = 1714126486
        now = 17141264861705
        tv = {tv_sec = 1714126486, tv_usec = 170531}
        ptm = 0x7fffffffde40
        tm = {tm_sec = 46, tm_min = 14, tm_hour = 10, tm_mday = 26, tm_mon = 3, tm_year = 124, tm_wday = 5, tm_yday = 116, tm_isdst = 0, tm_gmtoff = 0, tm_zone = 0x7ffff76db068 "UTC"}
        n = 0
rickmoonex commented 4 months ago

Second error was due to BMI2 instructions not being supported. Disabled with:

set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mno-bmi2")

Now it works like a charm!

@joente, what would be a logical next step? Are these instruction sets crucial to ThingsDB? If not, I suppose they can be disabled. If so, then maybe create a different container for 'legacy' systems.

joente commented 4 months ago

They are not crucial, it is just to tell the compiler what instructions can be used. It might have some performance impact, but probably not that much.

@rickmoonex , I'll build an alpha version with the flags set as suggested.

rickmoonex commented 4 months ago

Works like a charm!

Thanks for all the help.

rickmoonex commented 4 months ago

@joente, just ran into another instruction error when joining nodes. BMI1 also needs to be disabled:

set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mno-bmi")
joente commented 4 months ago

Almost, I think I need to move the lines to keep the ARM build working

66.34 cc: error: unrecognized command-line option '-mno-avx'
66.34 cc: error: unrecognized command-line option '-mno-bmi2'

I've added an environment var LEGACY which is set to 1 in the Dockerfile and full.Dockerfile. There is no point in the gcloud image as this image is intended to run on the Google cloud platform. The ARM build is excluded by the aarch64 match.

  ...
    if (NOT ${CMAKE_SYSTEM_PROCESSOR} MATCHES "aarch64")
        set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -msse4.2")
        if($ENV{LEGACY})
            # Turn off AVX and BMI2 instructions for legacy systems
            set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mno-avx")
            set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mno-avx2")
            set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mno-bmi")
            set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mno-bmi2")
        endif()
    endif()
  ...
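
To make the branching above easier to follow, here is the same decision restated as a small shell sketch that prints the extra compiler flags for a given architecture and LEGACY setting. c_flags_for is a hypothetical helper written for illustration; the real logic lives in the CMake file. As a simplification it only treats LEGACY=1 as enabled, whereas CMake's if() accepts other true values as well.

```shell
#!/bin/sh
# c_flags_for is a hypothetical helper mirroring the CMake branch above.
#   $1: processor name, e.g. the output of `uname -m`
#   $2: 1 to build for CPUs without AVX/BMI, anything else otherwise
c_flags_for() {
    arch="$1"
    legacy="$2"
    case "$arch" in
        *aarch64*) return 0 ;;   # ARM build: none of the x86-only flags apply
    esac
    flags="-msse4.2"
    if [ "$legacy" = "1" ]; then
        flags="$flags -mno-avx -mno-avx2 -mno-bmi -mno-bmi2"
    fi
    echo "$flags"
}

# Show the flags for the current machine and LEGACY environment setting.
c_flags_for "$(uname -m)" "${LEGACY:-0}"
```

On an aarch64 machine this prints nothing, which matches the reason for the aarch64 check: the ARM compiler rejects -mno-avx and -mno-bmi2.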
joente commented 4 months ago

@rickmoonex , the images v1.6.1-alpha3 are ready...

rickmoonex commented 4 months ago

All up and running!

I will do some more testing and if I don't encounter any more issues I will close this issue. Thanks for all the help!

joente commented 4 months ago

Hi @rickmoonex, did you have time to do some testing? If everything works as expected then the issue can be closed and I'll create a release version.

rickmoonex commented 4 months ago

Hi @joente, had a busy week but managed to do some testing yesterday evening and this morning. I haven't experienced any issues, so we can close the issue.

joente commented 4 months ago

V1.6.1 released: https://github.com/thingsdb/ThingsDB/releases/tag/v1.6.1