jeff303 opened this issue 6 months ago
OK, I had a little bit of help on this. It turns out I just need to define the Fluentd CRD with the arm64 arch variant, and it starts up fine:

```yaml
image: kubesphere/fluentd:v1.14.6-arm64
```
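In context, that image line would sit in the Fluentd custom resource roughly like this (a sketch; the metadata name/namespace are placeholders, and the field layout assumes the fluent-operator `Fluentd` CRD):

```yaml
apiVersion: fluentd.fluent.io/v1alpha1
kind: Fluentd
metadata:
  name: fluentd          # placeholder name
  namespace: fluent      # placeholder namespace
spec:
  # arm64 variant of the image, so the pod runs on Apple Silicon
  image: kubesphere/fluentd:v1.14.6-arm64
```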
However, the process still seems to fail:

```
/fluentd/bin/fluentd-watcher
level=error msg="start Fluentd error" error="fork/exec /usr/bin/fluentd: no such file or directory"
level=info msg=backoff delay=1s
level=info msg="backoff timer done" actual=1.001294876s expected=1s
level=error msg="start Fluentd error" error="fork/exec /usr/bin/fluentd: no such file or directory"
```
I tried running the container as root and symlinking that path, which seemed to fix it, but I'm not sure how to persist this to the CRD.

```
docker run -u root --entrypoint bash -it kubesphere/fluentd:v1.14.6-arm64
# within the container
ln -s /usr/local/bundle/bin/fluentd /usr/bin/fluentd
/fluentd/bin/fluentd-watcher
level=info msg="Fluentd started"
2024-03-22 14:03:58 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2024-03-22 14:03:58 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
...
```
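For what it's worth, the symlink workaround could also be baked into a derived image instead of being applied as root at runtime. A rough, untested sketch (the base image's non-root user name is an assumption here):

```dockerfile
# Hypothetical derived image persisting the symlink workaround
FROM kubesphere/fluentd:v1.14.6-arm64
USER root
RUN ln -s /usr/local/bundle/bin/fluentd /usr/bin/fluentd
# switch back to the image's non-root user (name assumed)
USER fluent
```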
Edit: after reading the source here, I found a cleaner way to do this that doesn't require running as root.

```
docker run kubesphere/fluentd:v1.14.6-arm64 -b /usr/local/bundle/bin/fluentd
```
Unfortunately, I can't find any way to get this to persist in the statefulset, since it gets overwritten immediately (presumably by the operator). And the Fluentd CRD doesn't seem to take these options anywhere (unless I missed it).

Edit 2: OK, that can be persisted via `args` in the CRD!
```yaml
spec:
  args:
    - -b
    - /usr/local/bundle/bin/fluentd
```
With these changes, everything seems to be running on my M3.
OK, it seems that the `forward` part isn't working at all.
```
[2024/03/25 21:53:43] [error] [output:forward:forward.0] no upstream connections available
[2024/03/25 21:53:43] [error] [output:forward:forward.0] no upstream connections available
[2024/03/25 21:53:43] [ warn] [engine] failed to flush chunk '93-1711403618.673907638.flb', retry in 9 seconds: task_id=3, input=tail.0 > output=forward.0 (out_id=0)
[2024/03/25 21:53:43] [error] [src/flb_http_client.c:1172 errno=32] Broken pipe
[2024/03/25 21:53:43] [ warn] [output:es:es.1] http_do=-1 URI=/_bulk
[2024/03/25 21:53:43] [ warn] [engine] failed to flush chunk '93-1711403618.678663555.flb', retry in 8 seconds: task_id=5, input=tail.0 > output=forward.0 (out_id=0)
[2024/03/25 21:53:43] [ warn] [engine] failed to flush chunk '93-1711403618.670578263.flb', retry in 11 seconds: task_id=0, input=tail.0 > output=es.1 (out_id=1)
[2024/03/25 21:53:43] [error] [http_client] broken connection to elasticsearch-master.elastic.svc:9200 ?
[2024/03/25 21:53:43] [ warn] [output:es:es.1] http_do=-1 URI=/_bulk
[2024/03/25 21:53:43] [ warn] [engine] failed to flush chunk '93-1711403618.673907638.flb', retry in 7 seconds: task_id=3, input=tail.0 > output=es.1 (out_id=1)
[2024/03/25 21:53:43] [error] [http_client] broken connection to elasticsearch-master.elastic.svc:9200 ?
[2024/03/25 21:53:43] [ warn] [output:es:es.1] http_do=-1 URI=/_bulk
[2024/03/25 21:53:43] [ warn] [engine] failed to flush chunk '93-1711403618.678663555.flb', retry in 7 seconds: task_id=5, input=tail.0 > output=es.1 (out_id=1)
[2024/03/25 21:53:44] [error] [output:forward:forward.0] no upstream connections available
```
Not sure why the Fluent Bit instance is unable to connect to the `fluentd` service for forwarding. I went through some of the DNS troubleshooting steps here, and DNS doesn't seem to be the problem. I almost wonder if `fluentd` is not listening on the configured forward input port, for whatever reason. I don't see any log message indicating it has started listening (though I haven't dug through the code enough to know whether such a message is expected).
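One way to rule the listening-port theory in or out is a direct TCP probe against the service from inside the cluster. A minimal sketch (the service name below is a placeholder, and 24224 is the conventional fluentd forward port, which may differ from your config):

```python
import socket

def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example, run from a pod inside the cluster (placeholder service name):
# is_listening("fluentd.kubesphere-logging-system.svc", 24224)
```

If this returns False while DNS resolves fine, the problem is the listener (or a NetworkPolicy), not name resolution.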
Starting from scratch, and documenting one more thing I had to fix.
The logs from the Kafka operator show:

```
2024-03-26 22:04:24 ERROR AbstractOperator:284 - Reconciliation #4(timer) Kafka(kafka/my-cluster): createOrUpdate failed
io.strimzi.operator.cluster.model.KafkaVersion$UnsupportedKafkaVersionException: Unsupported Kafka.spec.kafka.version: 3.1.0. Supported versions are: [3.6.0, 3.6.1, 3.7.0]
	at io.strimzi.operator.cluster.model.KafkaVersion$Lookup.supportedVersion(KafkaVersion.java:180) ~[io.strimzi.cluster-operator-0.40.0.jar:0.40.0]
	at io.strimzi.operator.cluster.operator.assembly.ZooKeeperVersionChangeCreator.<init>(ZooKeeperVersionChangeCreator.java:85) ~[io.strimzi.cluster-operator-0.40.0.jar:0.40.0]
```
So I did `kubectl edit -n kafka kafka` to change `version` to `3.6.1` and `inter.broker.protocol.version` to `3.6`, and then Kafka started successfully.
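The edited fields end up in the Kafka custom resource roughly like this (a sketch; the cluster name `my-cluster` and `kafka` namespace come from the operator log above, and the rest of the spec is omitted):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
  namespace: kafka
spec:
  kafka:
    version: 3.6.1                        # must be in the operator's supported list
    config:
      inter.broker.protocol.version: "3.6"  # kept one minor behind/equal to version
```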
It seemed that nothing was getting indexed into the Elasticsearch instance at all. The logs showed tons of messages like the following:

```
{"@timestamp":"2024-03-27T00:26:17.260Z", "log.level": "WARN", "message":"received plaintext http traffic on an https channel, closing connection Netty4HttpChannel{localAddress=/10.244.0.26:9200, remoteAddress=/10.244.0.18:49076}", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-master-0][transport_worker][T#1]","log.logger":"org.elasticsearch.xpack.security.transport.netty4.SecurityNetty4HttpServerTransport","elasticsearch.cluster.uuid":"ldQm9yJlRhqemKCC9XZDjQ","elasticsearch.node.id":"yajs6batT8GloQL9XCUP2g","elasticsearch.node.name":"elasticsearch-master-0","elasticsearch.cluster.name":"elasticsearch"}
```
I think that the security settings need to be disabled if we want the Fluent pods to be able to send messages to it (or else, they need to be configured with security).
I tried `kubectl edit statefulset -n elastic elasticsearch-master` to tweak all of the `xpack.security.*` settings, to no avail (I'm not able to get the pod to start up again in a healthy state after changing them).
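For reference, the env-var form of those settings in the statefulset's Elasticsearch container would look roughly like this (a hedged sketch of what I tried; disabling security is only sensible for a throwaway local cluster, and as noted above this did not yield a healthy pod for me):

```yaml
env:
  - name: xpack.security.enabled
    value: "false"
  - name: xpack.security.http.ssl.enabled
    value: "false"
  - name: xpack.security.transport.ssl.enabled
    value: "false"
```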
OK, at least some of these issues are handled by updating to a much newer version of the Fluent operator. Opened #36 for that.
I ran into the same issues with TLS and credentials. I have my doubts that this repo is maintained, since the instructions don't yield a working configuration and there have been no commits in over two years
Hi all!

I went through the walkthrough on my M3 MacBook Pro and ran into a few issues.

First, when attempting to run `./create-minikube-cluster-for-mac.sh`, I got this error. It seems that the `hyperkit` driver is not available on Apple Silicon. So I tried starting Minikube without that option (just with `minikube start`), and got farther.

However, when trying to deploy `fluentd` using this step, the pod failed to start (`CrashLoopBackOff`). Checking the logs reveals the following. So I think that the `fluentd-watcher` image it's trying to use is incompatible with my architecture. However, although I'm relatively familiar with Kubernetes, I'm not as familiar with Minikube, nor with Apple Silicon architecture issues. Does anyone have pointers on how I might work around this?

Thank you!