TheRealMattLear opened 1 month ago
> We've also increased the CSI driver controller to 3 replicas to avoid a SPOF.
What is the setup?
For the empty files, does it have content from the filer?
Using the default deploy/helm/seaweedfs-csi-driver/values.yaml with the only change:

controller:
  replicas: 3

The seaweedfs helm values are fairly stock outside of the following changes (literal git diff fragments, shown here regrouped under their values.yaml sections):

master:
  replicas: 3
  defaultReplication: 001

volume:
  replicas: 3
  index: leveldb
  dataDirs:
    - name: data
      type: "emptyDir"
      maxVolumes: 2 # Each volume is 30 GB; a Linode 8GB node has 160 GB total and we want to use only 60 GB per node (2 volumes x 30 GB), leaving space for local files (tv + fallback cache)
    - name: data1
      type: "persistentVolumeClaim"
      storageClass: "linode-block-storage-retain"
      size: "60Gi"
      maxVolumes: 0

filer:
  replicas: 3
  data:
    type: "hostPath"
    size: ""
    storageClass: ""
    hostPathPrefix: /storage
This had been working for a solid week, though we did have one complaint that may have been related. Today at peak operation, with lots of reads and writes, we saw a catastrophic failure that progressively worsened over the space of an hour, going from 1 in 10 to 9 in 10 writes ending up as a 0-byte file, before we switched back to our previous NFS solution. The previous NFS setup is just one pod in the same node group operating as an NFS server, and it seems to handle the load and pressure fine.
My gut feeling is that replication couldn't keep up with the changes, or that writes were being rejected for some reason, but I'm pretty new to SeaweedFS. I believe FFmpeg is constantly writing to a mono.m3u8.tmp file and then renaming it to mono.m3u8, replacing the file. Snippet of the filer log here:
I0713 23:31:25.877303 filer_grpc_server_rename.go:53 StreamRenameEntry old_directory:"/buckets/pvc-c23f5b5c-0c50-474f-a9c7-94c496e735c2/vdv92rk8" old_name:"mono.m3u8.tmp" new_directory:"/buckets/pvc-c23f5b5c-0c50-474f-a9c7-94c496e735c2/vdv92rk8" new_name:"mono.m3u8" signatures:-1405740923
I0713 23:31:25.877369 filer_grpc_server_rename.go:157 moving entry /buckets/pvc-c23f5b5c-0c50-474f-a9c7-94c496e735c2/vdv92rk8/mono.m3u8.tmp => /buckets/pvc-c23f5b5c-0c50-474f-a9c7-94c496e735c2/vdv92rk8/mono.m3u8
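For context, the rename in the log matches the usual atomic-replace pattern for playlist updates. A minimal sketch of what ffmpeg appears to be doing (file names taken from the log; the playlist contents here are placeholders):

```shell
# Sketch of the write-then-rename pattern seen in the filer log.
# The playlist is written to a .tmp file, then renamed over the live file,
# so readers should only ever observe a complete playlist.
printf '#EXTM3U\n#EXT-X-VERSION:3\n' > mono.m3u8.tmp  # write the new playlist
mv mono.m3u8.tmp mono.m3u8                            # rename replaces mono.m3u8
```

On a POSIX filesystem the rename is atomic, which is why a reader seeing a 0-byte mono.m3u8 after the rename is surprising.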
As for the .ts files, they are new files created with sequential numbering: 001.ts, 002.ts, etc.
> For the empty files, does it have content from the filer?
Sorry, I'm not quite sure what you mean. The infrastructure is all up and running, but production has been switched back to the previous NFS, so I can check any logs etc., I just may need a bit of guidance if possible :)
My next step was going to be looking at replacing the filer index with a distributed filer store.
This could be a metadata synchronization problem. Try to use one csi driver and one filer to see whether this can be reproduced.
1 csi-driver => 1 filer => (master+volumes)
I've been able to reproduce this consistently with 1 csi-driver and 1 filer, as well as with multiple csi-drivers and filers. There are 4 volume replicas and 3 master replicas. Creating a deployment with ~80 pods running ffmpeg with HLS output to /data is enough. I even saw the problem with 40 pods, just not as frequently.
Dockerfile:
FROM ubuntu:latest
WORKDIR /root
RUN sed -i -e 's/http:\/\/archive\.ubuntu\.com\/ubuntu\//http:\/\/ubuntu\.mirror\.serversaustralia\.com\.au\/ubuntu\//' /etc/apt/sources.list \
&& sed -i -e 's/http:\/\/security\.ubuntu\.com\/ubuntu\//http:\/\/ubuntu\.mirror\.serversaustralia\.com\.au\/ubuntu\//' /etc/apt/sources.list \
&& apt-get update && apt-get install -y --no-install-recommends ffmpeg;
RUN apt-get -y install curl;
RUN curl "https://drive.usercontent.google.com/download?id=1UPaMAQtTnTOd5lFq0wiwVuuotN4iYM2K&export=download&authuser=0" --output /root/3min_tester48.mp4;
COPY --chmod=755 ./start.sh /root/start.sh
CMD ["/root/start.sh"]
start.sh
#!/bin/bash
host=$(hostname)
mkdir -p "/data/${host}"
ffmpeg -re -stream_loop -1 -i /root/3min_tester48.mp4 -codec copy -b:v 1500k -b:a 128k -map 0:v:0 -map 0:a:0 \
-hls_init_time 2 -hls_time 4 -hls_list_size 20 -hls_flags delete_segments -var_stream_map 'v:0,a:0' \
-master_pl_name index.m3u8 -http_user_agent Akamai_Broadcaster_v1.0 -http_persistent 1 \
-f hls /data/${host}/mono.m3u8;
After a couple of minutes, running
ls -l /data/*/ | grep ' 0 '
starts showing 0-byte files. Notably, I would expect the latest .ts file written to start at zero bytes, since ffmpeg seems to open the file and then append data later; however, previously created files should not be zero, and I would expect the m3u8 files not to be zero either. Here's one example from the output where I can easily spot mono18.ts:
-rw-r--r-- 1 root root 0 Jul 17 05:26 /data/nfstest-79dbb7cc99-tzssp/mono18.ts
-rw-r--r-- 1 root root 0 Jul 17 05:26 /data/nfstest-79dbb7cc99-tzssp/mono24.ts
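To quantify the problem across all pod directories, a small helper (hypothetical, not part of the original repro) can count 0-byte regular files under a tree:

```shell
# Count 0-byte regular files under a directory tree (hypothetical helper).
count_zero_byte() {
  find "$1" -type f -size 0 | wc -l
}
```

For example, count_zero_byte /data run every few seconds would show whether the count grows over time or whether files transiently appear empty and then fill in.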
Running this same scenario on https://github.com/kubernetes-sigs/nfs-ganesha-server-and-external-provisioner, the only time I've ever seen a zero-byte file was a "mono.m3u8.tmp" file, which I must have caught at the exact second it was created.
How many weed mount processes are running in parallel? I am confused by "1 csi-driver" and "40 pods".
Sorry for any confusion; we're using the default of 1 for controller.replicas in the seaweedfs-csi-driver helm values file.
We're running 40+ test pods, created to write files via ffmpeg using the Dockerfile supplied above, in a simple deployment across ~8 nodes:
pvc.yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: seaweedfs-hls
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 75Gi # This value is enforced after 2022/03/18
  storageClassName: seaweedfs-storage
Deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfstest
spec:
  replicas: 1 # scaled up to 40-80 for the repro
  selector:
    matchLabels:
      run: nfstest
  template:
    metadata:
      labels:
        run: nfstest
    spec:
      containers:
        - name: nfstest
          image: nfstest:latest
          env:
            - name: MY_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          volumeMounts:
            - name: hls
              mountPath: "/data"
      volumes:
        - name: hls
          persistentVolumeClaim:
            claimName: seaweedfs-hls
How many weed mount processes are running in parallel? Still the same question.
Sorry I'm not sure how to check this?
Basically, how many csi driver programs are running.
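One hypothetical way to answer this, from a shell on each node (or inside the csi-driver pod), is to count processes whose command line contains "weed mount". The bracketed character class in the grep pattern keeps grep from matching its own command line:

```shell
# Count running weed-mount processes (hypothetical check; run per node).
count_weed_mounts() {
  ps -eo args | grep -c '[w]eed mount' || true  # grep -c prints 0 when none match
}
```

Summing this across nodes would give the total number of parallel FUSE mounts the question is asking about.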
Hi,
I have seaweedfs-csi-driver configured with seaweedfs on kubernetes, with default seaweedfs values outside of replicas and volumes: 3 masters with 001 replication configured, 3 filers, and 4 volume replicas. We've also increased the CSI driver controller to 3 replicas to avoid a SPOF. We have 8 application pods running a total of ~100 ffmpeg processes streaming live HLS content. For each process, a new .ts file is written with the stream data every 2 seconds, and a master.m3u8 file is updated every 2 seconds. For every new .ts data file that is written, one is deleted (a constant stream of changing data).
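That churn pattern can be sketched locally (a hypothetical simulation on a plain filesystem, not the actual workload): each tick writes a new numbered segment and deletes the oldest, keeping a fixed window of live files, which is effectively what hls_flags delete_segments does:

```shell
# Hypothetical simulation of the segment churn described above:
# each tick writes segment N and removes segment N-window.
churn_tick() {
  dir=$1 n=$2 window=$3
  printf 'segment-data' > "$dir/$(printf 'mono%03d.ts' "$n")"  # new segment
  old=$((n - window))
  if [ "$old" -ge 0 ]; then
    rm -f "$dir/$(printf 'mono%03d.ts' "$old")"                # drop oldest
  fi
}
```

Running churn_tick for n = 0..5 with a window of 3 leaves exactly mono003.ts through mono005.ts on disk; multiplied by ~100 processes, this is a steady create/delete load on the filer metadata.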
When accessing the master.m3u8 file, we're finding that it is randomly empty at 0 bytes, and some of the .ts files are also 0 bytes while others are fine. Sometimes every file is zero bytes, sometimes none are, and sometimes only some are; it's not consistent. I'm also finding that some larger one-off writes of entire mp4 files at ~1-5 GB end up empty.
Log files on all pods look normal, with no visible errors that I could see.
We're migrating over from nfs-ganesha-server-and-external-provisioner due to it being a SPOF; the previous solution worked without issue. The only change is using seaweedfs instead.
We tried doubling the filer replicas, and even decreasing them to 1, to no avail.
I'm wondering if it could have something to do with the concurrentWriters default of 32?
Any thoughts as to where to look to solve this?