TheRealMattLear opened 1 month ago
> We've also increased the CSI driver controller to 3 replicas to avoid a SPOF.
What is the setup?
For the empty files, does it have content from the filer?
Using the default deploy/helm/seaweedfs-csi-driver/values.yaml with the only change:

controller:
  replicas: 3

The seaweedfs helm values are fairly stock outside of the following changes (literal git diff fragments, shown here regrouped under their values.yaml sections):

master:
  replicas: 3
  defaultReplication: 001

volume:
  replicas: 3
  index: leveldb
  dataDirs:
    - name: data
      type: "emptyDir"
      maxVolumes: 2 # Each volume is 30 GB; a Linode 8GB node has 160 GB total and we want to use only 60 GB per node (2 volumes x 30 GB), leaving space for local files (tv + fallback cache)
    - name: data1
      type: "persistentVolumeClaim"
      storageClass: "linode-block-storage-retain"
      size: "60Gi"
      maxVolumes: 0

filer:
  replicas: 3
  data:
    type: "hostPath"
    size: ""
    storageClass: ""
    hostPathPrefix: /storage
This had been working for a solid week, though we did have one complaint that may have been related. Today at peak operation, with lots of reads and writes, we saw a catastrophic failure that progressively worsened over the space of an hour, going from 1 in 10 to 9 in 10 writes ending up as a 0-byte file, before we switched back to our previous NFS solution. The previous NFS setup is just one pod in the same node group operating as an NFS server, and it seems to handle the load and pressure fine.
My gut feeling is that replication couldn't keep up with the changes, or that writes were being rejected for some reason, but I'm pretty new to SeaweedFS. I believe FFmpeg is constantly writing to a mono.m3u8.tmp file and then renaming it to mono.m3u8, replacing the file. Snippet of the filer log here:
I0713 23:31:25.877303 filer_grpc_server_rename.go:53 StreamRenameEntry old_directory:"/buckets/pvc-c23f5b5c-0c50-474f-a9c7-94c496e735c2/vdv92rk8" old_name:"mono.m3u8.tmp" new_directory:"/buckets/pvc-c23f5b5c-0c50-474f-a9c7-94c496e735c2/vdv92rk8" new_name:"mono.m3u8" signatures:-1405740923
I0713 23:31:25.877369 filer_grpc_server_rename.go:157 moving entry /buckets/pvc-c23f5b5c-0c50-474f-a9c7-94c496e735c2/vdv92rk8/mono.m3u8.tmp => /buckets/pvc-c23f5b5c-0c50-474f-a9c7-94c496e735c2/vdv92rk8/mono.m3u8
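For context, the rename in the log matches the usual atomic-replace pattern for playlist updates. A minimal sketch of what ffmpeg appears to be doing (file names taken from the log; the playlist contents here are placeholders):

```shell
# Sketch of the write-then-rename pattern seen in the filer log.
# The playlist is written to a .tmp file, then renamed over the live file,
# so readers should only ever observe a complete playlist.
printf '#EXTM3U\n#EXT-X-VERSION:3\n' > mono.m3u8.tmp  # write the new playlist
mv mono.m3u8.tmp mono.m3u8                            # rename replaces mono.m3u8
```

On a POSIX filesystem the rename is atomic, which is why a reader seeing a 0-byte mono.m3u8 after the rename is surprising.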
As for the .ts files, they are new files created with sequential numbering: 001.ts, 002.ts, etc.
> For the empty files, does it have content from the filer?
Sorry, I'm not quite sure what you mean. The infrastructure is all up and running, but production has been switched back to the previous NFS, so I can check any logs etc., I just may need a bit of guidance if possible :)
My next step was going to be looking at replacing the filer index with a distributed filer store.
This could be a metadata synchronization problem. Try to use one csi driver and one filer to see whether this can be reproduced.
1 csi-driver => 1 filer => (master+volumes)
I've been able to reproduce this consistently with 1 csi-driver and 1 filer, as well as with multiple csi-drivers and filers. There are 4 volume replicas and 3 master replicas. Creating a deployment with ~80 pods running ffmpeg with HLS output to /data is enough. I even saw the problem with 40 pods, just not as frequently.
Dockerfile:
FROM ubuntu:latest
WORKDIR /root
RUN sed -i -e 's/http:\/\/archive\.ubuntu\.com\/ubuntu\//http:\/\/ubuntu\.mirror\.serversaustralia\.com\.au\/ubuntu\//' /etc/apt/sources.list \
&& sed -i -e 's/http:\/\/security\.ubuntu\.com\/ubuntu\//http:\/\/ubuntu\.mirror\.serversaustralia\.com\.au\/ubuntu\//' /etc/apt/sources.list \
&& apt-get update && apt-get install -y --no-install-recommends ffmpeg;
RUN apt-get -y install curl;
RUN curl "https://drive.usercontent.google.com/download?id=1UPaMAQtTnTOd5lFq0wiwVuuotN4iYM2K&export=download&authuser=0" --output /root/3min_tester48.mp4;
COPY --chmod=755 ./start.sh /root/start.sh
CMD ["/root/start.sh"]
start.sh
#!/bin/bash
host=$(hostname)
mkdir -p "/data/${host}"
ffmpeg -re -stream_loop -1 -i /root/3min_tester48.mp4 -codec copy -b:v 1500k -b:a 128k -map 0:v:0 -map 0:a:0 \
-hls_init_time 2 -hls_time 4 -hls_list_size 20 -hls_flags delete_segments -var_stream_map 'v:0,a:0' \
-master_pl_name index.m3u8 -http_user_agent Akamai_Broadcaster_v1.0 -http_persistent 1 \
-f hls /data/${host}/mono.m3u8;
After a couple of minutes, running
ls -l /data/*/ | grep ' 0 '
starts showing 0-byte files. Notably, I would expect the latest .ts file written to start at zero bytes, since ffmpeg seems to open the file and then append data later; however, previously created files should not be zero, and I would expect the m3u8 files not to be zero either. Here's one example from the output where I can easily spot mono18.ts:
-rw-r--r-- 1 root root 0 Jul 17 05:26 /data/nfstest-79dbb7cc99-tzssp/mono18.ts
-rw-r--r-- 1 root root 0 Jul 17 05:26 /data/nfstest-79dbb7cc99-tzssp/mono24.ts
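To quantify the problem across all pod directories, a small helper (hypothetical, not part of the original repro) can count 0-byte regular files under a tree:

```shell
# Count 0-byte regular files under a directory tree (hypothetical helper).
count_zero_byte() {
  find "$1" -type f -size 0 | wc -l
}
```

For example, count_zero_byte /data run every few seconds would show whether the count grows over time or whether files transiently appear empty and then fill in.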
Running this same scenario on https://github.com/kubernetes-sigs/nfs-ganesha-server-and-external-provisioner, the only time I've ever seen a zero-byte file was a "mono.m3u8.tmp" file, which I must have caught at the exact second it was created.
How many weed mount processes are running in parallel? I am confused by "1 csi-driver" and "40 pods".
Sorry for any confusion; we're using the default of 1 for controller.replicas in the seaweedfs-csi-driver helm values file.
We're running 40+ test pods, created to write files via ffmpeg using the Dockerfile supplied above, in a simple deployment across ~8 nodes:
pvc.yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: seaweedfs-hls
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 75Gi # This value is enforced after 2022/03/18
  storageClassName: seaweedfs-storage
Deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfstest
spec:
  replicas: 1 # scaled up to 40-80 for the repro
  selector:
    matchLabels:
      run: nfstest
  template:
    metadata:
      labels:
        run: nfstest
    spec:
      containers:
        - name: nfstest
          image: nfstest:latest
          env:
            - name: MY_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          volumeMounts:
            - name: hls
              mountPath: "/data"
      volumes:
        - name: hls
          persistentVolumeClaim:
            claimName: seaweedfs-hls
How many weed mount processes are running in parallel? Still the same question.
Sorry I'm not sure how to check this?
Basically, how many csi driver programs are running.
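One hypothetical way to answer this, from a shell on each node (or inside the csi-driver pod), is to count processes whose command line contains "weed mount". The bracketed character class in the grep pattern keeps grep from matching its own command line:

```shell
# Count running weed-mount processes (hypothetical check; run per node).
count_weed_mounts() {
  ps -eo args | grep -c '[w]eed mount' || true  # grep -c prints 0 when none match
}
```

Summing this across nodes would give the total number of parallel FUSE mounts the question is asking about.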
Hi,
I have seaweedfs-csi-driver configured with seaweedfs on kubernetes, with default seaweedfs values outside of replicas and volumes: 3 masters with 001 replication configured, 3 filers, and 4 volume replicas. We've also increased the CSI driver controller to 3 replicas to avoid a SPOF. We have 8 application pods running a total of ~100 ffmpeg processes streaming live HLS content. For each process, a new .ts file is written with the stream data every 2 seconds, and a master.m3u8 file is updated every 2 seconds. For every new .ts data file that is written, one is deleted (a constant stream of changing data).
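That churn pattern can be sketched locally (a hypothetical simulation on a plain filesystem, not the actual workload): each tick writes a new numbered segment and deletes the oldest, keeping a fixed window of live files, which is effectively what hls_flags delete_segments does:

```shell
# Hypothetical simulation of the segment churn described above:
# each tick writes segment N and removes segment N-window.
churn_tick() {
  dir=$1 n=$2 window=$3
  printf 'segment-data' > "$dir/$(printf 'mono%03d.ts' "$n")"  # new segment
  old=$((n - window))
  if [ "$old" -ge 0 ]; then
    rm -f "$dir/$(printf 'mono%03d.ts' "$old")"                # drop oldest
  fi
}
```

Running churn_tick for n = 0..5 with a window of 3 leaves exactly mono003.ts through mono005.ts on disk; multiplied by ~100 processes, this is a steady create/delete load on the filer metadata.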
When accessing the master.m3u8 file, we're finding that it is randomly empty at 0 bytes, and some of the .ts files are also 0 bytes while others are fine. Sometimes every file is zero bytes, sometimes none are, and sometimes only some are; it's not consistent. I'm also finding that some larger one-off writes of entire mp4 files at ~1-5 GB end up empty.
Log files on all pods look normal, with no visible errors that I could see.
We're migrating over from nfs-ganesha-server-and-external-provisioner due to it being a SPOF; the previous solution worked without issue. The only change is using seaweedfs instead.
We tried doubling the filer replicas, and even decreasing them to 1, to no avail.
I'm wondering if it could have something to do with the concurrentWriters default of 32?
Any thoughts as to where to look to solve this?