Large uploads 'fail' but have succeeded

Setup

Hyper-V guest with 8 cores, 12GB RAM
7TB SAS RAID1 drive array
Vanilla Ubuntu 22 Server install
All updates installed
Vanilla mediaCMS 3 as per instructions
Bare minimal config changes to make it useable

I've found many videos, usually ones over 1GB, upload successfully but show as failed, eg: Retry does nothing on this, unlike one that has failed a portion of the way through. If I then go look at the media file, it has uploaded perfectly fine and encoding is finished. On the disk it shows just the one file, aka no spare chunks.

I have tried reducing the chunk sizing;

CHUNKIZE_VIDEO_DURATION = 60 * 2
VIDEO_CHUNKS_DURATION = 60 * 1

But that did not help

I can repro this easily, but am unsure which logs to check.

Narrowed this down to storage speed.

I moved it from 2Gbps SAN to 6Gbps DAS and the issue went away.

I could push/pull ~100MB/s synchronously on this connection, and the uploads are limited to my 40Mbps / 5MB/s throughput, so must be something to do with latency sensitivity, or maybe checksumming?

Tested more and it's not storage, but I was close. It's the upload speed.

If I upload to it from a 40Mbps-up connection, I get this issue. If I upload from a 100Mbps-up connection, it's fine. (the server itself is on gigabit enterprise fibre)

Most services in Australia max out at 40Mbps upload, and that's only if they have fibre or REALLY good copper, so this would likely affect many people.

Okay with even bigger files it still fails on the 100Mbps connection;

'Retry' does nothing. Chrome console logs;

I can copy and paste the file to another location, so read/write is fine, and having tried three times it keeps doing the same thing.

The files on the server are still in a few parts too; root@mediacms:/home/mediacms.io/mediacms/media_files/original/user/admin# ls -Alh *C036*

-rw-r--r-- 1 www-data www-data 2.5G Aug 10 12:33 00_0dwH3A9HG_030e3ca59405463d8d85b8449e84d302.A009_05051034_C036_CC.mov.mkv
-rw-r--r-- 1 www-data www-data 2.5G Aug 10 12:33 01_0dwH3A9HG_030e3ca59405463d8d85b8449e84d302.A009_05051034_C036_CC.mov.mkv
-rw-r--r-- 1 www-data www-data 2.2G Aug 10 12:33 02_0dwH3A9HG_030e3ca59405463d8d85b8449e84d302.A009_05051034_C036_CC.mov.mkv
-rw-r--r-- 1 www-data www-data 2.1G Aug 10 12:33 03_0dwH3A9HG_030e3ca59405463d8d85b8449e84d302.A009_05051034_C036_CC.mov.mkv
-rw-r--r-- 1 www-data www-data 9.0G Aug 10 12:32 030e3ca59405463d8d85b8449e84d302.A009_05051034_C036_CC.mov

I think it's timing out for you. You can try playing with the chunks duration but it's simpler to increase timeouts because fine tuning the chunk duration so that it doesn't timeout is a nightmare. I've successfully uploaded 90 gigabyte files into mediacms and encoded to 240,360,480,and 1080p as .h264

There is currently a hard coded timeout value for the 'short' tasks, and i addressed this in my PR:

https://github.com/mediacms-io/mediacms/pull/856

Also, you can completely eliminate all other timeouts by setting them in your local settings file. For testing, I wanted things to never timeout (don't do this in production).

CELERY_TASK_SOFT_TIME_LIMIT = None
CELERY_TASK_TIME_LIMIT = None
CELERY_SOFT_TIME_LIMIT = None
CELERYD_TASK_SOFT_TIME_LIMIT = None

But you can set time limits to something else because the defaults provided by mediacms are too short.

Also, how are you deploying mediacms at the moment? If you're using nginx and uwsgi, you'll at minimum need to increase timeouts in those places as well

[uwsgi]
http-timeout = 86400    ; Set HTTP timeout to 24 hours
socket-timeout = 86400  ; Set socket timeout to 24 hours

And if you are also using nginx you should configure timeouts and other settings there. I put all of my settings for example in one config file:

nginx-local-settings.conf

proxy_send_timeout 12000;
client_header_timeout 12000;
client_body_timeout 12000;
proxy_request_buffering off;
proxy_read_timeout 12000;
uwsgi_read_timeout 12000;
# client_body_buffer_size 2m;
# proxy_buffering off;
# client_max_body_size 200G;
# proxy_http_version 1.1;
client_max_body_size 0;

Configure things so they work best for you, but i was able to ingest 90 gig files stored on s3 via s3fs. The ingestion from upload to encoding takes about 4 hours for me at the moment, which is unbelievably slow but it does complete. I am looking to optimize this but I'm just sharing this with you because I suspect you are encountering timeout issues with the larger files. I might be able to optimize this by experimenting with the s3fs cache directory options without having to touch mediacms more.

Also, the error message in the UI ideally should provide more context into why things are failing. It took a lot of debugging and trial and error for me to ingest large files.

Oh nice one - I'll grab that PR and change the configs. Many thanks!

Hi, thanks for the report and for the insightful comments @tobocop2 and @platima .

I think that the sugestions for uwsgi are valid, there might be something that has to be optimized in this case. And I also believe that the CELERY settings are irrelevant (going to comment on the PR as well), because there's different uses for the word chunkize here and we are getting confused:

Chrome console logs Problem finalizing chunks is related with the fine uploader and Django integration, basically fine uploader (a javascript uploader library) is making chunks of the original file and sending to the Django backend, and something time outs or fails there - this is most probably related with the limits @tobocop2 mentions. As a side note, I want to get rid of fine uploader very soon, because it's deprecated software (despite being great and being used so widely until now), there's a ticket discussion a migration to uppy.io for this
Celery runs a task that chunkizes the original file, so that it gets to be transcoded by ffmpeg through different available CPU. A file of 1G is split in many chunks, so that different workers can run ffmpeg and transcode each chunk, and finally all chunks are concatenated. @tobocop2 the task that chunkizes the file should be very fast, it's just splitting the file in smaller parts...

As a recap, the problems here seem to be related with the uploading and not post-processing of files.

On a sidenote, @tobocop2 I am curious to learn what could be a 90 gigabyte file that you've uploaded! I haven't tested the software in so big files, but would be very interested to learn more for this case, what type of videos/workflows you are having here, and what is the infrastructure that processess it. The software is definitely not optimized for that type of videos but I'm happy to read it doesn't fail. One thing I can think of is the command that produces the sprites file (the small images when you hover on the video duration bar), I know that this command fails on a vanilla MediaCMS for videos larger than 1-2 hours, and needs a tweak on the command that produces it (something related to ImageMagick if I remember well). For sure there will be other issues or edge cases here because again the software is not tested on so big files.

Regards

@mgogoulos the chunk jobs were all failing for me. You have 300 seconds as the hard coded timeout in the supervisord file and the tasks file.

The order of precedence is

keyword arguments at command line arguments in decorator global arguments.

Right now there is no way to override the timeout because it's hard coded hence i made https://github.com/mediacms-io/mediacms/pull/856

Regarding the sprites, I overwrote the imagemagick policy.xml file and set it to be like this

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE policymap [
  <!ELEMENT policymap (policy)*>
  <!ATTLIST policymap xmlns CDATA #FIXED ''>
  <!ELEMENT policy EMPTY>
  <!ATTLIST policy xmlns CDATA #FIXED '' domain NMTOKEN #REQUIRED
    name NMTOKEN #IMPLIED pattern CDATA #IMPLIED rights NMTOKEN #IMPLIED
    stealth NMTOKEN #IMPLIED value CDATA #IMPLIED>
]>
<!--
  Configure ImageMagick policies.

  Domains include system, delegate, coder, filter, path, or resource.

  Rights include none, read, write, execute and all.  Use | to combine them,
  for example: "read | write" to permit read from, or write to, a path.

  Use a glob expression as a pattern.

  Suppose we do not want users to process MPEG video images:

    <policy domain="delegate" rights="none" pattern="mpeg:decode" />

  Here we do not want users reading images from HTTP:

    <policy domain="coder" rights="none" pattern="HTTP" />

  The /repository file system is restricted to read only.  We use a glob
  expression to match all paths that start with /repository:

    <policy domain="path" rights="read" pattern="/repository/*" />

  Lets prevent users from executing any image filters:

    <policy domain="filter" rights="none" pattern="*" />

  Any large image is cached to disk rather than memory:

    <policy domain="resource" name="area" value="1GP"/>

  Use the default system font unless overwridden by the application:

    <policy domain="system" name="font" value="/usr/share/fonts/favorite.ttf"/>

  Define arguments for the memory, map, area, width, height and disk resources
  with SI prefixes (.e.g 100MB).  In addition, resource policies are maximums
  for each instance of ImageMagick (e.g. policy memory limit 1GB, -limit 2GB
  exceeds policy maximum so memory limit is 1GB).

  Rules are processed in order.  Here we want to restrict ImageMagick to only
  read or write a small subset of proven web-safe image types:

    <policy domain="delegate" rights="none" pattern="*" />
    <policy domain="filter" rights="none" pattern="*" />
    <policy domain="coder" rights="none" pattern="*" />
    <policy domain="coder" rights="read|write" pattern="{GIF,JPEG,PNG,WEBP}" />
-->
<policymap>
  <!-- <policy domain="resource" name="temporary-path" value="/tmp"/> -->
  <policy domain="resource" name="memory" value="1GiB"/>
  <policy domain="resource" name="map" value="30GiB"/>
  <policy domain="resource" name="width" value="16MP"/>
  <policy domain="resource" name="height" value="16MP"/>
  <!-- <policy domain="resource" name="list-length" value="128"/> -->
  <policy domain="resource" name="area" value="40GP"/>
  <policy domain="resource" name="disk" value="100GiB"/>
  <!-- <policy domain="resource" name="file" value="768"/> -->
  <!-- <policy domain="resource" name="thread" value="4"/> -->
  <!-- <policy domain="resource" name="throttle" value="0"/> -->
  <!-- <policy domain="resource" name="time" value="3600"/> -->
  <!-- <policy domain="coder" rights="none" pattern="MVG" /> -->
  <!-- <policy domain="module" rights="none" pattern="{PS,PDF,XPS}" /> -->
  <!-- <policy domain="path" rights="none" pattern="@*" /> -->
  <!-- <policy domain="cache" name="memory-map" value="anonymous"/> -->
  <!-- <policy domain="cache" name="synchronize" value="True"/> -->
  <!-- <policy domain="cache" name="shared-secret" value="passphrase" stealth="true"/>
  <!-- <policy domain="system" name="max-memory-request" value="256MiB"/> -->
  <!-- <policy domain="system" name="shred" value="2"/> -->
  <!-- <policy domain="system" name="precision" value="6"/> -->
  <!-- <policy domain="system" name="font" value="/path/to/font.ttf"/> -->
  <!-- <policy domain="system" name="pixel-cache-memory" value="anonymous"/> -->
  <!-- <policy domain="system" name="shred" value="2"/> -->
  <!-- <policy domain="system" name="precision" value="6"/> -->
  <!-- not needed due to the need to use explicitly by mvg: -->
  <!-- <policy domain="delegate" rights="none" pattern="MVG" /> -->
  <!-- use curl -->
  <policy domain="delegate" rights="none" pattern="URL" />
  <policy domain="delegate" rights="none" pattern="HTTPS" />
  <policy domain="delegate" rights="none" pattern="HTTP" />
  <!-- in order to avoid to get image with password text -->
  <policy domain="path" rights="none" pattern="@*"/>
  <!-- disable ghostscript format types -->
  <policy domain="coder" rights="none" pattern="PS" />
  <policy domain="coder" rights="none" pattern="PS2" />
  <policy domain="coder" rights="none" pattern="PS3" />
  <policy domain="coder" rights="none" pattern="EPS" />
  <policy domain="coder" rights="none" pattern="PDF" />
  <policy domain="coder" rights="none" pattern="XPS" />
</policymap>

I'm able to ingest 90 gig files no problem using the basic docker-compose infrastructure you provided with the addition of s3fs added to my docker-compose. Here is my config.

version: "3"

# Uses https://github.com/nginx-proxy/acme-companion

services:
  nginx-proxy:
    user: "${UID}:${GID}"
    image: nginxproxy/nginx-proxy
    container_name: nginx-proxy
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - vhost:/etc/nginx/vhost.d
      - html:/usr/share/nginx/html
      - dhparam:/etc/nginx/dhparam
      - certs:/etc/nginx/certs:ro
      - /var/run/docker.sock:/tmp/docker.sock:ro
      - ./deploy/docker/reverse_proxy/nginx:/etc/nginx/conf.d
    restart: always

  acme-companion:
    user: "${UID}:${GID}"
    image: nginxproxy/acme-companion
    container_name: nginx-proxy-acme
    volumes_from:
      - nginx-proxy
    volumes:
      - certs:/etc/nginx/certs:rw
      - acme:/etc/acme.sh
      - /var/run/docker.sock:/var/run/docker.sock:ro
    restart: always

  migrations:
    user: "${UID}:${GID}"
    image: mediacms/mediacms:latest
    volumes:
      - ./:/home/mediacms.io/mediacms/
    env_file:
      - .env_file  # Load environment variables from the .env_file
    environment:
      ENABLE_UWSGI: 'no'
      ENABLE_NGINX: 'no'
      ENABLE_CELERY_SHORT: 'no'
      ENABLE_CELERY_LONG: 'no'
      ENABLE_CELERY_BEAT: 'no'
    command: "./deploy/docker/prestart.sh"
    restart: on-failure
    depends_on:
      redis:
        condition: service_healthy
      db:
        condition: service_healthy
    restart: always
  web:
    user: "${UID}:${GID}"
    build:
      context: .
    image: mediacms/mediacms:latest
    deploy:
      replicas: 1
    volumes:
      - ./:/home/mediacms.io/mediacms/
      - ./deploy/docker/reverse_proxy/web:/etc/nginx/conf.d
    env_file:
      - .env_file  # Load environment variables from the .env_file
    environment:
      # VIRTUAL_PROTO: uwsgi
      # ENABLE_UWSGI: 'no'
      ENABLE_CELERY_BEAT: 'no'
      ENABLE_CELERY_SHORT: 'no'
      ENABLE_CELERY_LONG: 'no'
      ENABLE_MIGRATIONS: 'no'
    depends_on:
      - migrations
      - s3fs
    restart: always
  celery_beat:
    user: "${UID}:${GID}"
    image: mediacms/mediacms:latest
    volumes:
      - ./:/home/mediacms.io/mediacms/
      - ./deploy/docker/reverse_proxy/client_max_body_size.conf:/etc/nginx/conf.d/client_max_body_size.conf:ro
    environment:
      ENABLE_UWSGI: 'no'
      ENABLE_NGINX: 'no'
      ENABLE_CELERY_SHORT: 'no'
      ENABLE_CELERY_LONG: 'no'
      ENABLE_MIGRATIONS: 'no'
    depends_on:
      - redis
    restart: always
  celery_worker:
    user: "${UID}:${GID}"
    image: mediacms/mediacms:latest
    deploy:
      replicas: 1
    volumes:
      - ./:/home/mediacms.io/mediacms/
    environment:
      ENABLE_UWSGI: 'no'
      ENABLE_NGINX: 'no'
      ENABLE_CELERY_BEAT: 'no'
      ENABLE_MIGRATIONS: 'no'
    depends_on:
      - migrations
      - s3fs
    restart: always
  db:
    user: "${UID}:${GID}"
    image: postgres:13
    volumes:
      - ../postgres_data:/var/lib/postgresql/data/
    restart: always
    environment:
      POSTGRES_USER: mediacms
      POSTGRES_PASSWORD: mediacms
      POSTGRES_DB: mediacms
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U mediacms"]
      interval: 30s
      timeout: 10s
      retries: 5
  redis:
    user: "${UID}:${GID}"
    image: "redis:alpine"
    restart: always
    healthcheck:
      test: ["CMD", "redis-cli","ping"]
      interval: 30s
      timeout: 10s
      retries: 3
  s3fs:
    user: "${UID}:${GID}"
    devices:
      - "/dev/fuse:/dev/fuse"
    cap_add:
      - SYS_ADMIN
    security_opt:
      - apparmor:unconfined
    image: "efrecon/s3fs:1.93"
    restart: always
    env_file:
      - .env_file  # Load environment variables from the .env_file
    environment:
      # uncomment for debug info
      # S3FS_DEBUG: 1
      # S3FS_ARGS: allow_other,use_rrs,umask=0000,nonempty,max_stat_cache_size=100000000,stat_cache_expire=10800,readwrite_timeout=10800,connect_timeout=10800,parallel_count=18,use_cache=/tmp/s3fs
      S3FS_ARGS: allow_other,use_rrs,umask=0000,nonempty,max_stat_cache_size=100000000,readwrite_timeout=10800,connect_timeout=10800,parallel_count=18,use_cache=/tmp/s3fs
      # S3FS_ARGS: allow_other,use_rrs,umask=0000,nonempty
    volumes:
      - ./media_files:/opt/s3fs/bucket:rshared
      - ./s3fs-cache:/tmp/s3fs

  s3fs-cron:
    image: alpine
    command: crond -f
    volumes:
      - ./media_files:/opt/s3fs/bucket:rshared
      - ./s3fs-cache:/tmp/s3fs
      - ./clear_cache.sh:/etc/periodic/hourly/clear_cache
    environment:
      - "SHELL=/bin/sh"
      - "CRON_STR=*/3 * * * * run-parts /etc/periodic/hourly"
    entrypoint:
      - /bin/sh
      - -c
      - |
          echo "$$CRON_STR" > /etc/crontabs/root && exec crond -l 2 -f
    restart: always

volumes:
  conf:
  vhost:
  html:
  dhparam:
  certs:
  acme:

This is just a POC but it works for 90 gig files given all my modifications to the uwsgi, imagemagick, nginx, and django confs.

This is my local settings file


FRONTEND_HOST = 'http://localhost'
PORTAL_NAME = 'MediaCMS'

POSTGRES_HOST = 'db'
REDIS_LOCATION = "redis://redis:6379/1"

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mediacms",
        "HOST": POSTGRES_HOST,
        "PORT": "5432",
        "USER": "mediacms",
        "PASSWORD": "mediacms",
    }
}

CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": REDIS_LOCATION,
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
        },
    }
}

# CELERY STUFF
BROKER_URL = REDIS_LOCATION
CELERY_RESULT_BACKEND = BROKER_URL

# we can avoid creation of the hls files this way
MP4HLS_COMMAND = 'IGNORE HLS CONVERSION'

DEBUG = True

PORTAL_WORKFLOW = 'private'

UPLOAD_MAX_SIZE = 200 * 1024 * 1024 * 1024  # 214,748,364,800 bytes (200 gigabytes)
REGISTER_ALLOWED = False
MINIMUM_RESOLUTIONS_TO_ENCODE = [720]
CELERY_TASK_SOFT_TIME_LIMIT = None
CELERY_TASK_TIME_LIMIT = None
CELERY_SOFT_TIME_LIMIT = None
CELERYD_TASK_SOFT_TIME_LIMIT = None
EMAIL_BACKEND = 'django.core.mail.backends.dummy.EmailBackend'
MEDIA_URL = 'SOME+CLOUD_FRONT_URL'

Note i disabled the hls conversion because i don't need it.

For convenience i also wrapped all the docker-compose commands in Makefile

SHELL = /bin/sh

UID := $(shell id -u)
GID := $(shell id -g)

export UID
export GID

# Path to the mediacms directory after cloning
MEDIACMS_DIR := mediacms

# Make 'build' target a default goal
.DEFAULT_GOAL := build

# 'build' target will clone 'mediacms' repository, copy files, and run 'up'
.PHONY: build
build: copy-files up

# Rule for cloning 'mediacms' repository
.PHONY: clone
clone:
>---if [ ! -d "$(MEDIACMS_DIR)" ]; then git clone https://github.com/mediacms-io/mediacms.git $(MEDIACMS_DIR); fi

# Rule for copying the compose file, .env_file, and nginx conf file
copy-files:
>---# Copy 'docker-compose-letsencrypt-s3.yaml', '.env_file', and 'client_max_body_size.conf' to 'mediacms' directory
>---cp docker-compose-letsencrypt-s3.yaml .env_file requirements.txt clear_cache.sh  $(MEDIACMS_DIR)/
>---# Copy 'client_max_body_size.conf' to the reverse_proxy directory
>---mkdir -p $(MEDIACMS_DIR)/deploy/docker/reverse_proxy/web
>---mkdir -p $(MEDIACMS_DIR)/deploy/docker/reverse_proxy/nginx
>---cp web/client_max_body_size.conf $(MEDIACMS_DIR)/deploy/docker/reverse_proxy/web/
>---cp nginx/client_max_body_size.conf $(MEDIACMS_DIR)/deploy/docker/reverse_proxy/nginx/
>---cp local_settings.py nginx.conf uwsgi.ini nginx_http_only.conf policy.xml $(MEDIACMS_DIR)/deploy/docker/
>---cp tasks.py $(MEDIACMS_DIR)/files/
>---cp local_settings.py $(MEDIACMS_DIR)/cms/
>---cp supervisord-celery_short.conf $(MEDIACMS_DIR)/deploy/docker/supervisord/

# Targets for managing the Docker Compose setup
.PHONY: up
up:
>---cd $(MEDIACMS_DIR) && docker-compose -f docker-compose-letsencrypt-s3.yaml up --build -d

.PHONY: down
down:
>---cd $(MEDIACMS_DIR) && docker-compose -f docker-compose-letsencrypt-s3.yaml down --remove-orphans
>---umount $(MEDIACMS_DIR)/media_files

logs_nginx:
>---cd $(MEDIACMS_DIR) && docker-compose -f docker-compose-letsencrypt-s3.yaml logs --tail=1000 --follow nginx-proxy

logs_web:
>---cd $(MEDIACMS_DIR) && docker-compose -f docker-compose-letsencrypt-s3.yaml logs --tail=1000 --follow web

logs_db:
>---cd $(MEDIACMS_DIR) && docker-compose -f docker-compose-letsencrypt-s3.yaml logs --tail=1000 --follow db

logs_redis:
>---cd $(MEDIACMS_DIR) && docker-compose -f docker-compose-letsencrypt-s3.yaml logs --tail=1000 --follow redis

logs_s3fs:
>---cd $(MEDIACMS_DIR) && docker-compose -f docker-compose-letsencrypt-s3.yaml logs --follow --tail=1000 s3fs

logs_celery_worker:
>---cd $(MEDIACMS_DIR) && docker-compose -f docker-compose-letsencrypt-s3.yaml logs --follow --tail=1000 celery_worker

logs_celery_beat:
>---cd $(MEDIACMS_DIR) && docker-compose -f docker-compose-letsencrypt-s3.yaml logs --follow --tail=1000 celery_beat

logs_s3fs_cron:
>---cd $(MEDIACMS_DIR) && docker-compose -f docker-compose-letsencrypt-s3.yaml logs --follow --tail=1000 s3fs-cron

logs_acme-companion:
>---cd $(MEDIACMS_DIR) && docker-compose -f docker-compose-letsencrypt-s3.yaml logs --follow --tail=1000 acme-companion

logs_migrations:
>---cd $(MEDIACMS_DIR) && docker-compose -f docker-compose-letsencrypt-s3.yaml logs --follow --tail=1000 migrations

web_sh:
>---cd $(MEDIACMS_DIR) && docker-compose -f docker-compose-letsencrypt-s3.yaml exec web /bin/bash

celery_sh:
>---cd $(MEDIACMS_DIR) && docker-compose -f docker-compose-letsencrypt-s3.yaml exec celery_worker /bin/bash

nginx_sh:
>---cd $(MEDIACMS_DIR) && docker-compose -f docker-compose-letsencrypt-s3.yaml exec nginx-proxy /bin/bash

s3fs_sh:
>---cd $(MEDIACMS_DIR) && docker-compose -f docker-compose-letsencrypt-s3.yaml exec s3fs  /bin/sh

s3fs_cron_sh:
>---cd $(MEDIACMS_DIR) && docker-compose -f docker-compose-letsencrypt-s3.yaml exec s3fs-cron  /bin/sh

restart_web:
>---cd $(MEDIACMS_DIR) && docker-compose -f docker-compose-letsencrypt-s3.yaml restart web

Always happy to help, and thanks for commenting on this ticket. Looking forward to a better uploader then :P

I'm quite regularly running into this issue as well. Thanks for the suggestions, everyone! Will try some of this and see if it helps. It would be great if some of this information was put into the admin documentation.

This ticket stays open until there is some info of it moved to the admin docs. @tobocop2 I'm wondering which parts could go into the main repository, to optimize the default timeouts/settings.

Next, how is s3fs performance in playing these videos? Any observations?

And also out of curiosity, what are these 90GB files, what sector are you working on?

@mgogoulos

I completely abandoned s3fs for my needs. The ingestion throughput was poor and unsuitable for my needs and i ultimately devised a solution that simply uses aws s3 sync via a cronjob. I am going to migrate this to use aws datasync eventually. I was getting single stream uploads to s3 at roughly 250mb/s and I was doing this across multiple workers, so I was seeing over 1TB/s throughput via multiple s3 sync commands. I think my throughput needs just are too demanding for s3fs so I had to abandon it. It was a very interesting recommendation but just not suitable for the scale and file sizes I'm dealing with. I possibly could have spent the time to tune s3fs since it has a wide variety of options. I tried experimenting with the cache option, but ultimately i was just not seeing my files ingested in reasonable amounts of time so i moved away from it. While s3fs is a really awesome tool, i found the simpler approach of just maintaining a cron job that runs the aws s3 sync command hourly to be a more sustainable and manageable solution.

The 90 gig files are are broadcast media files (mxf format, i had to add my own mimetype for them to work with mediacms).

For playback, I'm using the MEDIA_URL setting in django and serving all of my files from s3 over cloudfront. This is breaking static assets in some places but I am ok with that for the time being. I see an existing PR here: https://github.com/mediacms-io/mediacms/pull/869/files

which should fix that issue for me.

In order to actually support the throughput and scale I needed, I had to deploy mediacms in two configurations:

A) ingest mode (EC2 + EBS (GP3 max IOPS and max throughput) + local redis + shared RDS postgres B) basic mode (ECS + EFS + Elasticache + RDS (shared with ingest)

Ingest is on demand, so year round I'll only have the simple ECS/EFS configuration deployed. In my ingest configuration I have one worker running on the same machine as the web app and I give the web app roughly 30% of my CPU. Distributed workers would be great, but the throughput needs that I have are unreasonable for NFS. Encoding was taking unreasonable amounts of time when reading from NFS and uploading was similarly slow.

I have successfully ingested over 60 terrabytes of media through mediacms with some modifications to the deployment configurations and my own aws infrastructure code using terraform. Thank you as well as all of the contributors to this project @mgogoulos

mxf format

thanks for the very interesting information and insights, on the usage of s3fs, customizations and your use case. I'll try to review the PRs and merge them soon.

I'd like to note that I have not found any of the suggestions here to work for me. I still get this problem when large files are uploaded. The uploader will show the upload as failed, the file will not appear in the user's media list, but then a while later the file will show up anyway. Have not had the time to trace it down. I've just told users who upload huge files that it's a known issue, and if it says failed it probably worked so they will need to just check back later.

mediacms-io / mediacms

Large uploads 'fail' but have succeeded #852