mediacms-io / mediacms

MediaCMS is a modern, fully featured open source video and media CMS, written in Python/Django and React, featuring a REST API.
https://mediacms.io
GNU Affero General Public License v3.0

create_hls being called multiple times on the same video as additional MP4 encodings finish #962

Open. KyleMaas opened this issue 5 months ago

KyleMaas commented 5 months ago

Describe the issue

Related to #929 and pull request #938. Part of the problem appears to come from how create_hls is written and called: it appears to run on the same video ID every time any MP4 encoding finishes. So if you have 4-5 different resolutions, create_hls runs 4-5 times, apparently by design. With huge videos, this results in atrocious amounts of disk I/O. We need to figure out a way to get create_hls to run only after the very last MP4 encoding has finished.

See here for reference, walking up the call stack to where this is coming from:

https://github.com/mediacms-io/mediacms/blob/c5047d8df8686d75100e5099489be4fd1bf5f733/files/tasks.py#L408

https://github.com/mediacms-io/mediacms/blob/c5047d8df8686d75100e5099489be4fd1bf5f733/files/models.py#L642

https://github.com/mediacms-io/mediacms/blob/c5047d8df8686d75100e5099489be4fd1bf5f733/files/models.py#L1564

Note that in the last one, the call is ideally not made per chunk (which is what #938 should fix), but it is still made on every successful encoding. Every enabled MP4 encoding profile should therefore ultimately result in a call to create_hls if its encoding succeeds.
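To make the call pattern concrete, here is a simplified, self-contained model of it. This is not the MediaCMS code: `Encoding`, `on_encoding_saved`, the media ID, and the profile names are hypothetical stand-ins, and `create_hls` here only records that it was triggered instead of launching Bento.

```python
from dataclasses import dataclass

hls_calls = []  # records every simulated create_hls invocation


def create_hls(media_id):
    """Stand-in for the real task: only logs that it was triggered."""
    hls_calls.append(media_id)


@dataclass
class Encoding:
    media_id: str
    profile: str
    status: str = "pending"


def on_encoding_saved(encoding):
    """Hypothetical per-encoding hook: fires on every successful encoding."""
    if encoding.status == "success":
        create_hls(encoding.media_id)


# Hypothetical profile names; five enabled MP4 profiles for one video.
profiles = ["h264-240", "h264-360", "h264-480", "h264-720", "h264-1080"]
for profile in profiles:
    enc = Encoding(media_id="abc123", profile=profile)
    enc.status = "success"
    on_encoding_saved(enc)

print(len(hls_calls))  # 5: one create_hls run per enabled profile
```

In this model, the number of create_hls runs for a single video scales with the number of enabled profiles, which matches the behavior described above.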

To Reproduce

Steps to reproduce the issue:

  1. Make sure your Celery is configured to allow for more concurrent tasks than you have MP4 encoding profiles so that this can all happen concurrently.
  2. Upload a video that takes at least an hour to transcode, like a high-resolution video which is several hours long.
  3. Watch your tasks in ps or top. You should see Bento running multiple times, and if your processing is long enough or your disk access slow enough, you should eventually see those Bento processes finish and the cp processes start building up and colliding.

Expected behavior

create_hls should only run once per Media record, when all the MP4 encoding processes for that video have finished.
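One possible shape for that gating, as a minimal sketch rather than a proposed patch: the `Media` and `Encoding` classes below are plain-Python stand-ins, `on_encoding_finished` and `all_encodings_settled` are hypothetical names, and a counter replaces the actual queueing of create_hls. It also assumes a failed encoding should count as "finished" so that one failure does not block HLS generation forever.

```python
from dataclasses import dataclass, field


@dataclass
class Encoding:
    profile: str
    status: str = "pending"  # "pending", "running", "success" or "fail"


@dataclass
class Media:
    media_id: str
    encodings: list = field(default_factory=list)
    hls_runs: int = 0  # counts how often create_hls would have been queued

    def all_encodings_settled(self):
        """True once no MP4 encoding is still pending or running."""
        return all(e.status in ("success", "fail") for e in self.encodings)

    def on_encoding_finished(self, encoding, status):
        """Hypothetical hook called when a single encoding ends."""
        encoding.status = status
        has_success = any(e.status == "success" for e in self.encodings)
        if self.all_encodings_settled() and has_success:
            self.hls_runs += 1  # stand-in for queueing create_hls once


media = Media("abc123", [Encoding(p) for p in ("240", "360", "480", "720", "1080")])
for enc in media.encodings:
    media.on_encoding_finished(enc, "success")

print(media.hls_runs)  # 1
```

With several Celery workers finishing at nearly the same time, the "is everything settled?" check would also need to be made atomic (for example with a database-level lock), which this single-process sketch does not attempt.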

mgogoulos commented 5 months ago

The description is valid. This was done on purpose, neglecting the fact that it adds extra overhead on large videos (I hadn't noticed that it takes time/resources; I thought it would be a very light process).

KyleMaas commented 5 months ago

@mgogoulos Thanks for reviewing this! So, for an example of what we're dealing with, looking at one video with a duration of about 3 hours (this is not an unusual case - typically around one of these per day and it's legitimate user-created video):

So with five MP4 encode profiles enabled (assuming it's working as intended and only runs create_hls once per successful encoding, not multiple times as multiple chunks finish and trigger it), you're still looking at about 5*(24.1+5.3)=147GB of data transfer. If, instead, it's 31 concurrent instances (one Bento + 30 cp) like in #938, then it'd be about 911GB, and that was already nearing the end of the encoding process. So we're looking at 4GB video files resulting in well over a terabyte of data transfer to/from the disks.
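The arithmetic above can be reproduced with a few lines. The 24.1 and 5.3 figures are taken as-is from this comment as GB of I/O per create_hls run; the variable names are only illustrative.

```python
# Per-run I/O figures quoted in the comment, in GB.
per_run_gb = 24.1 + 5.3

profiles = 5      # one create_hls run per successful MP4 encoding
concurrent = 31   # one Bento + 30 cp, as reported in #938

once_per_profile_gb = profiles * per_run_gb
many_instances_gb = concurrent * per_run_gb

print(f"{once_per_profile_gb:.1f} GB")  # 147.0 GB
print(f"{many_instances_gb:.1f} GB")    # 911.4 GB
```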

KyleMaas commented 3 months ago

Just want to bring up that there were so many instances of this running concurrently on a recent (very long) video that it OOM'd the Celery instance, which then had to be manually restarted.