m1k1o / go-transcode

On-demand transcoding origin server for live inputs and static files in Go using ffmpeg. Also with NVIDIA GPU hardware acceleration.
Apache License 2.0

transcode static video files #12

Closed klahaha closed 2 years ago

klahaha commented 2 years ago

inspiration: hls-vod-too

edit: for static streams, maybe config should support directories? library: /media/video in config could serve all files/directories under /profile/library/folder/video.mkv/index.m3u8 (or another url format from #13)

klahaha commented 2 years ago

basic info (number of streams, duration): ffprobe -v error -show_format -show_streams -of json

output

```
{
  "format": {
    "filename": "something.mkv",
    "nb_streams": 9,
    "nb_programs": 0,
    "format_name": "matroska,webm",
    "format_long_name": "Matroska / WebM",
    "start_time": "0.000000",
    "duration": "6268.596000",
    "size": "4326785947",
    "bit_rate": "5521856",
    "probe_score": 100,
    "tags": {
      "encoder": "libebml v1.3.9 + libmatroska v1.5.2",
      "creation_time": "2019-09-03T18:09:35.000000Z"
    }
  }
}
```

more info (stream type, language, quality) for audio, video and subtitles: ffprobe -v error -show_streams -of json

output

```
{
  "streams": [
    {
      "index": 0, "codec_name": "h264", "codec_long_name": "H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10",
      "profile": "Main", "codec_type": "video", "codec_time_base": "1001/48000",
      "codec_tag_string": "[0][0][0][0]", "codec_tag": "0x0000",
      "width": 1920, "height": 1080, "coded_width": 1920, "coded_height": 1088,
      "closed_captions": 0, "has_b_frames": 2, "sample_aspect_ratio": "1:1", "display_aspect_ratio": "16:9",
      "pix_fmt": "yuv420p", "level": 40, "color_range": "tv", "color_space": "bt709",
      "color_transfer": "bt709", "color_primaries": "bt709", "chroma_location": "left",
      "field_order": "progressive", "refs": 1, "is_avc": "true", "nal_length_size": "4",
      "r_frame_rate": "24000/1001", "avg_frame_rate": "24000/1001", "time_base": "1/1000",
      "start_pts": 0, "start_time": "0.000000", "bits_per_raw_sample": "8",
      "disposition": { "default": 1, "dub": 0, "original": 0, "comment": 0, "lyrics": 0, "karaoke": 0, "forced": 0, "hearing_impaired": 0, "visual_impaired": 0, "clean_effects": 0, "attached_pic": 0, "timed_thumbnails": 0 },
      "tags": { "language": "eng", "BPS-eng": "3599172", "DURATION-eng": "01:44:28.596000000", "NUMBER_OF_FRAMES-eng": "150296", "NUMBER_OF_BYTES-eng": "2820219740", "_STATISTICS_WRITING_APP-eng": "mkvmerge v37.0.0 ('Leave It') 64-bit", "_STATISTICS_WRITING_DATE_UTC-eng": "2019-09-03 18:09:35", "_STATISTICS_TAGS-eng": "BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES" }
    },
    {
      "index": 1, "codec_name": "ac3", "codec_long_name": "ATSC A/52A (AC-3)",
      "codec_type": "audio", "codec_time_base": "1/48000",
      "codec_tag_string": "[0][0][0][0]", "codec_tag": "0x0000",
      "sample_fmt": "fltp", "sample_rate": "48000", "channels": 6, "channel_layout": "5.1(side)", "bits_per_sample": 0,
      "dmix_mode": "-1", "ltrt_cmixlev": "-1.000000", "ltrt_surmixlev": "-1.000000", "loro_cmixlev": "-1.000000", "loro_surmixlev": "-1.000000",
      "r_frame_rate": "0/0", "avg_frame_rate": "0/0", "time_base": "1/1000",
      "start_pts": 0, "start_time": "0.000000", "bit_rate": "640000",
      "disposition": { "default": 1, "dub": 0, "original": 0, "comment": 0, "lyrics": 0, "karaoke": 0, "forced": 0, "hearing_impaired": 0, "visual_impaired": 0, "clean_effects": 0, "attached_pic": 0, "timed_thumbnails": 0 },
      "tags": { "language": "eng", "BPS-eng": "640000", "DURATION-eng": "01:44:28.576000000", "NUMBER_OF_FRAMES-eng": "195893", "NUMBER_OF_BYTES-eng": "501486080", "_STATISTICS_WRITING_APP-eng": "mkvmerge v37.0.0 ('Leave It') 64-bit", "_STATISTICS_WRITING_DATE_UTC-eng": "2019-09-03 18:09:35", "_STATISTICS_TAGS-eng": "BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES" }
    },
    {
      "index": 2, "codec_name": "ac3", "codec_long_name": "ATSC A/52A (AC-3)",
      "codec_type": "audio", "codec_time_base": "1/48000",
      "codec_tag_string": "[0][0][0][0]", "codec_tag": "0x0000",
      "sample_fmt": "fltp", "sample_rate": "48000", "channels": 6, "channel_layout": "5.1(side)", "bits_per_sample": 0,
      "dmix_mode": "-1", "ltrt_cmixlev": "-1.000000", "ltrt_surmixlev": "-1.000000", "loro_cmixlev": "-1.000000", "loro_surmixlev": "-1.000000",
      "r_frame_rate": "0/0", "avg_frame_rate": "0/0", "time_base": "1/1000",
      "start_pts": 0, "start_time": "0.000000", "bit_rate": "640000",
      "disposition": { "default": 0, "dub": 0, "original": 0, "comment": 0, "lyrics": 0, "karaoke": 0, "forced": 0, "hearing_impaired": 0, "visual_impaired": 0, "clean_effects": 0, "attached_pic": 0, "timed_thumbnails": 0 },
      "tags": { "language": "fre", "BPS-eng": "640000", "DURATION-eng": "01:44:28.576000000", "NUMBER_OF_FRAMES-eng": "195893", "NUMBER_OF_BYTES-eng": "501486080", "_STATISTICS_WRITING_APP-eng": "mkvmerge v37.0.0 ('Leave It') 64-bit", "_STATISTICS_WRITING_DATE_UTC-eng": "2019-09-03 18:09:35", "_STATISTICS_TAGS-eng": "BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES" }
    },
    {
      "index": 3, "codec_name": "ac3", "codec_long_name": "ATSC A/52A (AC-3)",
      "codec_type": "audio", "codec_time_base": "1/48000",
      "codec_tag_string": "[0][0][0][0]", "codec_tag": "0x0000",
      "sample_fmt": "fltp", "sample_rate": "48000", "channels": 6, "channel_layout": "5.1(side)", "bits_per_sample": 0,
      "dmix_mode": "-1", "ltrt_cmixlev": "-1.000000", "ltrt_surmixlev": "-1.000000", "loro_cmixlev": "-1.000000", "loro_surmixlev": "-1.000000",
      "r_frame_rate": "0/0", "avg_frame_rate": "0/0", "time_base": "1/1000",
      "start_pts": 0, "start_time": "0.000000", "bit_rate": "640000",
      "disposition": { "default": 0, "dub": 0, "original": 0, "comment": 0, "lyrics": 0, "karaoke": 0, "forced": 0, "hearing_impaired": 0, "visual_impaired": 0, "clean_effects": 0, "attached_pic": 0, "timed_thumbnails": 0 },
      "tags": { "language": "spa", "BPS-eng": "640000", "DURATION-eng": "01:44:28.576000000", "NUMBER_OF_FRAMES-eng": "195893", "NUMBER_OF_BYTES-eng": "501486080", "_STATISTICS_WRITING_APP-eng": "mkvmerge v37.0.0 ('Leave It') 64-bit", "_STATISTICS_WRITING_DATE_UTC-eng": "2019-09-03 18:09:35", "_STATISTICS_TAGS-eng": "BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES" }
    },
    {
      "index": 4, "codec_name": "subrip", "codec_long_name": "SubRip subtitle",
      "codec_type": "subtitle", "codec_time_base": "0/1",
      "codec_tag_string": "[0][0][0][0]", "codec_tag": "0x0000",
      "r_frame_rate": "0/0", "avg_frame_rate": "0/0", "time_base": "1/1000",
      "start_pts": 0, "start_time": "0.000000", "duration_ts": 6268596, "duration": "6268.596000",
      "disposition": { "default": 1, "dub": 0, "original": 0, "comment": 0, "lyrics": 0, "karaoke": 0, "forced": 0, "hearing_impaired": 0, "visual_impaired": 0, "clean_effects": 0, "attached_pic": 0, "timed_thumbnails": 0 },
      "tags": { "language": "eng", "BPS-eng": "43", "DURATION-eng": "01:41:20.989000000", "NUMBER_OF_FRAMES-eng": "1142", "NUMBER_OF_BYTES-eng": "33026", "_STATISTICS_WRITING_APP-eng": "mkvmerge v37.0.0 ('Leave It') 64-bit", "_STATISTICS_WRITING_DATE_UTC-eng": "2019-09-03 18:09:35", "_STATISTICS_TAGS-eng": "BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES" }
    },
    {
      "index": 5, "codec_name": "subrip", "codec_long_name": "SubRip subtitle",
      "codec_type": "subtitle", "codec_time_base": "0/1",
      "codec_tag_string": "[0][0][0][0]", "codec_tag": "0x0000",
      "r_frame_rate": "0/0", "avg_frame_rate": "0/0", "time_base": "1/1000",
      "start_pts": 0, "start_time": "0.000000", "duration_ts": 6268596, "duration": "6268.596000",
      "disposition": { "default": 0, "dub": 0, "original": 0, "comment": 0, "lyrics": 0, "karaoke": 0, "forced": 0, "hearing_impaired": 0, "visual_impaired": 0, "clean_effects": 0, "attached_pic": 0, "timed_thumbnails": 0 },
      "tags": { "language": "fre", "BPS-eng": "46", "DURATION-eng": "01:38:05.846000000", "NUMBER_OF_FRAMES-eng": "1009", "NUMBER_OF_BYTES-eng": "33999", "_STATISTICS_WRITING_APP-eng": "mkvmerge v37.0.0 ('Leave It') 64-bit", "_STATISTICS_WRITING_DATE_UTC-eng": "2019-09-03 18:09:35", "_STATISTICS_TAGS-eng": "BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES" }
    },
    {
      "index": 6, "codec_name": "subrip", "codec_long_name": "SubRip subtitle",
      "codec_type": "subtitle", "codec_time_base": "0/1",
      "codec_tag_string": "[0][0][0][0]", "codec_tag": "0x0000",
      "r_frame_rate": "0/0", "avg_frame_rate": "0/0", "time_base": "1/1000",
      "start_pts": 0, "start_time": "0.000000", "duration_ts": 6268596, "duration": "6268.596000",
      "disposition": { "default": 0, "dub": 0, "original": 0, "comment": 0, "lyrics": 0, "karaoke": 0, "forced": 1, "hearing_impaired": 0, "visual_impaired": 0, "clean_effects": 0, "attached_pic": 0, "timed_thumbnails": 0 },
      "tags": { "language": "fre", "BPS-eng": "0", "DURATION-eng": "01:18:34.168000000", "NUMBER_OF_FRAMES-eng": "18", "NUMBER_OF_BYTES-eng": "465", "_STATISTICS_WRITING_APP-eng": "mkvmerge v37.0.0 ('Leave It') 64-bit", "_STATISTICS_WRITING_DATE_UTC-eng": "2019-09-03 18:09:35", "_STATISTICS_TAGS-eng": "BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES" }
    },
    {
      "index": 7, "codec_name": "subrip", "codec_long_name": "SubRip subtitle",
      "codec_type": "subtitle", "codec_time_base": "0/1",
      "codec_tag_string": "[0][0][0][0]", "codec_tag": "0x0000",
      "r_frame_rate": "0/0", "avg_frame_rate": "0/0", "time_base": "1/1000",
      "start_pts": 0, "start_time": "0.000000", "duration_ts": 6268596, "duration": "6268.596000",
      "disposition": { "default": 0, "dub": 0, "original": 0, "comment": 0, "lyrics": 0, "karaoke": 0, "forced": 0, "hearing_impaired": 0, "visual_impaired": 0, "clean_effects": 0, "attached_pic": 0, "timed_thumbnails": 0 },
      "tags": { "language": "spa", "BPS-eng": "42", "DURATION-eng": "01:38:06.333000000", "NUMBER_OF_FRAMES-eng": "1006", "NUMBER_OF_BYTES-eng": "30936", "_STATISTICS_WRITING_APP-eng": "mkvmerge v37.0.0 ('Leave It') 64-bit", "_STATISTICS_WRITING_DATE_UTC-eng": "2019-09-03 18:09:35", "_STATISTICS_TAGS-eng": "BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES" }
    },
    {
      "index": 8, "codec_name": "subrip", "codec_long_name": "SubRip subtitle",
      "codec_type": "subtitle", "codec_time_base": "0/1",
      "codec_tag_string": "[0][0][0][0]", "codec_tag": "0x0000",
      "r_frame_rate": "0/0", "avg_frame_rate": "0/0", "time_base": "1/1000",
      "start_pts": 0, "start_time": "0.000000", "duration_ts": 6268596, "duration": "6268.596000",
      "disposition": { "default": 0, "dub": 0, "original": 0, "comment": 0, "lyrics": 0, "karaoke": 0, "forced": 1, "hearing_impaired": 0, "visual_impaired": 0, "clean_effects": 0, "attached_pic": 0, "timed_thumbnails": 0 },
      "tags": { "language": "spa", "BPS-eng": "0", "DURATION-eng": "01:18:34.168000000", "NUMBER_OF_FRAMES-eng": "18", "NUMBER_OF_BYTES-eng": "481", "_STATISTICS_WRITING_APP-eng": "mkvmerge v37.0.0 ('Leave It') 64-bit", "_STATISTICS_WRITING_DATE_UTC-eng": "2019-09-03 18:09:35", "_STATISTICS_TAGS-eng": "BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES" }
    }
  ]
}
```
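On the Go side, ffprobe output like this could be decoded with `encoding/json` and grouped by stream type, for example to offer audio/subtitle selection. This is only a sketch: the struct and function names (`probeStream`, `parseProbe`) are made up here, only a few of the fields are mapped, and the sample input is a shortened version of the output above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// probeStream keeps only the fields needed to list selectable tracks.
type probeStream struct {
	Index     int               `json:"index"`
	CodecName string            `json:"codec_name"`
	CodecType string            `json:"codec_type"`
	Tags      map[string]string `json:"tags"`
}

type probeResult struct {
	Streams []probeStream `json:"streams"`
}

// parseProbe decodes "ffprobe -show_streams -of json" output and
// groups the streams by codec_type (video, audio, subtitle).
func parseProbe(data []byte) (map[string][]probeStream, error) {
	var res probeResult
	if err := json.Unmarshal(data, &res); err != nil {
		return nil, err
	}
	byType := make(map[string][]probeStream)
	for _, s := range res.Streams {
		byType[s.CodecType] = append(byType[s.CodecType], s)
	}
	return byType, nil
}

func main() {
	sample := []byte(`{"streams":[
		{"index":0,"codec_name":"h264","codec_type":"video","tags":{"language":"eng"}},
		{"index":1,"codec_name":"ac3","codec_type":"audio","tags":{"language":"eng"}},
		{"index":2,"codec_name":"ac3","codec_type":"audio","tags":{"language":"fre"}},
		{"index":4,"codec_name":"subrip","codec_type":"subtitle","tags":{"language":"eng"}}
	]}`)
	byType, err := parseProbe(sample)
	if err != nil {
		panic(err)
	}
	for _, a := range byType["audio"] {
		fmt.Println(a.Index, a.CodecName, a.Tags["language"])
	}
}
```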

get keyframe timestamps for segments: ffprobe -v error -skip_frame nokey -show_entries frame=pkt_pts_time -show_entries format=duration -show_entries stream=duration,width,height -select_streams v -of json, then build a timestamps array and do the math to add segment cuts where needed. or, to keep it simple, we can use the keyframe timestamps directly

output

```
{
  "frames": [
    { "pkt_pts_time": "0.000000" },
    { "pkt_pts_time": "8.759000" },
    { "pkt_pts_time": "10.761000" },
    { "pkt_pts_time": "17.809000" },
    { "pkt_pts_time": "22.981000" }
    ETC
  ],
  "programs": [ ],
  "streams": [
    { "width": 1920, "height": 1080 }
  ],
  "format": { "duration": "6268.596000" }
}
```

klahaha commented 2 years ago

ffprobe frames list is "slow". not bound by cpu or disk i think. it took 40-60s for a 4GB 2h video, so an HTTP timeout from the client is probable. i don't know how complicated it is to extract frames, but if it's not hard go-transcode should do it itself instead of ffprobe

edit: if playlist is in cache it's ok. maybe start probe before requests to warm cache?
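One possible shape for such a cache, as a sketch: each file is probed at most once, concurrent requests for the same file wait for that single probe, and a warm-up pass can fill the cache before any client asks. The `keyframeCache` type and its `probe` callback are hypothetical; the callback stands in for the slow ffprobe call.

```go
package main

import (
	"fmt"
	"sync"
)

type cacheEntry struct {
	once sync.Once
	val  []float64
	err  error
}

// keyframeCache memoizes the result of a slow probe per file.
type keyframeCache struct {
	mu      sync.Mutex
	entries map[string]*cacheEntry
	probe   func(file string) ([]float64, error)
}

func newKeyframeCache(probe func(string) ([]float64, error)) *keyframeCache {
	return &keyframeCache{entries: make(map[string]*cacheEntry), probe: probe}
}

// Get runs the probe on first use and returns the stored result afterwards.
// Errors are stored as well, so a failed probe is not retried in this sketch.
func (c *keyframeCache) Get(file string) ([]float64, error) {
	c.mu.Lock()
	e, ok := c.entries[file]
	if !ok {
		e = &cacheEntry{}
		c.entries[file] = e
	}
	c.mu.Unlock()
	e.once.Do(func() { e.val, e.err = c.probe(file) })
	return e.val, e.err
}

// Warm probes all files in parallel and returns when every probe is done.
func (c *keyframeCache) Warm(files []string) {
	var wg sync.WaitGroup
	for _, f := range files {
		wg.Add(1)
		go func(f string) {
			defer wg.Done()
			c.Get(f)
		}(f)
	}
	wg.Wait()
}

func main() {
	cache := newKeyframeCache(func(file string) ([]float64, error) {
		fmt.Println("probing", file) // the slow ffprobe call would go here
		return []float64{0, 8.759, 10.761}, nil
	})
	cache.Warm([]string{"something.mkv"})
	frames, _ := cache.Get("something.mkv") // served from cache, no second probe
	fmt.Println(len(frames))
}
```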

m1k1o commented 2 years ago

For the first version, I don't think that we need to transcode the whole static file. We need to convert it to a live HLS stream with an offset.

For example, when I want to watch a file from the start, I simply start streaming it. When I want to seek, I just start a new stream with the wanted time offset.

Or do you want to have the full manifest returned and process chunks on demand? I think we should investigate multiple approaches, and maybe implement several of them, with the ability to switch between them in the config file, because there are multiple use cases (for users with fast seeking, or just not seeking at all).

klahaha commented 2 years ago

> When I want to seek, I just start new stream with wanted time offset (option 1)
>
> Or do you want to have returned full manifest and process chunks on demand? (option 2)

for option 1 you need a custom client, no? with option 2, if the manifest is ready, you can seek in every hls player like vlc or web

> there are multiple usecases (for users with fast seeking, or just not seeking at all)

i don't know what fast seeking is, but i don't think no seeking is useful except for livestream? what idea do you have?

> switch them in config file

yes why not but it's more work

klahaha commented 2 years ago

i added output examples for ffprobe. i dont know if files can have multiple video tracks, maybe when input is an HLS stream?

will livestream and vod code be same? maybe vod will have more features (audio/sub select) but maybe features are also good for livestream if source supports it ?

for library and hls playlist creation, maybe we can provide a script to warm the cache and advise using it with cron, and have a config.yaml setting to start a process that warms the cache? plus a setting for the cache directory

for cache directory i dont know if /tmp is always good answer, it can grow full with ramdisk, and maybe someone wants keep copy of livestream (for library you want to keep playlists, maybe not all segments because its big)

klahaha commented 2 years ago

so the problem with transcoding the whole file is that skipping ahead wont work, because server transcoding will be slow. i thought maybe transcoding chunk by chunk helps. but to encode one chunk ffmpeg must probe the source again, so transcoding one chunk gets slower the deeper we go into the source (linearly slower, not exponentially). i think if we get byte offsets for chunks it's easy to cut the video before we call ffmpeg and skip the probe, but i dont know how to get byte offsets for video frames. less good, but maybe a setting for chunk count: encode 50 chunks at a time (example), so when we ask for chunk number 25, transcoding starts for chunks 51-100.

but better is if we can have a timeout for when the user quits, so we transcode from the starting point (maybe after a skip ahead, maybe the start of the video) to the end or until the timeout. best case, playback follows transcoding. worst case, the user skips to different times, so 3 or more transcoders run in the background and maybe transcode the same chunks several times. but i dont know how to make a timeout like this. ffmpeg has interactive commands but only for filters i think. we could stop earlier transcoding process when asking for transcode at later start point, but what if user2 is watching video from start and user1 skip ahead (problem)?

m1k1o commented 2 years ago

> will livestream and vod code be same? maybe vod will have more features (audio/sub select) but maybe features are also good for livestream if source supports it ?

I would like to keep live as subset of the same codebase.

> for library and hls playlist creation, maybe we can provide a script to warm the cache and advise using it with cron, and have a config.yaml setting to start a process that warms the cache? plus a setting for the cache directory

We should be able to cache static files and their playlists (or segments, depending on user preferences, but most likely not) along with some information (ffprobe output) in the cache directory. And it should be optional: if the user doesn't want a persistent cache, it should still work (even though maybe not that fast).

> for cache directory i dont know if /tmp is always good answer, it can grow full with ramdisk, and maybe someone wants keep copy of livestream (for library you want to keep playlists, maybe not all segments because its big)

We should give users the ability to choose their own temporary directory in the config. But we should differentiate between the cache and the temporary transcoding directory. And definitely add a setting for the cache directory.

> we could stop earlier transcoding process when asking for transcode at later start point, but what if user2 is watching video from start and user1 skip ahead (problem)?

I think the most reliable solution would be to skip only by the length of one segment; in that case the same segments from multiple transcoding instances could be reused. But I am not sure how well ffmpeg could do that, or how reliable it would be.

klahaha commented 2 years ago

> I would like to keep live as subset of the same codebase.

ok so all profile (hls, http, etc) must deal with vod/live? (btw maybe change name for api/http, there is already http, main.go and api/router for http. ideas?)

> it should work (even though maybe not that fast)

with a risk of HTTP timeout from the client

> But we should differentiate between cache and temporary transcoding directory

like cache.metadata and cache.transcoding? maybe more in the future for thumbnails, subtitle extraction... use cache.default (or /tmp) when a specific cache isnt defined?

> I think most reliable solution would be to skip only by length of one segment, and in that case same segments from multiple transcoding instances could be reused. But I am not sure how well ffmpeg could do that, how reliable would it be.

client-side skip can go anywhere because the HLS client knows which segment in the playlist to download, but it only works if the playlist is complete. on server ffmpeg can do it with -force_key_frames "$keyframes" where $keyframes is keyframe1time,keyframe2time,etc it forces creating keyframes on same time for all qualities, so later we can switch
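Building that argument from a list of timestamps is straightforward. A small sketch, with a made-up helper name (`forceKeyFramesArgs`) and millisecond precision chosen arbitrarily:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// forceKeyFramesArgs turns keyframe timestamps (in seconds) into the
// ffmpeg arguments: -force_key_frames t1,t2,...
// Passing the same list to every quality keeps the cut points aligned.
func forceKeyFramesArgs(times []float64) []string {
	parts := make([]string, len(times))
	for i, t := range times {
		parts[i] = strconv.FormatFloat(t, 'f', 3, 64)
	}
	return []string{"-force_key_frames", strings.Join(parts, ",")}
}

func main() {
	fmt.Println(forceKeyFramesArgs([]float64{0, 8.759, 10.761, 17.809}))
}
```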

for me the problem is more the server-side process strategy for transcoding. i'll run a local benchmark, try to read the jellyfin source to see how it does it, and update here

m1k1o commented 2 years ago

> ok so all profile (hls, http, etc) must deal with vod/live? (btw maybe change name for api/http, there is already http, main.go and api/router for http. ideas?)

It should, yes, although that implementation is going to take some time. We could guard it with capabilities: which output profiles can deal with vod and which cannot.
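A capability guard could be as small as a bit set per output profile. The names below (`Capability`, `CapLive`, `CapVOD`, `Profile`) are hypothetical and only illustrate the idea:

```go
package main

import "fmt"

// Capability is a bit set describing what an output profile can handle.
type Capability uint8

const (
	CapLive Capability = 1 << iota // live inputs
	CapVOD                         // static files
)

// Profile is a hypothetical output profile with its declared capabilities.
type Profile struct {
	Name string
	Caps Capability
}

// Supports reports whether the profile declares every requested capability.
func (p Profile) Supports(c Capability) bool { return p.Caps&c == c }

func main() {
	profiles := []Profile{
		{Name: "hls", Caps: CapLive | CapVOD},
		{Name: "http", Caps: CapLive},
	}
	for _, p := range profiles {
		fmt.Printf("%s: vod=%v live=%v\n", p.Name, p.Supports(CapVOD), p.Supports(CapLive))
	}
}
```

A request for a static file would then be rejected early for any profile that does not declare the vod capability.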

Switching to modules could help us with naming convention.

> like cache.metadata and cache.transcoding ? maybe more in future for thumbnails, subtitle extract... use cache.default (or /tmp) when specific cache isnt defined?

Yes, exactly. Multiple types of cache.

> on server ffmpeg can do it with -force_key_frames "$keyframes" where $keyframes is keyframe1time,keyframe2time,etc it forces creating keyframes on same time for all qualities, so later we can switch

That sounds like a good idea. Maybe we can see a PoC and how it performs. Hard to predict what would be the best approach here.

klahaha commented 2 years ago

benchmark: 20 minute video, transcode every 10th chunk (0, 10, 20, 30...), time for encoding 1 chunk:

```
$ cat chunks.log | while read -r line; do echo "${line:2:9}"; done | ~/go/bin/asciigraph -h 5 -c "plot data from stdin"
 35.54 ┼                   ╭──────
 28.93 ┤               ╭───╯
 22.32 ┤           ╭───╯
 15.70 ┤     ╭─────╯
  9.09 ┤╭────╯
  2.48 ┼╯
```

we see that starting an individual chunk transcode has a cost, and the cost is bigger when starting later in the file

EDIT: chunks.log holds the real time used by each process (/usr/bin/time -f "%E", format h:mm:ss.ms), one line per chunk

klahaha commented 2 years ago

> We could guard it with capabilities, what output profile can deal with vod and what not. Switching to modules could help us with naming convention.

good plan

> Maybe we can see PoC and how it performs.

hls-vod-too does it like that. i don't know if it's a good strategy but initial tests look ok. will test more

m1k1o commented 2 years ago

> hls-vod-too do like that

Yes, they can be a good starting point for us.

klahaha commented 2 years ago

> we see start individual chunk transcode has cost, bigger cost when starting later in file

good hope im wrong!!!!

```
$ cat chunks.log | sed 's/0://' | ~/go/bin/asciigraph -h 20
 5.04 ┤╭╮
 4.81 ┤││
 4.59 ┤││
 4.36 ┤││     ╭╮
 4.14 ┤││     ││
 3.91 ┤│╰╮    ││
 3.69 ┤│ │    ││
 3.46 ┤│ │    ││
 3.24 ┤│ │    ││    ╭╮
 3.01 ┤│ │    ││    ││
 2.79 ┤│ │    ││    ││               ╭╮  ╭╮
 2.56 ┤│ ╰╮   ││    ││               ││  ││
 2.33 ┼╯  │   ││    ││  ╭╮╭╮         ││  ││
 2.11 ┤   │  ╭╯│    ││  │││╰╮        ││ ╭╯│
 1.88 ┤   │  │ │  ╭╮│╰╮ │││ │     ╭╮ ││ │ │
 1.66 ┤   │  │ │  │││ │ │││ │     │╰╮│╰╮│ │
 1.43 ┤   │  │ │  │╰╯ ╰─╯╰╯ │ ╭╮╭╮│ ││ ╰╯ │       ╭╮
 1.21 ┤   ╰─╮│ │  │         │ │╰╯││ ╰╯    │ ╭──╮  ││
 0.98 ┤     ╰╯ ╰╮╭╯         │╭╯  ││       │ │  ╰╮ │╰
 0.76 ┤         ╰╯          ╰╯   ││       ╰╮│   │╭╯
 0.53 ┤                          ╰╯        ╰╯   ╰╯
```

every 10th chunk (0, 10, etc, 450), time for one chunk (seconds)

The -ss parameter needs to be specified somewhere before -i, otherwise the seek is slow. i put -ss before -i and the results look good, as you can see in the graph

chunk-by-chunk is realistic!

so go-transcode just needs to transcode a few chunks ahead of the request (5 to 10 chunks after the requested chunk n)
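The look-ahead rule could be sketched as a small helper that, for a requested chunk n, lists the chunks that still need transcoding in the window n to n+ahead. The function name, the `ready` map and the numbers in the example are assumptions for illustration.

```go
package main

import "fmt"

// chunksToSchedule returns the chunks in [n, n+ahead] that are not
// transcoded yet, clipped to the total number of chunks in the file.
func chunksToSchedule(n, ahead, total int, ready map[int]bool) []int {
	var out []int
	for i := n; i <= n+ahead && i < total; i++ {
		if !ready[i] {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	// Chunks 20 and 21 already exist, so only the rest of the window is scheduled.
	ready := map[int]bool{20: true, 21: true}
	fmt.Println(chunksToSchedule(20, 5, 451, ready)) // prints [22 23 24 25]
}
```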