mdhiggins / sickbeard_mp4_automator

Automatically convert video files to a standardized format with metadata tagging to create a beautiful and uniform media library
MIT License

add `keep-dispositions` options to audio and subtitles #1609

Closed sdfg2 closed 2 years ago

sdfg2 commented 2 years ago

Is your feature request related to a problem? Please describe.

Foreign-language films that don't have eng audio streams get stripped of all audio.

Describe the solution you'd like

An option to always keep the default disposition. The ideal way would be a keep-dispositions option analogous to the ignore-dispositions option.

mdhiggins commented 2 years ago

allow-language-relax is probably what you want

sdfg2 commented 2 years ago

Not really, no. If I've got a file with a few different dubs, I don't want all of them. I just want the default one, which should hopefully be the native one for the video.

sdfg2 commented 2 years ago

Solved with 501fcf3

sdfg2 commented 2 years ago

Actually, it's only partly complete; I spoke too soon.

If I have set languages = eng, the file contains audio jpn +default and eng -default. jpn will get deleted. I'd like to keep it because it's +default.

I know you're busy for the next few days. The more I think about this, the more I'm thinking it might just be easier to let sma do downmixing, not touching anything else, and then run a post process script to fetch subtitles, delete unwanted audio/subs etc, but then sma is so so close to being able to do it all. I haven't programmed in over a decade, I'm an algorithm/logic/design type person now, or I would try and help.

mdhiggins commented 2 years ago

c9b6aca8ff6ccb30ee429b066720b5b23af5ca74

Put that together real quick, see if that works

Added a force-default option for audio and subtitle that will bypass the usual language/disposition/unique checks if a stream is marked as default
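
A minimal sketch of the behaviour described here, not the project's actual code: the `Stream` class and `keep_streams` function are hypothetical names, and only the language check is modelled (the real option is also described as bypassing disposition and uniqueness checks).

```python
from dataclasses import dataclass


@dataclass
class Stream:
    index: int
    language: str
    default: bool = False


def keep_streams(streams, languages, force_default=False):
    """Return the streams that survive language filtering."""
    kept = []
    for s in streams:
        if force_default and s.default:
            # A default-flagged stream skips the language check entirely.
            kept.append(s)
        elif s.language in languages:
            kept.append(s)
    return kept


# The case from this thread: jpn is +default, eng is -default, languages = eng.
streams = [Stream(1, "jpn", default=True), Stream(2, "eng")]
print([s.index for s in keep_streams(streams, ["eng"])])
print([s.index for s in keep_streams(streams, ["eng"], force_default=True)])
```

Without the flag only the eng stream is kept; with it, the default jpn stream is kept as well.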

sdfg2 commented 2 years ago

Sorry, ended up being busier than I thought the past few days myself!

sdfg@heracles ~/scratch $ sma -i Ghost\ in\ the\ Shell\ 2.0\ \(2008\)\ \[imdb-tt1260502\]\[Bluray-1080p\]\[DTS-ES\ 6.1\]\[x264\]-MOOVEE.mkv 
Manual processor started.
Python 64-bit 3.10.6 (main, Aug  3 2022, 17:39:45) [GCC 12.1.1 20220730].
Guessit version: 3.4.3.
/usr/bin/python3
Loading config file /home/askesis/sickbeard_mp4_automator/config/autoProcess.ini.
Processing file Ghost in the Shell 2.0 (2008) [imdb-tt1260502][Bluray-1080p][DTS-ES 6.1][x264]-MOOVEE.mkv
Input Data
{
    "format": "matroska,webm",
    "format-fullname": "Matroska / WebM",
    "video": {
        "index": 0,
        "codec": "h264",
        "bitrate": 9756363,
        "pix_fmt": "yuv420p",
        "profile": "high",
        "fps": 23.976023976023978,
        "framedata": {
            "pix_fmt": "yuv420p",
            "side_data_list": [
                {
                    "side_data_type": "H.26[45] User Data Unregistered SEI message"
                }
            ]
        },
        "dimensions": "1920x1080",
        "level": 4.1,
        "field_order": "progressive"
    },
    "audio": [
        {
            "index": 1,
            "codec": "dts",
            "bitrate": 768000,
            "channels": 7,
            "samplerate": 48000,
            "language": "jpn",
            "disposition": "+default-dub-original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired-captions"
        },
        {
            "index": 2,
            "codec": "dts",
            "bitrate": 768000,
            "channels": 7,
            "samplerate": 48000,
            "language": "eng",
            "disposition": "-default-dub-original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired-captions"
        }
    ],
    "subtitle": [
        {
            "index": 3,
            "codec": "subrip",
            "disposition": "+default-dub-original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired-captions",
            "language": "eng"
        }
    ],
    "attachment": []
}
Reading video stream.
Video codec detected: h264.
Pix Fmt: yuv420p.
Profile: high.
Video codec parameters None.
Creating hevc_nvenc video stream from source stream 0.
Reading audio streams.
The following stream indexes have been identified as being copies: [] [stream-codec-combinations].
Audio detected for stream 1 - dts jpn 7 channel.
Unable to generate options, unexpected exception occurred.
Traceback (most recent call last):
  File "/home/askesis/sickbeard_mp4_automator/resources/mediaprocessor.py", line 123, in process
    options, preopts, postopts, ripsubopts, downloaded_subs = self.generateOptions(inputfile, info=info, original=original, tagdata=tagdata)
  File "/home/askesis/sickbeard_mp4_automator/resources/mediaprocessor.py", line 858, in generateOptions
    self.log.debug("Audio stream %s is flagged as default, forcing inclusion [Audio.force-default]." % (s.index))
UnboundLocalError: local variable 's' referenced before assignment
There was an error processing file Ghost in the Shell 2.0 (2008) [imdb-tt1260502][Bluray-1080p][DTS-ES 6.1][x264]-MOOVEE.mkv, no output data received

mdhiggins commented 2 years ago

e9f15183f89b90f083226f03c5cf953da52638d4

That should fix that
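
For readers unfamiliar with the error in the traceback above, here is a small illustration of how an `UnboundLocalError` of that kind can arise. This only shows the general class of bug, under the assumption that the log line read a loop variable that had not been assigned yet; the real fix is whatever the commit above changed. Both function names are hypothetical.

```python
def log_defaults_buggy(audio_streams):
    for a in audio_streams:
        if a["default"]:
            # Bug: 's' is assigned later in this function, so Python treats it
            # as a local name, and reading it here fails on the first pass.
            print("Audio stream %s is flagged as default." % s["index"])
        s = a


def log_defaults_fixed(audio_streams):
    flagged = []
    for a in audio_streams:
        if a["default"]:
            # Use the variable that is actually bound at this point.
            print("Audio stream %s is flagged as default." % a["index"])
            flagged.append(a["index"])
    return flagged


streams = [{"index": 1, "default": True}, {"index": 2, "default": False}]
try:
    log_defaults_buggy(streams)
except UnboundLocalError as exc:
    print("buggy version failed:", exc)
print(log_defaults_fixed(streams))
```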

mdhiggins commented 2 years ago

Ok I didn't forget about you

https://github.com/mdhiggins/sickbeard_mp4_automator/tree/ffsubsync

This is the branch I'm working on. I'm still testing the ffsubsync stuff, but I also included a force option that manually filters subliminal results for forced subtitles when it is enabled.

sdfg2 commented 2 years ago

Oh I don't expect anything instantly, please don't think that! I very much appreciate any effort you put into this. I've also been busy, won't be able to look at anything until about Friday now.

I've been finding that a number of my media files have multiple different language tracks all labelled +default, which is infuriating. I'm actually playing with the idea of fetching the 'original language' data from TMDB API instead of relying only on what is in the file itself. I also realise it's probably getting way out of scope for this project XD
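
A rough sketch of that idea, assuming the TMDB details response has already been fetched and parsed into a dict. TMDB reports `original_language` as a two-letter ISO 639-1 code, while the stream tags in the log above are three-letter codes, so a mapping is needed; the table below is a small illustrative subset, and `original_audio` is a hypothetical helper, not part of sma.

```python
# Illustrative subset of an ISO 639-1 -> ISO 639-2 mapping.
ISO_639_1_TO_2 = {"en": "eng", "ja": "jpn", "fr": "fre", "it": "ita", "no": "nor"}


def original_audio(tmdb_details, audio_streams):
    """Prefer streams in the title's original language.

    Falls back to the default-flagged streams when the language is unknown
    or no stream matches it.
    """
    code = ISO_639_1_TO_2.get(tmdb_details.get("original_language"))
    matches = [a for a in audio_streams if a["language"] == code]
    if matches:
        return matches
    return [a for a in audio_streams if a["default"]]


# Two tracks both flagged +default, as described above.
audio = [
    {"index": 1, "language": "jpn", "default": True},
    {"index": 2, "language": "eng", "default": True},
]
print([a["index"] for a in original_audio({"original_language": "ja"}, audio)])
```

With both tracks flagged as default, the file alone cannot say which is native; the external lookup breaks the tie.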

mdhiggins commented 2 years ago

Hm that's actually probably easily doable, I'll look into things

Either way the above branch at least gives you the option for forced subtitles through subliminal so give that a try

mdhiggins commented 2 years ago

https://github.com/mdhiggins/sickbeard_mp4_automator/commit/6112f2a4d88cf2b9e07c015777c53e2bdafa94a7

How's that

mdhiggins commented 2 years ago

Just checking in again to see if you got a chance to test these changes

mdhiggins commented 2 years ago

302fa7ec388036b91a3cae1688a867e2e76aea36

Tweak the forced option so you can download both forced and standard

sdfg2 commented 2 years ago

Sorry, I was on holiday! Just catching up now.

First attempt:

sdfg@heracles ~/scratch $ sma -i Ghost\ in\ the\ Shell\ 2.0\ \(2008\)\ \[imdb-tt1260502\]\[Bluray-1080p\]\[DTS-ES\ 6.1\]\[x264\]-MOOVEE.mkv 
Manual processor started.
Python 64-bit 3.10.6 (main, Aug  3 2022, 17:39:45) [GCC 12.1.1 20220730].
Guessit version: 3.4.3.
/usr/bin/python3
Loading config file /home/sdfg/sickbeard_mp4_automator/config/autoProcess.ini.
Processing file Ghost in the Shell 2.0 (2008) [imdb-tt1260502][Bluray-1080p][DTS-ES 6.1][x264]-MOOVEE.mkv
Unable to generate options, unexpected exception occurred.
Traceback (most recent call last):
  File "/home/sdfg/sickbeard_mp4_automator/resources/mediaprocessor.py", line 134, in process
    options, preopts, postopts, ripsubopts, downloaded_subs = self.generateOptions(inputfile, info=info, original=original, tagdata=tagdata)
  File "/home/sdfg/sickbeard_mp4_automator/resources/mediaprocessor.py", line 662, in generateOptions
    awl, swl = self.safeLanguage(info, tagdata.tmdbid, tagdata.mediatype)
AttributeError: 'NoneType' object has no attribute 'tmdbid'
There was an error processing file Ghost in the Shell 2.0 (2008) [imdb-tt1260502][Bluray-1080p][DTS-ES 6.1][x264]-MOOVEE.mkv, no output data received
✓ 0 [843ms]

Figured out it was because tag = False, but might need a catch for that. Continuing tests...

EDIT: I realise why, looking at the code. Need a way to fetch original-language from TMDB without tagging the file with the rest of the metadata.
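
The kind of catch I mean could look something like this. It is only a sketch: `tmdbid` and `mediatype` are the attribute names from the traceback above, but `lookup_args` is a made-up helper and I'm assuming the caller can cope with getting `None` back.

```python
from types import SimpleNamespace


def lookup_args(tagdata):
    """Return (tmdbid, mediatype), or (None, None) when tagging is disabled."""
    if tagdata is None:
        # With tagging turned off there is no tagdata object to read from.
        return None, None
    return tagdata.tmdbid, tagdata.mediatype


print(lookup_args(None))
print(lookup_args(SimpleNamespace(tmdbid=14092, mediatype="movie")))
```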

sdfg2 commented 2 years ago

How do I enable debug mode? I see a lot of self.log.debug in the code but don't see how to enable it to check stuff.

I've spent a lot of time adding the new config options and trying to figure out the combinations to use, and I have a few suggestions.

Separate out the languages from the technical codec stuff. Streamline the language/disposition options.

[Audio.Tracks]
languages = original,jpn,fre,ita,default,nor
dispositions = default,dub,comment
maximum-audio-tracks-per-language = 1
maximum-audio-tracks-total = 3
at-least-one-audio-track = True

languages and dispositions are ordered. Then you can do a loop like this (it's been a very long time since I've done any pseudocode, please be kind!)

for each language in languages
    for each disposition in dispositions
        if exists language.disposition
            add language.disposition to track list
            current-language-track-count++
            current-total-track-count++
        if current-language-track-count == maximum-audio-tracks-per-language
            break
    if current-total-track-count == maximum-audio-tracks-total
        break
if current-total-track-count == 0 && at-least-one-audio-track
    add first language.disposition in original file
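
A runnable Python reading of the pseudocode above, as a sketch only. It assumes the `original` and `default` keywords have already been resolved to real language codes, and that each track is a dict with `index`, `language` and a set of positive `dispositions`; none of these names come from sma. The per-language counter is reset for each language, which the pseudocode leaves implicit.

```python
def select_audio(tracks, languages, dispositions,
                 max_per_language=1, max_total=3, at_least_one=True):
    """Pick tracks by walking languages, then dispositions, in priority order."""
    selected = []
    chosen = set()  # indexes already picked, to avoid duplicates
    for language in languages:
        per_language = 0  # reset for every language
        for disposition in dispositions:
            for track in tracks:
                if len(selected) >= max_total or per_language >= max_per_language:
                    break
                if (track["index"] not in chosen
                        and track["language"] == language
                        and disposition in track["dispositions"]):
                    selected.append(track)
                    chosen.add(track["index"])
                    per_language += 1
        if len(selected) >= max_total:
            break
    if not selected and at_least_one and tracks:
        # Nothing matched: fall back to the first track in the file.
        selected.append(tracks[0])
    return selected


tracks = [
    {"index": 1, "language": "jpn", "dispositions": {"default"}},
    {"index": 2, "language": "eng", "dispositions": set()},
]
picked = select_audio(tracks, ["jpn", "eng"], ["default", "dub", "comment"])
print([t["index"] for t in picked])
```

Note that a track with no positive disposition flags (the eng track here) can never match, so it is only ever reachable through the at-least-one fallback.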

By my reckoning you could then get rid of these:

[Audio]
include-original-language - covered by languages
first-stream-of-language - covered by maximum-tracks-per-language
allow-language-relax - covered by languages and at-least-one-track
relax-to-default - covered by languages
ignored-dispositions - covered by dispositions
force-default - covered by languages
unique-dispositions - covered

Similar kind of thing for subtitles - technical stuff (burning, embedding, codec etc) in the main bit, then:

[Subtitle.Tracks]
languages = original,jpn,fre,ita,default,nor
dispositions = forced,default,hearing_impaired,comment
maximum-subtitle-tracks-per-language = 2
maximum-subtitle-tracks-total = 2
ignore-embedded-subs = False

Use a similar loop as above (except including a search in subliminal for each combination as well)

for each language in languages
    for each disposition in dispositions
        if exists language.disposition && !ignored-embedded-subs
            add language.disposition to track list
            current-language-track-count++
            current-total-track-count++
        else
            subliminal search for language.disposition
            if downloaded language.disposition
                add language.disposition to track list
                current-language-track-count++
                current-total-track-count++
        if current-language-track-count == maximum-subtitle-tracks-per-language
            break
    if current-total-track-count == maximum-subtitle-tracks-total
        break
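
The subtitle loop can be sketched in Python the same way. Again this is an illustration under assumptions: `search` is a hypothetical callable standing in for a subliminal lookup (it takes a language and a disposition and returns a subtitle dict or `None`), and the dict layout is invented for the example.

```python
def select_subtitles(embedded, languages, dispositions, search,
                     max_per_language=2, max_total=2, ignore_embedded=False):
    """Prefer embedded subtitles; fall back to search() for missing combinations."""
    selected = []
    for language in languages:
        per_language = 0  # reset for every language
        for disposition in dispositions:
            if len(selected) >= max_total or per_language >= max_per_language:
                break
            found = None
            if not ignore_embedded:
                found = next((s for s in embedded
                              if s["language"] == language
                              and disposition in s["dispositions"]), None)
            if found is None:
                found = search(language, disposition)  # may return None
            if found is not None and found not in selected:
                selected.append(found)
                per_language += 1
        if len(selected) >= max_total:
            break
    return selected


def fake_search(language, disposition):
    # Stand-in for a download: only a forced eng subtitle is "available".
    if (language, disposition) == ("eng", "forced"):
        return {"language": "eng", "dispositions": {"forced"}, "source": "download"}
    return None


embedded = [{"language": "eng", "dispositions": {"default"}, "source": "embedded"}]
result = select_subtitles(embedded, ["eng"], ["forced", "default"], fake_search)
print([s["source"] for s in result])
```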

Then you can get rid of

[Subtitle]
default-language
include-original-language
first-stream-of-language
ignored-dispositions
force-default
unique-dispositions

[Subtitle.Subliminal]
download-forced-subs
include-hearing-impared-subs

I just think there are a lot of true/false options that can be difficult to follow when in combination with each other (forced,default,relax etc). With these changes a user knows at a glance what the priorities and limits are without having to create a semantic tree to figure it out!

mdhiggins commented 2 years ago

https://github.com/mdhiggins/sickbeard_mp4_automator/commit/8fb1022d5945c7f0ec26d083737d5669d0ed9f75 https://github.com/mdhiggins/sickbeard_mp4_automator/commit/54046aa3702271da3fed0268d48c506872dce6b1 https://github.com/mdhiggins/sickbeard_mp4_automator/commit/2046664da5c48371ba52e84f6d0d6f5ed77a62c2

Fixes the error from your last post

Reworking all the audio and subtitle options would be a big undertaking, challenging to maintain backwards compatibility, and I think personally a disposition whitelist approach is not a great one.

Disposition and language data is very often lacking and unreliable, inconsistently implemented across different containers (quick example is that mp4 containers don't even store a 'forced' flag), and not really consistent with what I've found most users over the years are looking to do with this automation step. Lots of media will have no positive disposition flags or will just inappropriately flag all dispositions as default. From my experience most users are looking to preserve what is there and only explicitly eliminate what they know they don't want when taking an approach to media automation. I do think I can probably eliminate some of the options added by your recent feature requests (relax to default being the first one to drop) to make things clearer but I probably would not look to rewrite the whole settings approach unless there was a compelling reason.

Debug logging is covered in the wiki

sdfg2 commented 2 years ago

Apologies about the debugging, I could have sworn I checked the wiki. My bad!

Sure, if the compatibility and work involved is too much I completely get that. But garbage in, garbage out will happen no matter the processing method. If anything, that's at the crux of what I've been asking for (without realising it) - external validation of what was originally intended (tmdb language lookup, forced subs). I was just trying to offer a more general, agnostic approach that can handle what is and isn't there in any given source media.

There's one thing I can't figure out whether it's possible or not: only fetching full subtitles if there is no audio stream in a matching language. For example, if I've filtered the audio down to just the original audio that tmdb identifies and it isn't English, I want full English subtitles. But if the original audio is English, I only want to check for forced subtitles.

Edit: Bazarr has this option, called "Exclude Audio" (terrible name for it).
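
Expressed as code, the rule I'm after is roughly the following. The function name and the "forced"/"full" return values are made up for illustration and don't correspond to any existing sma or Bazarr setting.

```python
def wanted_subtitle_type(kept_audio_languages, subtitle_language):
    """Decide which kind of subtitle is needed for a given subtitle language.

    If a kept audio track is already in that language, only forced
    ("foreign parts only") subtitles are needed; otherwise full subtitles are.
    """
    if subtitle_language in kept_audio_languages:
        return "forced"
    return "full"


print(wanted_subtitle_type(["jpn"], "eng"))
print(wanted_subtitle_type(["eng"], "eng"))
```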

mdhiggins commented 2 years ago

https://github.com/mdhiggins/sickbeard_mp4_automator/commit/d3f8b7a01a4e28f84f4d09f732bccd21732b908e

Take a look at that

Removed some legacy options which I felt weren't needed

Added a new dynamic-download option which will set subtitle downloading preferences based on original language when compared against your set default language

Also included 'original-language' as a valid parameter for the sorting function

sdfg2 commented 2 years ago

I think I mis-spoke when I said 'fetch'. I didn't specifically mean 'download' but 'obtain', whether or not that is from the existing file or from an external source. My test media (eng audio) already has eng-forced and eng subtitles embedded. Both get added to the resulting file when I only want the forced ones.

mdhiggins commented 2 years ago

Killing me

I'm going to say this needs to move to the custom functions then; it's too niche of a request

I went ahead and included the tagdata object as a parameter in the skipStream and validation custom methods (though tagdata will not always be present on the validation call depending on what script is calling it) so that you can have easy access to the original language

mdhiggins commented 2 years ago

https://github.com/mdhiggins/sickbeard_mp4_automator/commit/cd1b6fc4c7130d84c9d6ce8f2e27f7ff08c5dc56

sdfg2 commented 2 years ago

Ah, I thought there was a flow: "what subtitles do I need" -> "what subtitles do I have" -> "what subtitles do I need to download". I was just suggesting moving the dynamic-download logic from "what subtitles do I need to download" to "what subtitles do I need".

But sure, having the tags there is super useful. It should be trivial to add the logic necessary to an external script now. Did you just remove dynamic-download entirely, or am I misreading the diffs?

mdhiggins commented 2 years ago

Yeah just nuked it entirely. It was half baked anyway. Similar functionality should be doable via custom functions

sdfg2 commented 2 years ago

Sure, just more inefficiently. Now I'm going to have to ensure that full subtitles are always present, and then do a removal pass if they don't match. At least with dynamic-download sma wouldn't download extra subs it definitely knew I didn't want.

Before: file (english audio) -> known 'native' audio track -> post process to remove full subs

Now: file (english audio) -> unknown audio track -> get full subs -> post process to remove full subs

mdhiggins commented 2 years ago

You can change settings on the fly in your custom function, exactly the same way the proposed dynamic download feature was implemented

mdhiggins commented 2 years ago

The only thing it was doing was setting which type of subtitle to download

            self.settings.downloadforcedsubs = (self.settings.adl == original_language)
            self.settings.downloadsubs = (self.settings.adl != original_language)

which from any of the custom functions can be set after checking if tagdata is available

            mp.settings.downloadforcedsubs = (mp.settings.adl == tagdata.original_language)
            mp.settings.downloadsubs = (mp.settings.adl != tagdata.original_language)

Plus you can sweep the info object and see if you want to disable downloading entirely because it has embedded subs that fit your need

sdfg2 commented 2 years ago

Yeah, I misunderstood how you were doing custom functions until I started going through the wiki and the examples - I thought you were just passing environment variables to external scripts. I was expecting my script to get the file name or other environmental data, then for me to ffprobe, parse that data, and then ffmpeg myself to remove unnecessary subtitles.

I've got very little experience with python, and that was ten years ago, so I haven't a clue how to write a custom function for this. My best option is to handle it in the external script I need to write, given I have to pass the resulting file to another program afterwards anyway.

sdfg2 commented 2 years ago

Thanks for your help and patience. I've got it working just how I want with 6 lines of bash :-)

mdhiggins commented 2 years ago

Should share it in case others have the same issue

I threw this together as a quick pass to get you started if you ever wanted a more integrated solution

def skipStream(mp, stream, info, path, tagdata):
    mp.log.info("Initiating custom stream skip check method.")

    if tagdata:
        foreign_language = tagdata.original_language != mp.settings.adl and any(a for a in info.audio if a.metadata.get('language') == tagdata.original_language and mp.validDisposition(a, mp.settings.ignored_audio_dispositions))
        mp.settings.downloadsubs = foreign_language and not any(s for s in info.subtitle if not s.disposition.get('forced') and not s.disposition.get('comment') and mp.validDisposition(s, mp.settings.ignored_subtitle_dispositions))
        mp.settings.downloadforcedsubs = not foreign_language and not any(s for s in info.subtitle if s.disposition.get('forced') and mp.validDisposition(s, mp.settings.ignored_subtitle_dispositions))
        if foreign_language and stream.type == "subtitle":
            return stream.disposition.get("forced")
        elif not foreign_language and stream.type == "subtitle":
            return not stream.disposition.get("forced")

    return False

sdfg2 commented 2 years ago

> Should share it in case others have the same issue

Yeah, I just wanted to do more testing on it. There are probably edge cases where it won't work.

For anyone reading this: DO NOT USE IT. It's for guidance to write your own only, and has hard coded preferences for me. NEVER USE RANDOM SCRIPTS YOU FIND ON THE INTERNET UNLESS YOU UNDERSTAND THEM.

BASETMPDIR="/store/.transcode/video"

if [ "$radarr_eventtype" = "Test" ] || [ "$sonarr_eventtype" = "Test" ]; then
    exit 0
elif [ -n "$radarr_moviefile_path" ]; then
    INPUTFILE=$radarr_moviefile_path
    ID="-tmdb $radarr_movie_tmdbid "
elif [ -n "$sonarr_episodefile_path" ]; then
    INPUTFILE=$sonarr_episodefile_path
    ID="-tvdb $sonarr_series_tvdbid "
elif [ -n "$1" ]; then
    if [ "${1::1}" != "/" ]; then
        INPUTFILE="$(pwd)/$1"
    else
        INPUTFILE=$1
    fi
fi

mkdir -p "$BASETMPDIR"
fullfile=$(basename "${INPUTFILE}")
filename=$(basename "${INPUTFILE%.*}")
filetype=${INPUTFILE##*.}
tmpdir="$BASETMPDIR/$filename"
mkdir -p "$tmpdir"
cp "$INPUTFILE" "$tmpdir"

infile="$tmpdir/$fullfile"
smafile="$tmpdir/$filename.mkv"
subsfile="$tmpdir/$filename.mkv.subs"
normfile="$tmpdir/$filename.mkv.norm"

# sma
if ([ "$filetype" = "mkv" ] && [ ! -f "$smafile.original" ]) || ([ "$filetype" != "mkv" ] && [ ! -f "$smafile" ]); then
        /store/.bin/sma/manual.py "$ID"-a -i "$infile" || rm -f "$smafile" "$infile" "$smafile.original"
fi

# post-process sma
if [ ! -f "$subsfile" ]; then
    eval $(ffprobe -v 0 -show_entries stream=index:stream_tags=language,title -select_streams a -of flat=s=_ "$smafile")

    audio_lang=$streams_stream_0_tags_language

    eval $(ffprobe -v 0 -show_entries stream=index:stream_tags=language,title -select_streams s:0 -of flat=s=_ "$smafile")

    if [ "$audio_lang" = "eng" ]; then
        if [ "$streams_stream_0_tags_title" = "Forced" ]; then
            ffmpeg -v 0 -err_detect ignore_err -fflags +igndts -f matroska -i "$smafile" -c:v copy -c:a copy -c:s copy -map 0:v:0 -map 0:a:0 -map 0:s:0 -f matroska "$subsfile" || rm -f "$subsfile"
        else
            ffmpeg -v 0 -err_detect ignore_err -fflags +igndts -f matroska -i "$smafile" -c:v copy -c:a copy -c:s copy -map 0:v:0 -map 0:a:0 -f matroska "$subsfile" || rm -f "$subsfile"
        fi
    else
        ffmpeg -v 0 -err_detect ignore_err -fflags +igndts -f matroska -i "$smafile" -c:v copy -c:a copy -c:s copy -map 0:v:0 -map 0:a:0 -map 0:s:0 -f matroska "$subsfile" || rm -f "$subsfile"
    fi
fi

# ffmpeg-normalize
if [ ! -f "$normfile" ]; then
    ffmpeg-normalize "$subsfile" -c:a ac3 -pr -nt rms -t -23 -f -of "$tmpdir" -ofmt matroska -ext norm || rm -f "$normfile"
fi

mv -f "$normfile" "$INPUTFILE" || exit 1
rm -r "$tmpdir"

sdfg2 commented 2 years ago

> 302fa7e
>
> Tweak the forced option so you can download both forced and standard

Now I've got something (vaguely) production ready, I've been testing it more thoroughly on edge cases. I've noticed that the forced download doesn't seem to work. I'm not sure if you'd rather open a new issue for that to keep this clear.

I've attached a very cut down file that I've been using to test. (Just remove .csv, seems github doesn't like mkv). Here is one of a couple of 'foreign parts only' (forced) subtitles that are available. sma doesn't find any forced subtitles.

War of the Worlds (2019) - S01E01 - Episode 1.cut.mkv.csv