Closed Rouzax closed 4 years ago
ISO 639-2 defines two different codes for the Dutch language. The formal code is "nld", and "dut" is just a synonym, which is not recognized.
I understand, but to my knowledge all embedded subs use the dut format.
To be precise, these are not embedded. An embedded subtitle is one that is stored within the container, so within the .mkv or .mp4.
Sorry for the confusion, in my case they are the embedded subs but I extract and process them with Subtitle Edit before Medusa imports the episode. So they end up as external but the naming comes from the embedded subs.
Hope I make sense 😀
No, embedded subs are only a stream. The name is created by the software you use to extract it. Why do you extract it? That seems useless to me.
Reason for doing so is that I parse them with https://github.com/SubtitleEdit/subtitleedit to remove all Hearing Impaired entries
So if I understand correctly, the goal is not to extract the subs but to remove the Hearing Impaired subs from the source? Then removing them should be sufficient without extracting all subs, or am I missing something?
And the standard for subtitle filename extensions is not the three-letter ISO code (e.g. .eng and .dut) but the two-letter ISO code (e.g. .nl and .en). I don't know Subtitle Edit, but maybe you can change it to the two-letter code.
If not, I can make a small Python script for you that extracts the subtitles with a two-letter code as extension and deletes the hearing impaired one.
My flow is as follows.
My goal is to have Medusa recognize the Dutch subs when importing.
So if you have a Python script that will convert 3-letter language codes to 2-letter language codes, or can have Medusa understand dut,
I'm a happy camper 😄
Not only does Medusa not support the 3-letter code, all media players also expect the two-letter code as the extension of subtitle files.
Still not sure why you don't keep the subs in the .mkv. I can make you a script that does this:
Input: Video file with all kinds of subs in there. Output: Video file with only Dutch and English subs in there (without hearing impaired subs).
or:
Input: Video file with all kinds of subs in there. Output: Dutch and English subtitle files with a two-letter extension, skipping the hearing impaired subs.
or:
Input: Video file with all kinds of subs in there. Output: Dutch and English subtitle files with a two-letter extension, skipping the hearing impaired subs, and a new video file without subtitles.
or:
A script that renames subtitle files with a three-letter code to subtitle files with a two-letter code.
The last is the simplest option to make; you can even use a .bat script to do that. Something like:
rename *dut.srt *nl.srt
rename *eng.srt *en.srt
Medusa is not going to support that. We use libs that parse the language code. So we would need to make exceptions in Python libs and JS libs?
Batch doesn't like that 😉, I've tried
Will end up with
test.dut.nl.srt
I extract them because I want to edit the subs and strip out unwanted HI and other things like song lyrics etc
You must not use . in front of the dut. The point (.) is greedy.
So this is not working.
rename *.dut.srt *.nl.srt
but this works
rename *dut.srt *nl.srt
For me with Windows 10 rename *dut.srt *nl.srt
will give me
test.dut.srtnl.srt
OK try this command:
rename ???????????????????????????????????????????????.dut.srt ???????????????????????????????????????????????.nl.srt
Make sure to use enough question marks (?) to catch even the longest name. Too many question marks do no harm; with too few you miss files with longer names.
This is some high-tech scripting and it works 😃
Not high tech, just working around Microsoft's stupid rename wildcard implementation.
To comment on
Not only does Medusa not support the 3-letter code, all media players also expect the two-letter code as the extension of subtitle files.
Kodi works perfectly with the .dut.srt files, but I'll add in your rename command to fix it.
Thanks for your help.
@BenjV It did work in my test but it fails in production, it looks to be because there are multiple dots
It appears that Batch is very picky https://superuser.com/questions/475874/how-does-the-windows-rename-command-interpret-wildcards
C:\TEMP\Torrent\PROCD\TV\The.Boys.S02E08.1080p.WEB.H264-CAKES>dir
Volume in drive C has no label.
Volume Serial Number is D88D-6860
Directory of C:\TEMP\Torrent\PROCD\TV\The.Boys.S02E08.1080p.WEB.H264-CAKES
09-10-2020 09:30 <DIR> .
09-10-2020 09:30 <DIR> ..
09-10-2020 09:30 <DIR> Sample
09-10-2020 09:30 70.862 the.boys.s02e08.1080p.web.h264-cakes.#4.eng.srt
09-10-2020 09:30 61.729 the.boys.s02e08.1080p.web.h264-cakes.dut.srt
09-10-2020 09:30 70.862 the.boys.s02e08.1080p.web.h264-cakes.eng.srt
09-10-2020 07:12 4.325.977.430 the.boys.s02e08.1080p.web.h264-cakes.mkv
09-10-2020 07:12 254 the.boys.s02e08.1080p.web.h264-cakes.nfo
09-10-2020 07:12 2.922 the.boys.s02e08.1080p.web.h264-cakes.srr
6 File(s) 4.326.184.059 bytes
3 Dir(s) 4.795.503.951.872 bytes free
C:\TEMP\Torrent\PROCD\TV\The.Boys.S02E08.1080p.WEB.H264-CAKES>ren ???????????????????????????????????????????????.dut.srt ???????????????????????????????????????????????.nl.srt
The system cannot find the file specified.
C:\TEMP\Torrent\PROCD\TV\The.Boys.S02E08.1080p.WEB.H264-CAKES>dir
Volume in drive C has no label.
Volume Serial Number is D88D-6860
Directory of C:\TEMP\Torrent\PROCD\TV\The.Boys.S02E08.1080p.WEB.H264-CAKES
09-10-2020 09:30 <DIR> .
09-10-2020 09:30 <DIR> ..
09-10-2020 09:30 <DIR> Sample
09-10-2020 09:30 70.862 the.boys.s02e08.1080p.web.h264-cakes.#4.eng.srt
09-10-2020 09:30 61.729 the.boys.s02e08.1080p.web.h264-cakes.dut.srt
09-10-2020 09:30 70.862 the.boys.s02e08.1080p.web.h264-cakes.eng.srt
09-10-2020 07:12 4.325.977.430 the.boys.s02e08.1080p.web.h264-cakes.mkv
09-10-2020 07:12 254 the.boys.s02e08.1080p.web.h264-cakes.nfo
09-10-2020 07:12 2.922 the.boys.s02e08.1080p.web.h264-cakes.srr
6 File(s) 4.326.184.059 bytes
3 Dir(s) 4.795.627.683.840 bytes free
C:\TEMP\Torrent\PROCD\TV\The.Boys.S02E08.1080p.WEB.H264-CAKES>
OK, I can make a small python script that does the renaming for you. How do you want it to function?
Thank you very much for that offer but I figured it out by using Bulk Rename CLI https://www.bulkrenameutility.co.uk/Download.php#DownloadBulkRenameCommand
C:\TEMP\Torrent\PROCD\TV\The.Boys.S02E08.1080p.WEB.H264-CAKES>%brc64% /DIR:"C:\TEMP\Torrent\PROCD\TV\The.Boys.S02E08.1080p.WEB.H264-CAKES" /PATTERN:"*.srt" /REPLACECI:.dut:.nl /REPLACECI:.eng:.en
Processing Folder C:\TEMP\Torrent\PROCD\TV\The.Boys.S02E08.1080p.WEB.H264-CAKES\
Filename the.boys.s02e08.1080p.web.h264-cakes.#4.eng.srt would be renamed to the.boys.s02e08.1080p.web.h264-cakes.#4.en.srt
Filename the.boys.s02e08.1080p.web.h264-cakes.dut.srt would be renamed to the.boys.s02e08.1080p.web.h264-cakes.nl.srt
Filename the.boys.s02e08.1080p.web.h264-cakes.eng.srt would be renamed to the.boys.s02e08.1080p.web.h264-cakes.en.srt
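For reference, the same replacement can be sketched in Python, which sidesteps the wildcard quirks of the Windows rename command. This is a minimal sketch, not the script offered above: the names `CODE_MAP`, `two_letter_name` and `rename_subtitles` are made up for illustration, and the mapping only covers the codes mentioned in this thread. It assumes the language tag is always the last dot-separated part before the .srt extension.

```python
from pathlib import Path

# Hypothetical mapping: only the codes that came up in this thread.
CODE_MAP = {"dut": "nl", "nld": "nl", "eng": "en"}


def two_letter_name(filename):
    """Swap a 3-letter language tag right before the extension for its 2-letter code."""
    parts = filename.split(".")
    # Assumption: the tag is the second-to-last part, as in name.dut.srt
    if len(parts) >= 3 and parts[-2].lower() in CODE_MAP:
        parts[-2] = CODE_MAP[parts[-2].lower()]
    return ".".join(parts)


def rename_subtitles(folder, dry_run=True):
    """Rename *.srt files in folder; with dry_run=True only report what would change."""
    changes = []
    for path in sorted(Path(folder).glob("*.srt")):
        new_name = two_letter_name(path.name)
        if new_name != path.name:
            changes.append((path.name, new_name))
            if not dry_run:
                path.rename(path.with_name(new_name))
    return changes
```

Calling `rename_subtitles(folder)` with the default `dry_run=True` only lists the planned renames, similar to the "would be renamed" output above; pass `dry_run=False` to actually rename.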
Ok, glad to be of a little assistance.
Really appreciate the offer!
@BenjV since you offered, would you be willing to take a look at https://github.com/jobrien2001/mkvstrip ? The python script uses mkvmerge to remove unwanted subtitle and audio languages that might be part of the mkv but the script is crashing more and more and the original author does not respond.
It seems to be related to character encoding in the subtitle names (I think). Here are 2 JSON outputs of files that are crashing or not working: 1.txt 2.txt
And errors? Or trace back?
On the 1.json it just does nothing; even with debug on in the script it will just stop.
Some of the errors I managed to "fix" by changing line 223 to
process = subprocess.Popen(command, stdout=subprocess.PIPE, universal_newlines=True, encoding="utf8", errors='ignore')
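A small sketch of what `errors='ignore'` does in that call may help explain why it only "fixes" some cases: bytes that are not valid UTF-8 are silently dropped, so track names can come back with characters missing. The child process below is a stand-in that prints a Latin-1 encoded name; it is not mkvmerge, and `read_output` is a hypothetical helper.

```python
import subprocess
import sys

# Stand-in child process: writes "España" encoded as Latin-1,
# so the byte 0xF1 is not valid UTF-8.
child_code = r"import sys; sys.stdout.buffer.write(b'Espa\xf1a\n')"


def read_output(errors):
    """Run the child and decode its stdout as UTF-8 with the given error handler."""
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        stdout=subprocess.PIPE,
        encoding="utf8",
        errors=errors,
    )
    return result.stdout.strip()
```

With `errors="ignore"` the invalid byte disappears from the decoded text, while `errors="replace"` puts the U+FFFD replacement character in its place, which makes the problem visible instead of hiding it.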
For the second mkv
C:\TEMP\Torrent\PROCD>"C:\Python37\python.exe" %mkvstrip% -b %MKVMergeLocation% -v -l eng,dut -s eng,dut -r Forced C:\TEMP\Torrent\PROCD\TV\1
Searching for MKV files to process.
Warning: This may take some time...
Checking C:\TEMP\Torrent\PROCD\TV\1\Tehran.S01E01.Emergency.Landing.in.Tehran.1080p.ATVP.WEB-DL.DDP5.1.H.264-NTb.mkv
C:\TEMP\Torrent\PROCD>
Did a trace with python (first time for everything 😄 ) and it seems on the first file all subtitle languages are not recognised
--- modulename: mkvstrip, funcname: __init__
mkvstrip.py(204): self.lang = track_data["properties"].get("language", "und")
mkvstrip.py(205): self.codec = track_data["codec"]
mkvstrip.py(206): self.type = track_data["type"]
mkvstrip.py(207): self.id = track_data["id"]
mkvstrip.py(208): self.name = track_data["properties"].get("track_name")
mkvstrip.py(209): self.forced = track_data["properties"].get("forced_track")
mkvstrip.py(243): track_map[track_obj.type].append(track_obj)
mkvstrip.py(241): for track_data in json_data["tracks"]:
mkvstrip.py(242): track_obj = Track(track_data)
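The part of the script visible in this trace can be sketched roughly as follows. This is a reconstruction from the traced lines only, not the actual mkvstrip source: the `group_tracks` wrapper and the initial keys of `track_map` are assumptions. It shows that a track without a `language` property falls back to "und".

```python
import json


class Track:
    """Stand-in for the Track class in the trace; fields copied from the traced lines."""

    def __init__(self, track_data):
        props = track_data.get("properties", {})
        self.lang = props.get("language", "und")  # "und" when no language is set
        self.codec = track_data["codec"]
        self.type = track_data["type"]
        self.id = track_data["id"]
        self.name = props.get("track_name")
        self.forced = props.get("forced_track")


def group_tracks(json_text):
    """Parse identification JSON and group the tracks by type (hypothetical wrapper)."""
    json_data = json.loads(json_text)
    track_map = {"video": [], "audio": [], "subtitles": []}
    for track_data in json_data["tracks"]:
        track_obj = Track(track_data)
        track_map.setdefault(track_obj.type, []).append(track_obj)
    return track_map
```

Feeding it the JSON from one of the failing files and printing `lang` for every entry in `track_map["subtitles"]` would show which tracks end up as "und".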
I can write a python script for you that uses ffmpeg to extract the subtitles from the video. Not that complicated at all.
Something like: Input: Videofile Output: Videofile without subs and Dutch + English sub files And of course skipping the Hearing Impaired subs.
Or I could mux those subs also into the output video.
What I want is to remove all embedded audio streams and subtitles that do not match the languages I set (for me EN and NL), while keeping the HI, since some shows only have the full English subtitle in the HI track because the normal English track might only cover the Spanish-speaking parts, Narco for instance.
I strip out the HI and other crap with SubtitleEdit, so I always end up with a clean English and Dutch subtitle
Input: Videofile Embedded Audio: eng, dut, ger Embedded Subs: eng, dut, ger
Output: Videofile Embedded Audio: eng, dut Embedded Subs: (eng, dut) or (none) Extracted SRT: eng, dut
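The input/output above boils down to a language filter over the audio and subtitle tracks. A minimal sketch, assuming each track is a plain dict with `type` and `lang` keys (a made-up shape for illustration, not the format of any particular tool):

```python
KEEP_LANGS = {"eng", "dut"}  # assumption: the 3-letter codes used in this thread


def split_tracks(tracks, keep_langs=KEEP_LANGS):
    """Split tracks into (keep, remove); only audio and subtitle tracks are filtered."""
    keep, remove = [], []
    for track in tracks:
        if track["type"] in ("audio", "subtitles") and track["lang"] not in keep_langs:
            remove.append(track)
        else:
            # Video tracks and tracks in a wanted language are kept.
            keep.append(track)
    return keep, remove
```

For the example input, the German audio and subtitle tracks end up in `remove` and everything else in `keep`.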
Ok, I can do that. Do you want to keep the original input file, for example renamed with an .old extension, or shall I delete it?
Delete it
I think I know why the python script does nothing on the Tehran episode. The only Audio Track is Hebrew so it will skip removing the subs.
C:\TEMP\Torrent\PROCD>"C:\Python37\python.exe" %mkvstrip% -b %MKVMergeLocation% -v -l eng,dut -s eng,dut -r Forced -t C:\TEMP\Torrent\PROCD\TV\1
Searching for MKV files to process.
Warning: This may take some time...
Checking C:\TEMP\Torrent\PROCD\TV\1\Tehran.S01E01.Emergency.Landing.in.Tehran.1080p.ATVP.WEB-DL.DDP5.1.H.264-NTb.mkv
REMOVE: Track #1: heb - E-AC-3 - Name:None - Forced:False
REMOVE: Track #2: heb - SubRip/SRT - Name:Forced - Forced:True
REMOVE: Track #3: ara - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #4: bul - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #5: chi - SubRip/SRT - Name:Simplified Mandarin - Forced:False
REMOVE: Track #6: chi - SubRip/SRT - Name:Traditional Mandarin - Forced:False
REMOVE: Track #7: cze - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #8: dan - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #9: ger - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #10: gre - SubRip/SRT - Name:None - Forced:False
KEEP: Track #11: eng - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #12: spa - SubRip/SRT - Name:Latin America - Forced:False
REMOVE: Track #13: spa - SubRip/SRT - Name:Spain - Forced:False
REMOVE: Track #14: est - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #15: fin - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #16: fre - SubRip/SRT - Name:Canada - Forced:False
REMOVE: Track #17: fre - SubRip/SRT - Name:France - Forced:False
REMOVE: Track #18: heb - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #19: heb - SubRip/SRT - Name:SDH - Forced:False
REMOVE: Track #20: hin - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #21: hun - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #22: ind - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #23: ita - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #24: jpn - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #25: kor - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #26: lit - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #27: lav - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #28: may - SubRip/SRT - Name:None - Forced:False
KEEP: Track #29: dut - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #30: nor - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #31: pol - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #32: por - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #33: por - SubRip/SRT - Name:Brazil - Forced:False
REMOVE: Track #34: rus - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #35: slo - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #36: slv - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #37: swe - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #38: tam - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #39: tel - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #40: tha - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #41: tur - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #42: ukr - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #43: vie - SubRip/SRT - Name:None - Forced:False
REMOVE: Track #44: chi - SubRip/SRT - Name:Cantonese - Forced:False
This is an example where the Forced EN subtitles are only for the non English parts and I actually need the SDH ones https://partnerhelp.netflixstudios.com/hc/en-us/articles/224198488-What-is-a-Forced-Narrative-Subtitle-
Remuxing: The.Boys.S02E08.What.I.Know.2160p.AMZN.WEBRip.DDP5.1.x265-NTb.mkv
Title: None
============================
Retaining subtitle track(s):
Track #3: eng - SubRip/SRT - Name:SDH - Forced:False
Track #20: dut - SubRip/SRT - Name:None - Forced:False
Removing subtitle track(s):
Track #2: eng - SubRip/SRT - Name:Forced - Forced:True
Track #4: ara - SubRip/SRT - Name:None - Forced:False
Track #5: dan - SubRip/SRT - Name:None - Forced:False
Track #6: ger - SubRip/SRT - Name:None - Forced:False
Track #7: spa - SubRip/SRT - Name:Latinoamérica - Forced:False
Track #8: spa - SubRip/SRT - Name:España - Forced:False
Track #9: fin - SubRip/SRT - Name:None - Forced:False
Track #10: fil - SubRip/SRT - Name:None - Forced:False
Track #11: fre - SubRip/SRT - Name:None - Forced:False
Track #12: heb - SubRip/SRT - Name:None - Forced:False
Track #13: hin - SubRip/SRT - Name:None - Forced:False
Track #14: ind - SubRip/SRT - Name:None - Forced:False
Track #15: ita - SubRip/SRT - Name:None - Forced:False
Track #16: jpn - SubRip/SRT - Name:None - Forced:False
Track #17: kor - SubRip/SRT - Name:None - Forced:False
Track #18: may - SubRip/SRT - Name:None - Forced:False
Track #19: nor - SubRip/SRT - Name:Norsk Bokmål - Forced:False
Track #21: pol - SubRip/SRT - Name:None - Forced:False
Track #22: por - SubRip/SRT - Name:Brasil - Forced:False
Track #23: por - SubRip/SRT - Name:Portugal - Forced:False
Track #24: rus - SubRip/SRT - Name:None - Forced:False
Track #25: swe - SubRip/SRT - Name:None - Forced:False
Track #26: tam - SubRip/SRT - Name:None - Forced:False
Track #27: tel - SubRip/SRT - Name:None - Forced:False
Track #28: tha - SubRip/SRT - Name:None - Forced:False
Track #29: tur - SubRip/SRT - Name:None - Forced:False
OK, I will extract Dutch, German and English subtitles. If there are no normal English subs, then I will extract the SDH subtitles. Extract nothing if none of the above is present. Create a video file with a video stream, an English, Dutch or German audio stream, and no subtitle stream in the container.
Be aware that the stream identifiers are set by the creator of the video file, and that they are sometimes sloppy or just use other names than the ISO identifiers.
Also, that example you gave is very strange: it has an English sub that covers just a small part of the movie, and an SDH English sub for the whole movie, but that SDH is actually a normal subtitle. No way that a script can anticipate such strange configurations.
I don't want German 😀 That is indeed what sometimes is a bit irritating. That is why I throw away the Subs that have the name Forced as that is 99% of the time only the English translation for foreign speech. When using the SDH I can strip out all the HI and other stuff with Subtitle Edit and have two clean and perfect subtitles to my liking 😀.
Forced subtitles are used for situations where you watch a video without subtitles in, for example, English and somebody speaks a few lines in French. For that French part they use a forced English subtitle, so only that part is subtitled.
Correct, that is why I strip out the Forced sub, we like to have English subs for all the spoken parts.
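The selection rule discussed here (drop Forced tracks, prefer a normal track, fall back to SDH) could look something like this. It is only a sketch: the dict keys and the `pick_subtitle` name are made up for illustration, and it relies on track names such as "Forced" and "SDH", which, as noted above, are set by whoever created the file and are not always reliable.

```python
def pick_subtitle(tracks, lang):
    """Pick one subtitle track for lang: a normal track if present, else SDH, else None."""
    candidates = [
        t for t in tracks
        if t["lang"] == lang
        and not t["forced"]
        and (t["name"] or "").lower() != "forced"
    ]
    # Prefer tracks whose name does not mention SDH.
    normal = [t for t in candidates if "sdh" not in (t["name"] or "").lower()]
    if normal:
        return normal[0]
    # Otherwise fall back to the first remaining (SDH) track, if any.
    return candidates[0] if candidates else None
```

On the English and Dutch tracks from the The Boys listing above, this would pick track #3 (SDH) for eng, since the only other English track is Forced, and track #20 for dut.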
Request to be added to Babelfish: https://github.com/Diaoul/babelfish
Describe the bug Most episodes these days have embedded subtitles, which I extract as srt files, but the language codes for these are most of the time 3-character ISO 639-2 codes. For English this works perfectly. Files
Medusa
But as you can see, the Dutch subtitle is not recognized, while files that are formatted with the 2-character language code are picked up. Files
Medusa