sc0ty / subsync

Subtitle Speech Synchronizer
GNU General Public License v3.0
1.28k stars, 54 forks

Unknown POB language code for Brazilian Portuguese #66

Closed: dnmaia closed this issue 4 years ago

dnmaia commented 4 years ago

I'm using Bazarr and the image "kocane/bazarr_subsync-sc0ty", which has subsync built in. When it runs on an English subtitle, it shows the message below:

BAZARR Post-processing result for file /TV shows/Spin.City.(1996).Season.1-6.S01-S06.(480p.DVD.x265.HEVC.10bit.AAC.2.0.Silence)/Season 01/Spin City (1996) - S01E01 - Pilot (480p DVD x265 Silence).mkv : 
[*] starting synchronization /TV shows/Spin.City.(1996).Season.1-6.S01-S06.(480p.DVD.x265.HEVC.10bit.AAC.2.0.Silence)/Season 01/Spin City (1996) - S01E01 - Pilot (480p DVD x265 Silence).en.srt 
[+] synchronization 0%: 0 points 
[+] synchronization 91%: 90 points 
[+] synchronization 100%: 135 points 
[+] synchronization 100%: 160 points 
[+] saving to /TV shows/Spin.City.(1996).Season.1-6.S01-S06.(480p.DVD.x265.HEVC.10bit.AAC.2.0.Silence)/Season 01/Spin City (1996) - S01E01 - Pilot (480p DVD x265 Silence).en.srt 
[+] done

That looks OK, but when the subtitle is in Portuguese, it shows this:

BAZARR Post-processing result for file /TV shows/Spin.City.(1996).Season.1-6.S01-S06.(480p.DVD.x265.HEVC.10bit.AAC.2.0.Silence)/Season 01/Spin City (1996) - S01E01 - Pilot (480p DVD x265 Silence).mkv : 
[*] starting synchronization /TV shows/Spin.City.(1996).Season.1-6.S01-S06.(480p.DVD.x265.HEVC.10bit.AAC.2.0.Silence)/Season 01/Spin City (1996) - S01E01 - Pilot (480p DVD x265 Silence).pt-BR.srt 
[+] updating asset list 
[!] there is no assets needed to perform synchronization 
[-] asset dictionary English / pob is missing

So, is it still synchronizing, or do I need to add this dictionary for it to work properly? And if so, how can I do it?

dnmaia commented 4 years ago

I managed to figure out what happened. Subsync tries to download the dictionary, but it looks for eng-pob (pob being Brazilian Portuguese in my case), which doesn't exist. There is an eng-por, but I'm not sure whether it is Portuguese from Portugal or Brazilian Portuguese under a different file name. I edited asset.json so that a request for eng-pob downloads eng-por, then found the assets folder and renamed the file to eng-pob, and it worked. I'm just not sure whether this will cause problems if the file is actually Portuguese from Portugal.

dnmaia commented 4 years ago

Another weird behavior: in my case, I managed to add the dictionary as described above, but when subsync runs, the CPU goes to 100%. I'm using a Synology DS918+. Is that normal?

sc0ty commented 4 years ago

Your file has the 'pob' language code, which is invalid according to ISO 639-3; it should be por. Is this code used in multimedia to mark Brazilian Portuguese, or is it just this file? Maybe I should add it to subsync?

You can tell it the language manually by adding --ref-lang=por (see here).
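A minimal sketch of the aliasing idea being discussed, assuming a tool that looks up dictionaries by a pair of ISO 639-3 codes: the non-standard code is normalized before the lookup, so a 'pob' request resolves to the existing eng-por dictionary. The alias table, the `normalize_lang` and `dictionary_key` helpers, and the `eng-por` key format are hypothetical illustrations, not subsync's actual implementation.

```python
# Hypothetical alias table: non-standard codes seen in media files -> ISO 639-3.
# 'pob' is not a valid ISO 639-3 code, so it is mapped to 'por' here.
LANG_ALIASES = {
    "pob": "por",
}


def normalize_lang(code: str) -> str:
    """Lower-case a language code and replace known non-standard aliases."""
    code = code.strip().lower()
    return LANG_ALIASES.get(code, code)


def dictionary_key(ref_lang: str, sub_lang: str) -> str:
    """Build a hypothetical dictionary asset name such as 'eng-por'."""
    return f"{normalize_lang(ref_lang)}-{normalize_lang(sub_lang)}"


print(dictionary_key("eng", "pob"))  # eng-por
```

Normalizing in one place means an unknown tag like 'pob' never reaches the asset lookup, which is where the "asset dictionary English / pob is missing" error in this thread comes from.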

I don't know the differences between Brazilian Portuguese and Portuguese from Portugal, but if they are similar enough, it should work with both dialects. Subsync works fine even with low-quality dictionaries. I think this dictionary is for Portuguese from Portugal, but it may contain phrases from both. I got it from the Wikidict and IATE projects.

dnmaia commented 4 years ago

Well, I guess POR is for Portuguese from Portugal and POB for Portuguese from Brazil. Some words are completely different, and a lot of systems use different tags for the two variants.

sc0ty commented 4 years ago

> Another weird behavior: in my case, I managed to add the dictionary as described above, but when subsync runs, the CPU goes to 100%. I'm using a Synology DS918+. Is that normal?

Yes. By default it spawns a thread for every CPU core to finish faster. You can limit that with the --jobs=2 flag.
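As a rough sketch of picking a value for `--jobs` on a small NAS, the snippet below uses half of the available cores (never fewer than one). The halving heuristic and the `pick_jobs` helper are assumptions for illustration; only the `--jobs=N` flag itself comes from this thread, and where it goes relative to the other options should be checked against the subsync command-line documentation.

```python
import os
from typing import Optional


def pick_jobs(fraction: float = 0.5, cpu_count: Optional[int] = None) -> int:
    """Return a worker count that leaves part of the CPU free (never below 1)."""
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, int(cores * fraction))


# On a quad-core machine like the one in this thread, half the cores is 2.
jobs_flag = f"--jobs={pick_jobs(cpu_count=4)}"
print(jobs_flag)  # --jobs=2
```

Fewer jobs will not make speech recognition cheaper overall; it only leaves cores free for other services at the cost of a longer run.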

> Well, I guess POR is for Portuguese from Portugal and POB for Portuguese from Brazil. Some words are completely different, and a lot of systems use different tags for the two variants.

OK, then I will add the POB code in the next release. Could you check whether your Brazilian subtitles are synchronized correctly?

dnmaia commented 4 years ago

After using that dictionary, the Portuguese subtitles look very well synchronized. It just takes a long time on the Synology. I don't know whether the Celeron CPU is the main issue, but it's quad-core. I will try your suggested flag, thanks. Also, if possible, could you check which version of subsync is used in that Bazarr image?

sc0ty commented 4 years ago

This Bazarr script is not mine; I didn't make it. Speech recognition is very CPU-intensive. You could try synchronizing with other subtitles instead of audio, which is much faster and usually yields better results.

dnmaia commented 4 years ago

To add to that, the author of the Bazarr image with your subsync suggests using the command below:

_subsync --cli sync --sub-lang '{{subtitles_languagecode3}}' --sub '{{subtitles}}' --ref '{{episode}}' --out '{{subtitles}}' --overwrite

Can you check whether this is the best way to run it automatically in Bazarr?
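One thing worth checking in a template like this is quoting: the file names contain spaces and parentheses, and a title with an apostrophe would break a placeholder wrapped in single quotes. The sketch below is a hypothetical illustration of safe substitution, not how Bazarr actually fills its placeholders: it takes the thread's command with the single quotes removed and shell-quotes each value with `shlex.quote`. The `render` helper and the example paths are made up.

```python
import re
import shlex
from typing import Dict

# The thread's command with the single quotes removed, so that each value
# can be quoted safely by shlex.quote instead.
TEMPLATE = ("_subsync --cli sync --sub-lang {{subtitles_languagecode3}} "
            "--sub {{subtitles}} --ref {{episode}} --out {{subtitles}} --overwrite")


def render(template: str, values: Dict[str, str]) -> str:
    """Replace {{name}} placeholders with shell-quoted values (hypothetical helper)."""
    return re.sub(r"\{\{(\w+)\}\}",
                  lambda m: shlex.quote(values[m.group(1)]),
                  template)


# Made-up paths, including an apostrophe that would break '{{subtitles}}'.
values = {
    "subtitles_languagecode3": "pob",
    "subtitles": "/TV shows/Show (1996)/Season 01/Show's Pilot.pt-BR.srt",
    "episode": "/TV shows/Show (1996)/Season 01/Show's Pilot.mkv",
}
cmd = render(TEMPLATE, values)
print(shlex.split(cmd))  # each path comes back as a single, intact argument
```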

dnmaia commented 4 years ago

> This Bazarr script is not mine; I didn't make it. Speech recognition is very CPU-intensive. You could try synchronizing with other subtitles instead of audio, which is much faster and usually yields better results.

What would the command be to use other subtitles, if they are present?

sc0ty commented 4 years ago

You could add --ref-lang, similar to --sub-lang.

> What would the command be to use other subtitles, if they are present?

It will select a subtitle stream automatically if there is one.
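A sketch of the same command extended with `--ref-lang`, built as an argument list so no shell quoting is involved. The flags are the ones named in this thread; the `build_sync_args` helper, the `subsync` executable name, and using `eng` as the reference language are assumptions for illustration.

```python
from typing import List, Optional


def build_sync_args(sub: str, sub_lang: str, ref: str,
                    ref_lang: Optional[str] = None,
                    exe: str = "subsync") -> List[str]:
    """Assemble a subsync command line as a list (hypothetical helper)."""
    args = [exe, "--cli", "sync",
            "--sub-lang", sub_lang, "--sub", sub,
            "--ref", ref, "--out", sub, "--overwrite"]
    if ref_lang:
        # Language of the reference (embedded subtitles or audio in the video).
        args += ["--ref-lang", ref_lang]
    return args


args = build_sync_args("ep.pt-BR.srt", "por", "ep.mkv", ref_lang="eng")
print(args)
# To run it: subprocess.run(args, check=True)
```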

dnmaia commented 4 years ago

In that case, if there are no subtitles to use, will it fall back to the audio?

sc0ty commented 4 years ago

Yes, it looks for a subtitle stream first, then for audio. You can override this by selecting the stream manually with --ref-stream=number. The latest release also adds --ref-stream-by-type=sub/audio and --ref-stream-by-lang=code.
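The selection order described here can be modelled roughly as below. This is an illustrative model of the behavior as described in this comment, not subsync's code: the `Stream` type, its fields, and `select_ref_stream` are hypothetical, and the real tool may break ties (for example, among several subtitle streams) differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stream:
    index: int                  # stream number, as passed to --ref-stream
    kind: str                   # "video", "audio" or "sub"
    lang: Optional[str] = None  # e.g. "eng", if the stream is tagged


def select_ref_stream(streams: List[Stream],
                      index: Optional[int] = None,
                      by_type: Optional[str] = None,
                      by_lang: Optional[str] = None) -> Optional[Stream]:
    """Explicit index wins; otherwise filter, then prefer subtitles over audio."""
    if index is not None:
        return next((s for s in streams if s.index == index), None)
    candidates = [s for s in streams
                  if (by_type is None or s.kind == by_type)
                  and (by_lang is None or s.lang == by_lang)]
    for wanted in ("sub", "audio"):
        for s in candidates:
            if s.kind == wanted:
                return s
    return None


streams = [Stream(0, "video"), Stream(1, "audio", "eng"), Stream(2, "sub", "eng")]
print(select_ref_stream(streams).index)                   # 2: subtitles preferred
print(select_ref_stream(streams, by_type="audio").index)  # 1
```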

sc0ty commented 4 years ago

Updated the wiki page https://github.com/sc0ty/subsync/wiki/Command-line-options

sc0ty commented 4 years ago

done