morpheus65535 / bazarr

Bazarr is a companion application to Sonarr and Radarr. It manages and downloads subtitles based on your requirements. You define your preferences by TV show or movie and Bazarr takes care of everything for you.
https://www.bazarr.media
GNU General Public License v3.0

Replace guess_language lib with fasttext #2048

Closed ngosang closed 1 year ago

ngosang commented 1 year ago

A long time ago I made some changes to the library used to detect the subtitle language (see #800). It has been working fine, but there is a newer library developed by Facebook, fastText, that is even better.

https://pypi.org/project/fasttext-predict/ => pip install fasttext-predict==0.9.2.1

import fasttext

# Download the model file from https://fasttext.cc/docs/en/language-identification.html
model = fasttext.load_model('lid.176.ftz')

def detect_language(text):
    """
    Return the language code fastText predicts for `text`.

    Newlines need to be replaced before calling predict().
    fastText returns => (('__label__fr',), (0.9959741830825806,))
    """
    labels, _scores = model.predict(text.replace("\n", " "))
    # Strip the '__label__' prefix: '__label__fr' -> 'fr'
    return labels[0].split('__')[2]

result = detect_language('Fondant au chocolat et tarte aux myrtilles')
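For anyone picking this up: the return shape quoted in the docstring also carries a confidence score, which could be used to reject weak guesses. Below is a minimal sketch that works on that tuple shape only, so it runs without the model file. The helper names `parse_prediction` and `language_or_none` and the `min_confidence` threshold are hypothetical, not part of fastText or Bazarr.

```python
from typing import Optional, Tuple

LABEL_PREFIX = "__label__"

def parse_prediction(prediction) -> Tuple[str, float]:
    """Split a prediction shaped like (('__label__fr',), (0.99,)) into (code, score)."""
    labels, scores = prediction
    label = labels[0]
    if not label.startswith(LABEL_PREFIX):
        raise ValueError(f"unexpected label format: {label!r}")
    return label[len(LABEL_PREFIX):], float(scores[0])

def language_or_none(prediction, min_confidence: float = 0.5) -> Optional[str]:
    """Return the language code, or None when the score is below the threshold.

    min_confidence is an illustrative default, not a recommended value.
    """
    code, score = parse_prediction(prediction)
    return code if score >= min_confidence else None

# Sample value copied from the docstring above, not a fresh model run.
sample = (('__label__fr',), (0.9959741830825806,))
code, score = parse_prediction(sample)
```

A caller could then fall back to the existing detection when `language_or_none` returns None.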

I don't have time to do the work. Maybe you can find some time.

morpheus65535 commented 1 year ago

It seems interesting. I'll put that on my to-do list, but it could take a while before I can work on it. Thanks for sharing this!

halali commented 1 year ago

@morpheus65535 FYI, this lib needs to be compiled.

morpheus65535 commented 1 year ago

@halali yeah just saw that. Thanks for the feedback.

@ngosang I really don't want to add too many of those dependencies that can't be vendored. I guess that for now it's a no-go for this module. Sorry :-/

ngosang commented 1 year ago

@halali @morpheus65535 Yes, the library has to be compiled, but they already provide compiled versions for most architectures. In your case it would be similar to the compiled executables you already provide: copy the "whl" files into your repository and install the one required by the architecture. The packages are light, around 100 KB.

https://pypi.org/project/fasttext-predict/#files

halali commented 1 year ago

Unfortunately this is not possible, as the whl is different for every Python version and architecture.
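To illustrate the point about per-version and per-architecture wheels: a wheel filename encodes a Python tag, an ABI tag, and a platform tag, so a compiled package needs one file per combination. The sketch below parses those tags with the standard library only. The filenames are illustrative examples of the naming convention, not a verified list of the files published for fasttext-predict, and `parse_wheel_filename` is a hypothetical helper.

```python
from typing import NamedTuple

class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str

def parse_wheel_filename(filename: str) -> WheelTags:
    """Parse {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # 5 parts without a build tag, 6 parts with one.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel name: {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return WheelTags(parts[0], parts[1], python_tag, abi_tag, platform_tag)

# Illustrative filenames only (assumed, not taken from the PyPI file list).
examples = [
    "fasttext_predict-0.9.2.1-cp310-cp310-win_amd64.whl",
    "fasttext_predict-0.9.2.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
]

# Each distinct (python_tag, platform_tag) pair would need its own vendored file.
combos = {(t.python_tag, t.platform_tag) for t in map(parse_wheel_filename, examples)}
```

Vendoring would mean shipping every such combination Bazarr supports and selecting the matching one at install time.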