guessit-io / guessit

GuessIt is a python library that extracts as much information as possible from a video filename.
https://guessit-io.github.io/guessit
GNU Lesser General Public License v3.0

Guessit name/language problem #660

Open Entixs opened 3 years ago

Entixs commented 3 years ago

Guessit is having a problem with the anime title Hi Score Girl: https://www.thetvdb.com/series/high-score-girl

It treats "Hi" as the language code for Hindi instead of as part of the title, as shown in this log line from the Medusa program:

2020-10-12 19:25:39 DEBUG FORCEDSEARCHQUEUE-MANUAL-346673 :: [AnimeBytes] :: [e194cb2] Error during parsing of release name: HI.SCORE.GIRL.II.S01E01-E09.Blu-ray.MKV.h264.1080p.FLAC2.0-SonicBoom, with error: Unable to match HI.SCORE.GIRL.II.S01E01-E09.Blu-ray.MKV.h264.1080p.FLAC2.0-SonicBoom to a series in your database. Parser result: language: hi, title: SCORE GIRL II, season: 1, episode: [1, 2, 3, 4, 5, 6, 7, 8, 9], source: Blu-ray, container: mkv, video_codec: H.264, screen_size: 1080p, audio_codec: FLAC, audio_channels: 2.0, release_group: SonicBoom, type: episode, parsing_time: 0.37002110481262207, absolute_episode: [], quality: 1080p BluRay, total_time: 0.4190239906311035

Can this be fixed? Thank you!

Toilal commented 3 years ago

You can configure the list of allowed languages (https://github.com/guessit-io/guessit/blob/6e4ead187ef98f405487301b7ddc89cb5461ac5d/guessit/config/options.json#L11).

See https://guessit.readthedocs.io/en/latest/configuration.html#configuration

I have to admit that "hi" should not be in this list by default, as it's quite a common word in English.
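To illustrate the idea behind an allow-list, here is a minimal sketch in Python. It is a toy model and not guessit's actual implementation: `KNOWN_LANGUAGE_CODES` and `split_leading_language` are hypothetical names made up for this example, and the real matching rules are more involved. It only shows why a leading "HI" token can be read as Hindi, and how leaving "hi" out of the allowed set would keep it available for the title.

```python
# Toy model only: this is NOT guessit's implementation.
# It shows how a two-letter token can be read as a language code,
# and how an allow-list without that code leaves it for the title.

# Hypothetical subset of language codes, for illustration.
KNOWN_LANGUAGE_CODES = {"de", "en", "es", "fr", "hi", "it", "nl", "pt", "ru"}


def split_leading_language(release_name, allowed_languages):
    """Return (language, remaining_tokens) for a dot-separated release name.

    The first token is treated as a language only if it is both a known
    code and present in allowed_languages.
    """
    tokens = release_name.split(".")
    first = tokens[0].lower()
    if first in KNOWN_LANGUAGE_CODES and first in allowed_languages:
        return first, tokens[1:]
    return None, tokens


name = "HI.SCORE.GIRL.II.S01E01-E09"

# With "hi" allowed, the leading token is consumed as a language.
lang_with_hi, rest_with_hi = split_leading_language(name, {"en", "fr", "hi"})

# Without "hi", the token stays in the list and can form part of the title.
lang_without_hi, rest_without_hi = split_leading_language(name, {"en", "fr"})
```

In this sketch the first call reports a language of `hi` and drops the token from the title, which mirrors the parser result in the log above; the second call leaves `HI` in place.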

p0psicles commented 3 years ago

I checked. But this is our current allowed_languages array:

    allowed_languages = [
        'de', 'en', 'es', 'ca', 'fr', 'he', 'hu', 'it', 'jp', 'nl', 'pl', 'pt', 'ro', 'ru', 'sv', 'uk',
        'mul',  # multi language
        'und',  # undetermined
    ]

I also disabled our rebulk rules to be sure, but allowed_languages doesn't seem to have any effect: "hi" is not in our list, yet it is still detected as a language.