guessit-io / guessit

GuessIt is a Python library that extracts as much information as possible from a video filename.
https://guessit-io.github.io/guessit
GNU Lesser General Public License v3.0

Incomplete title for tv show "the last of us" #739

Open gargolito opened 1 year ago

gargolito commented 1 year ago

guessit The.Last.of.Us.S01E01
For: The.Last.of.Us.S01E01

GuessIt found: {
    "title": "The Last of",
    "country": "UNITED STATES",
    "season": 1,
    "episode": 1,
    "type": "episode"
}
Toilal commented 1 year ago

This issue has been discussed many times for other shows/movies whose titles end with the word "us".

I'll leave the issue open though, as I also want this to be fixed. Maybe we could consider US as a country only if it's uppercase, when it's not bounded by other matches.
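
As a rough illustration of the uppercase-only part of that idea, here is a minimal standalone sketch. It is not guessit code: `split_title_and_country` and `SEPARATORS` are hypothetical names, and the "not bounded by other matches" condition is not modeled at all.

```python
import re

# Hypothetical helper, not part of guessit: common separators in release names.
SEPARATORS = re.compile(r"[.\s_-]+")


def split_title_and_country(name_part):
    """Return (title, country) for the text that precedes the SxxEyy marker.

    Assumption for this sketch: only a trailing token spelled exactly "US"
    (all uppercase) counts as a country; "Us" or "us" stays in the title.
    """
    tokens = [t for t in SEPARATORS.split(name_part) if t]
    if len(tokens) > 1 and tokens[-1] == "US":
        return " ".join(tokens[:-1]), "US"
    return " ".join(tokens), None


print(split_title_and_country("The.Last.of.Us"))
print(split_title_and_country("Shameless.US"))
```

A filename written entirely in uppercase (THE.LAST.OF.US) would still be split the wrong way under this rule, so case alone is probably not enough.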

gargolito commented 1 year ago

I have another odd case. A Ted Lasso episode had this in its filename after the season and episode numbers: S03E03.4.5.1, and guessit treated it as a multi-episode file. I can see why. I haven't looked at your code yet but are you using any kind of NLP like spacy? That might help with calculating distance between words.


guessit Ted.Lasso.S03E03.4-5-1.mkv
For: Ted.Lasso.S03E03.4-5-1.mkv
GuessIt found: {
    "title": "Ted Lasso",
    "season": 3,
    "episode": [
        3,
        4,
        5
    ],
    "episode_title": "1",
    "screen_size": "1080p",
    "streaming_service": "AppleTV",
    "source": "Web",
    "audio_codec": "Dolby Digital Plus",
    "audio_channels": "5.1",
    "video_codec": "H.264",
    "release_group": "NTb",
    "container": "mkv",
    "mimetype": "video/x-matroska",
    "type": "episode"
}
Toilal commented 1 year ago

I haven't looked at your code yet but are you using any kind of NLP like spacy?

No, it's just a big bunch of regexes and rules to resolve conflicts between matches. I'm pretty sure some AI-based algorithm could perform nicely for parsing movie/series filenames, but guessit is not based on any of those.
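
To give a feel for what "regexes plus conflict rules" can look like, here is a toy sketch. The patterns, the `Match` tuple, and the "longest match wins" rule are all assumptions made for illustration; guessit's real patterns and rules are different and far more extensive.

```python
import re
from typing import NamedTuple


class Match(NamedTuple):
    start: int
    end: int
    name: str
    value: str


# Toy patterns chosen for illustration only; they are not guessit's.
PATTERNS = {
    "episode_marker": re.compile(r"S\d{2}E\d{2}", re.IGNORECASE),
    "audio_channels": re.compile(r"\b[257]\.1\b"),
    "number": re.compile(r"\b\d\b"),
}


def find_matches(text):
    """Run every pattern over the text and collect all candidate matches."""
    return [
        Match(m.start(), m.end(), name, m.group())
        for name, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]


def resolve_conflicts(matches):
    """Toy rule: among overlapping matches, keep the longest one."""
    kept = []
    # Sort longest first, so shorter overlapping candidates get rejected.
    for match in sorted(matches, key=lambda m: (m.start - m.end, m.start)):
        if all(match.end <= k.start or match.start >= k.end for k in kept):
            kept.append(match)
    return sorted(kept)  # back in left-to-right order


for match in resolve_conflicts(find_matches("Show.S01E02.5.1.mkv")):
    print(match.name, match.value)
```

Here "5.1" is kept as a single audio_channels match and the two single-digit candidates inside it are discarded. A single fixed rule like this is brittle, which is part of why ambiguous names such as the two in this issue are hard to get right.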

VeNoMouS commented 1 year ago

This issue has been discussed many times for other shows/movies whose titles end with the word "us".

I'll leave the issue open though, as I also want this to be fixed. Maybe we could consider US as a country only if it's uppercase, when it's not bounded by other matches.

Hi again @Toilal, are you going to implement this? I'm in favor of it.