guessit-io / guessit

GuessIt is a python library that extracts as much information as possible from a video filename.
https://guessit-io.github.io/guessit
GNU Lesser General Public License v3.0
824 stars 91 forks

future year 2049 in movie title should not be parsed as season 20, episode 49 #774

Open milahu opened 6 months ago

milahu commented 6 months ago

good title: has movie year 2017

Blade.Runner.2049.2017.720p.BluRay.x264-[YTS.AG].mp4
>>> print(json.dumps(guessit.guessit("Blade.Runner.2049.2017.720p.BluRay.x264-[YTS.AG].mp4"), indent=2))
{
  "title": "Blade Runner 2049",
  "year": 2017,
  "screen_size": "720p",
  "source": "Blu-ray",
  "video_codec": "H.264",
  "release_group": "YTS.AG",
  "container": "mp4",
  "mimetype": "video/mp4",
  "type": "movie"
}

bad title: movie year 2017 is missing

Blade Runner 2049.HDRip.XviD.AC3-EVO.avi

guessit 3.8.0 misreads 2049 as season 20, episode 49

>>> print(json.dumps(guessit.guessit("Blade Runner 2049.HDRip.XviD.AC3-EVO.avi"), indent=2))
{
  "title": "Blade Runner",
  "season": 20,
  "episode": 49,
  "other": [
    "HD",
    "Rip"
  ],
  "video_codec": "Xvid",
  "audio_codec": "Dolby Digital",
  "release_group": "EVO",
  "container": "avi",
  "mimetype": "video/x-msvideo",
  "type": "episode"
}

maybe we could give guessit hints like "expect a movie" or "expect a release year before 2020"
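
To make the "expect a release year before 2020" hint concrete, here is a minimal sketch of a pre-check that could run before calling guessit. It is not part of guessit: `classify_four_digit_tokens`, `min_year` and `max_release_year` are hypothetical names, and the sketch only reports which four-digit tokens could be a release year under the hint.

```python
import re

# Hypothetical helper, not part of guessit.
# Matches exactly four digits that are not embedded in a longer digit run.
FOUR_DIGITS = re.compile(r"(?<!\d)(\d{4})(?!\d)")


def classify_four_digit_tokens(name, min_year=1900, max_release_year=2020):
    """Split 4-digit tokens into plausible release years and other numbers.

    A token counts as a plausible release year when
    min_year <= value < max_release_year (both bounds are assumptions).
    """
    years, others = [], []
    for match in FOUR_DIGITS.finditer(name):
        value = int(match.group(1))
        if min_year <= value < max_release_year:
            years.append(value)
        else:
            others.append(value)
    return {"plausible_years": years, "other_numbers": others}


good = classify_four_digit_tokens(
    "Blade.Runner.2049.2017.720p.BluRay.x264-[YTS.AG].mp4"
)
bad = classify_four_digit_tokens("Blade Runner 2049.HDRip.XviD.AC3-EVO.avi")

print(good)  # {'plausible_years': [2017], 'other_numbers': [2049]}
print(bad)   # {'plausible_years': [], 'other_numbers': [2049]}
```

With such a hint, 2049 could be kept as part of the title instead of being split into season and episode. For the "expect a movie" hint, guessit already accepts a `type` option (`guessit.guessit(name, {"type": "movie"})`); whether that changes the result for this filename is not checked here.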

in the future, after the year 2049, guessit will parse 2049 as the movie year; this issue could be solved by a lookup in the IMDb database

$ du -sh imdb/*
762M    imdb/title.basics.db
164M    imdb/title.basics.tsv.gz
125M    imdb/title.episode.db
39M     imdb/title.episode.tsv.gz

but IMDb also has TV episodes called Blade Runner 2049

$ zgrep $'\tBlade Runner 2049\t' imdb/title.basics.tsv.gz
tt12038118  tvEpisode   Blade Runner 2049   Blade Runner 2049   0   2020    \N  \N  Comedy
tt13350998  tvEpisode   Blade Runner 2049   Blade Runner 2049   0   2017    \N  \N  Talk-Show
tt17075118  tvEpisode   Blade Runner 2049   Blade Runner 2049   0   2022    \N  \N  Talk-Show
tt1856101   movie   Blade Runner 2049   Blade Runner 2049   0   2017    \N  164 Action,Drama,Mystery
tt7465768   tvEpisode   Blade Runner 2049   Blade Runner 2049   0   2017    \N  \N  Comedy,Talk-Show
tt7473474   tvEpisode   Blade Runner 2049   Blade Runner 2049   0   2017    \N  32  Comedy,Talk-Show
tt7479818   tvEpisode   Blade Runner 2049   Blade Runner 2049   0   2017    \N  29  Comedy,Talk-Show
tt7481170   tvEpisode   Blade Runner 2049   Blade Runner 2049   0   2017    \N  \N  Comedy
tt7598018   tvEpisode   Blade Runner 2049   Blade Runner 2049   0   2017    \N  2   Talk-Show
tt7608378   tvEpisode   Blade Runner 2049   Blade Runner 2049   0   2017    \N  \N  News
tt7909368   tvEpisode   Blade Runner 2049   Blade Runner 2049   0   2017    \N  \N  Talk-Show
tt8585120   tvEpisode   Blade Runner 2049   Blade Runner 2049   0   2017    \N  \N  Talk-Show
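
A lookup could still work if it filters on the `titleType` column. The sketch below assumes the column order of IMDb's `title.basics.tsv` and uses three of the rows above as inline sample data; `lookup_title` is a hypothetical helper, not an existing guessit or IMDb API.

```python
import csv
import io

# Column order of IMDb's title.basics.tsv
COLUMNS = [
    "tconst", "titleType", "primaryTitle", "originalTitle",
    "isAdult", "startYear", "endYear", "runtimeMinutes", "genres",
]

# Three rows copied from the zgrep output above (tab-separated).
SAMPLE = (
    "tt12038118\ttvEpisode\tBlade Runner 2049\tBlade Runner 2049\t0\t2020\t\\N\t\\N\tComedy\n"
    "tt1856101\tmovie\tBlade Runner 2049\tBlade Runner 2049\t0\t2017\t\\N\t164\tAction,Drama,Mystery\n"
    "tt7473474\ttvEpisode\tBlade Runner 2049\tBlade Runner 2049\t0\t2017\t\\N\t32\tComedy,Talk-Show\n"
)


def lookup_title(lines, title, title_type="movie"):
    """Return rows whose primaryTitle and titleType both match exactly."""
    reader = csv.DictReader(
        lines, fieldnames=COLUMNS, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [
        row for row in reader
        if row["primaryTitle"] == title and row["titleType"] == title_type
    ]


movies = lookup_title(io.StringIO(SAMPLE), "Blade Runner 2049")
episodes = lookup_title(io.StringIO(SAMPLE), "Blade Runner 2049", "tvEpisode")

for row in movies:
    print(row["tconst"], row["titleType"], row["startYear"])
print(len(episodes), "episode rows with the same title")
```

For the real dump, the same function should accept a file opened with `gzip.open("imdb/title.basics.tsv.gz", "rt", encoding="utf-8", newline="")`. The filter only helps when the expected type is already known, so it complements an "expect a movie" hint rather than replacing it.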