guessit-io / guessit

GuessIt is a Python library that extracts as much information as possible from a video filename.
https://guessit-io.github.io/guessit
GNU Lesser General Public License v3.0

GuessIt 3 doesn't work with Rebulk 2 #615

Closed: baderj closed this issue 5 years ago

baderj commented 5 years ago

The following throws an exception on Python 3.7.3, Alpine Linux 3.10.2:

import guessit
guessit.guessit("Title (1-11-11)")

Here is the exception report:

===================== Guessit Exception Report =====================
version=3.0.4
string=Title (1-11-11)
options={'expected_title': ['OSS 117', 'This is Us'], 'allowed_countries': ['au', 'gb', 'us'], 'allowed_languages': ['ca', 'cs', 'de', 'en', 'es', 'fr', 'he', 'hi', 'hu', 'it', 'ja', 'ko', 'mul', 'nl', 'no', 'pl', 'pt', 'ro', 'ru', 'sv', 'te', 'uk', 'und'], 'advanced_config': {'common_words': ['ca', 'cat', 'de', 'he', 'it', 'no', 'por', 'rum', 'se', 'st', 'sub'], 'groups': {'starting': '([{', 'ending': ')]}'}, 'audio_codec': {'audio_channels': {'1.0': ['1ch', 'mono'], '2.0': ['2ch', 'stereo', 're:(2[\\W_]0(?:ch)?)(?=[^\\d]|$)'], '5.1': ['5ch', '6ch', 're:(5[\\W_][01](?:ch)?)(?=[^\\d]|$)', 're:(6[\\W_]0(?:ch)?)(?=[^\\d]|$)'], '7.1': ['7ch', '8ch', 're:(7[\\W_][01](?:ch)?)(?=[^\\d]|$)']}}, 'container': {'subtitles': ['srt', 'idx', 'sub', 'ssa', 'ass'], 'info': ['nfo'], 'videos': ['3g2', '3gp', '3gp2', 'asf', 'avi', 'divx', 'flv', 'iso', 'm4v', 'mk2', 'mk3d', 'mka', 'mkv', 'mov', 'mp4', 'mp4a', 'mpeg', 'mpg', 'ogg', 'ogm', 'ogv', 'qt', 'ra', 'ram', 'rm', 'ts', 'vob', 'wav', 'webm', 'wma', 'wmv'], 'torrent': ['torrent'], 'nzb': ['nzb']}, 'country': {'synonyms': {'ES': ['españa'], 'GB': ['UK'], 'BR': ['brazilian', 'bra'], 'CA': ['québec', 'quebec', 'qc'], 'MX': ['Latinoamérica', 'latin america']}}, 'episodes': {'season_max_range': 100, 'episode_max_range': 100, 'max_range_gap': 1, 'season_markers': ['s'], 'season_ep_markers': ['x'], 'disc_markers': ['d'], 'episode_markers': ['xe', 'ex', 'ep', 'e', 'x'], 'range_separators': ['-', '~', 'to', 'a'], 'discrete_separators': ['+', '&', 'and', 'et'], 'season_words': ['season', 'saison', 'seizoen', 'seasons', 'saisons', 'tem', 'temp', 'temporada', 'temporadas', 'stagione'], 'episode_words': ['episode', 'episodes', 'eps', 'ep', 'episodio', 'episodios', 'capitulo', 'capitulos'], 'of_words': ['of', 'sur'], 'all_words': ['All']}, 'language': {'synonyms': {'ell': ['gr', 'greek'], 'spa': ['esp', 'español', 'espanol'], 'fra': ['français', 'vf', 'vff', 'vfi', 'vfq'], 'swe': ['se'], 'por_BR': ['po', 'pb', 'pob', 'ptbr', 'br', 
'brazilian'], 'deu_CH': ['swissgerman', 'swiss german'], 'nld_BE': ['flemish'], 'cat': ['català', 'castellano', 'espanol castellano', 'español castellano'], 'ces': ['cz'], 'ukr': ['ua'], 'zho': ['cn'], 'jpn': ['jp'], 'hrv': ['scr'], 'mul': ['multi', 'dl']}, 'subtitle_affixes': ['sub', 'subs', 'esub', 'esubs', 'subbed', 'custom subbed', 'custom subs', 'custom sub', 'customsubbed', 'customsubs', 'customsub', 'soft subtitles', 'soft subs'], 'subtitle_prefixes': ['st', 'v', 'vost', 'subforced', 'fansub', 'hardsub', 'legenda', 'legendas', 'legendado', 'subtitulado', 'soft', 'subtitles'], 'subtitle_suffixes': ['subforced', 'fansub', 'hardsub'], 'language_affixes': ['dublado', 'dubbed', 'dub'], 'language_prefixes': ['true'], 'language_suffixes': ['audio'], 'weak_affixes': ['v', 'audio', 'true']}, 'part': {'prefixes': ['pt', 'part']}, 'release_group': {'forbidden_names': ['bonus', 'by', 'for', 'par', 'pour', 'rip'], 'ignored_seps': '[]{}()'}, 'screen_size': {'frame_rates': ['23.976', '24', '25', '29.970', '30', '48', '50', '60', '120'], 'min_ar': 1.333, 'max_ar': 1.898, 'interlaced': ['360', '480', '576', '900', '1080'], 'progressive': ['360', '480', '540', '576', '900', '1080', '368', '720', '1440', '2160', '4320']}, 'website': {'safe_tlds': ['com', 'net', 'org'], 'safe_subdomains': ['www'], 'safe_prefixes': ['co', 'com', 'net', 'org'], 'prefixes': ['from']}, 'streaming_service': {'A&E': ['AE', 'A&E'], 'ABC': 'AMBC', 'ABC Australia': 'AUBC', 'Al Jazeera English': 'AJAZ', 'AMC': 'AMC', 'Amazon Prime': ['AMZN', 'Amazon', 're:Amazon-?Prime'], 'Adult Swim': ['AS', 're:Adult-?Swim'], "America's Test Kitchen": 'ATK', 'Animal Planet': 'ANPL', 'AnimeLab': 'ANLB', 'AOL': 'AOL', 'ARD': 'ARD', 'BBC iPlayer': ['iP', 're:BBC-?iPlayer'], 'BravoTV': 'BRAV', 'Canal+': 'CNLP', 'Cartoon Network': 'CN', 'CBC': 'CBC', 'CBS': 'CBS', 'CNBC': 'CNBC', 'Comedy Central': ['CC', 're:Comedy-?Central'], 'Channel 4': '4OD', 'CHRGD': 'CHGD', 'Cinemax': 'CMAX', 'Country Music Television': 'CMT', 
'Comedians in Cars Getting Coffee': 'CCGC', 'Crunchy Roll': ['CR', 're:Crunchy-?Roll'], 'Crackle': 'CRKL', 'CSpan': 'CSPN', 'CTV': 'CTV', 'CuriosityStream': 'CUR', 'CWSeed': 'CWS', 'Daisuki': 'DSKI', 'DC Universe': 'DCU', 'Deadhouse Films': 'DHF', 'DramaFever': ['DF', 'DramaFever'], 'Digiturk Diledigin Yerde': 'DDY', 'Discovery': ['DISC', 'Discovery'], 'Disney': ['DSNY', 'Disney'], 'DIY Network': 'DIY', 'Doc Club': 'DOCC', 'DPlay': 'DPLY', 'E!': 'ETV', 'ePix': 'EPIX', 'El Trece': 'ETTV', 'ESPN': 'ESPN', 'Esquire': 'ESQ', 'Family': 'FAM', 'Family Jr': 'FJR', 'Food Network': 'FOOD', 'Fox': 'FOX', 'Freeform': 'FREE', 'FYI Network': 'FYI', 'Global': 'GLBL', 'GloboSat Play': 'GLOB', 'Hallmark': 'HLMK', 'HBO Go': ['HBO', 're:HBO-?Go'], 'HGTV': 'HGTV', 'History': ['HIST', 'History'], 'Hulu': 'HULU', 'Investigation Discovery': 'ID', 'IFC': 'IFC', 'iTunes': 'iTunes', 'ITV': 'ITV', 'Knowledge Network': 'KNOW', 'Lifetime': 'LIFE', 'Motor Trend OnDemand': 'MTOD', 'MBC': ['MBC', 'MBCVOD'], 'MSNBC': 'MNBC', 'MTV': 'MTV', 'National Geographic': ['NATG', 're:National-?Geographic'], 'NBA TV': ['NBA', 're:NBA-?TV'], 'NBC': 'NBC', 'Netflix': ['NF', 'Netflix'], 'NFL': 'NFL', 'NFL Now': 'NFLN', 'NHL GameCenter': 'GC', 'Nickelodeon': ['NICK', 'Nickelodeon'], 'Norsk Rikskringkasting': 'NRK', 'OnDemandKorea': ['ODK', 'OnDemandKorea'], 'PBS': 'PBS', 'PBS Kids': 'PBSK', 'Playstation Network': 'PSN', 'Pluzz': 'PLUZ', 'RTE One': 'RTE', 'SBS (AU)': 'SBS', 'SeeSo': ['SESO', 'SeeSo'], 'Shomi': 'SHMI', 'Spike': 'SPIK', 'Spike TV': ['SPKE', 're:Spike-?TV'], 'Sportsnet': 'SNET', 'Sprout': 'SPRT', 'Stan': 'STAN', 'Starz': 'STZ', 'Sveriges Television': 'SVT', 'SwearNet': 'SWER', 'Syfy': 'SYFY', 'TBS': 'TBS', 'TFou': 'TFOU', 'The CW': ['CW', 're:The-?CW'], 'TLC': 'TLC', 'TubiTV': 'TUBI', 'TV3 Ireland': 'TV3', 'TV4 Sweeden': 'TV4', 'TVING': 'TVING', 'TV Land': ['TVL', 're:TV-?Land'], 'UFC': 'UFC', 'UKTV': 'UKTV', 'Univision': 'UNIV', 'USA Network': 'USAN', 'Velocity': 'VLCT', 'VH1': 'VH1', 'Viceland': 
'VICE', 'Viki': 'VIKI', 'Vimeo': 'VMEO', 'VRV': 'VRV', 'W Network': 'WNET', 'WatchMe': 'WME', 'WWE Network': 'WWEN', 'Xbox Video': 'XBOX', 'Yahoo': 'YHOO', 'YouTube Red': 'RED', 'ZDF': 'ZDF'}}}
--------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/guessit/api.py", line 210, in guessit
    matches = self.rebulk.matches(string, options)
  File "/usr/lib/python3.7/site-packages/rebulk/rebulk.py", line 113, in matches
    self._matches_patterns(matches, context)
  File "/usr/lib/python3.7/site-packages/rebulk/rebulk.py", line 176, in _matches_patterns
    pattern_matches = pattern.matches(matches.input_string, context)
  File "/usr/lib/python3.7/site-packages/rebulk/pattern.py", line 166, in matches
    for match in self._match(pattern, input_string, context):
  File "/usr/lib/python3.7/site-packages/rebulk/chain.py", line 81, in _match
    input_string, chain_input_string, offset, current_chain_matches)
  File "/usr/lib/python3.7/site-packages/rebulk/chain.py", line 111, in _to_next_chain_part
    if self._chain_breaker_eval(current_chain_matches + grouped_matches):
  File "/usr/lib/python3.7/site-packages/rebulk/chain.py", line 170, in _chain_breaker_eval
    return not self.chain_breaker or not self.chain_breaker(Matches(matches))
  File "/usr/lib/python3.7/site-packages/guessit/rules/properties/episodes.py", line 52, in episodes_season_chain_breaker
    if len(eps) > 1 and abs(eps[-1].value - eps[-2].value) > episode_max_range:
TypeError: unsupported operand type(s) for -: 'str' and 'str'
--------------------------------------------------------------------
Please report at https://github.com/guessit-io/guessit/issues.
====================================================================
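The last frame of the traceback is a chain breaker in `guessit/rules/properties/episodes.py` that subtracts two episode values and compares the difference with `episode_max_range` (100 in the options dump). The `TypeError` means both values were still `str` when the check ran, not `int`. The traceback shows only that; it does not show which rebulk 2 change caused it. The sketch below reproduces the check in isolation. `FakeMatch` and `season_chain_breaker` are hypothetical stand-ins, not guessit or rebulk code; only the comparison expression mirrors the line in the traceback.

```python
from collections import namedtuple

# Hypothetical stand-in for a rebulk match: only the `value` attribute matters here.
FakeMatch = namedtuple("FakeMatch", ["value"])


def season_chain_breaker(eps, episode_max_range=100):
    """Hypothetical re-creation of the check from the traceback.

    Returns True (break the chain) when the last two episode values are
    further apart than episode_max_range.
    """
    return len(eps) > 1 and abs(eps[-1].value - eps[-2].value) > episode_max_range


# With integer values, which the check expects, the arithmetic works.
print(season_chain_breaker([FakeMatch(1), FakeMatch(11)]))  # False

# With string values, as in the report, the subtraction fails.
try:
    season_chain_breaker([FakeMatch("1"), FakeMatch("11")])
except TypeError as exc:
    print(exc)  # unsupported operand type(s) for -: 'str' and 'str'
```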
baderj commented 5 years ago

This seems to be an issue with Alpine; many names are not parsed correctly:

▶ docker run -it python:3.6.9-alpine3.9 /bin/sh
/ # pip3 install guessit
<snip>
Successfully installed babelfish-0.5.5 guessit-3.0.4 python-dateutil-2.8.0 rebulk-2.0.0 six-1.12.0
/ # python3
Python 3.6.9 (default, Jul 13 2019, 15:23:04) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import guessit
>>> guessit.guessit("Last Week S01E02")
MatchesDict([('title', 'Last Week S01E02'), ('type', 'movie')])
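Here the `S01E02` marker is not recognised, so it stays inside `title` and the guess falls back to `type: movie`. To catch this kind of regression in an application's own tests, one simple heuristic is to flag any result whose title still contains an `SxxEyy` or `NxNN` marker while no `episode` key was found. `looks_misparsed` and its regex are hypothetical helpers, not part of GuessIt; the sample inputs are copied from the outputs in this thread.

```python
import re

# Hypothetical heuristic: season/episode markers such as "S01E02" or "1x03"
# should have been consumed by the parser, not left in the title.
EPISODE_MARKER = re.compile(r"\b(?:S\d{1,2}E\d{1,3}|\d{1,2}x\d{2,3})\b", re.IGNORECASE)


def looks_misparsed(result):
    """Return True if an episode marker leaked into the title and no episode was found."""
    title = result.get("title", "")
    has_marker = bool(EPISODE_MARKER.search(title))
    return has_marker and "episode" not in result


broken = {"title": "Last Week S01E02", "type": "movie"}
print(looks_misparsed(broken))  # True
```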
baderj commented 5 years ago

I think I found the issue: GuessIt does not work with Rebulk 2. On Rebulk 1.0.1 everything is peachy:

▶  guessit --version                                              
+-------------------------------------------------------+
+                   GuessIt 3.0.4                       +
+-------------------------------------------------------+
+                   Rebulk 1.0.1                        +
+-------------------------------------------------------+
|      Please report any bug or feature request at      |
|     https://github.com/guessit-io/guessit/issues.     |
+-------------------------------------------------------+
▶  guessit "Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi"
For: Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi
GuessIt found: {
    "title": "Treme",
    "season": 1,
    "episode": 3,
    "episode_title": "Right Place, Wrong Time",
    "source": "HDTV",
    "video_codec": "Xvid",
    "release_group": "NoTV",
    "container": "avi",
    "mimetype": "video/x-msvideo",
    "type": "episode"
}

But after upgrading Rebulk to 2.0.0, GuessIt fails:

▶ sudo pip3 install rebulk --upgrade
Collecting rebulk
  Downloading https://files.pythonhosted.org/packages/ad/f6/3b27f7399ac8486d86e239e0a44acacfd0e0a3e5903071420c0b0cf8b465/rebulk-2.0.0.tar.gz (257kB)
    100% |████████████████████████████████| 266kB 2.1MB/s 
Requirement already up-to-date: six in /home/xxx/.local/lib/python3.6/site-packages (from rebulk)
Installing collected packages: rebulk
  Found existing installation: rebulk 1.0.1
    Uninstalling rebulk-1.0.1:
      Successfully uninstalled rebulk-1.0.1
  Running setup.py install for rebulk ... done
Successfully installed rebulk-2.0.0

▶ guessit --version                 
+-------------------------------------------------------+
+                   GuessIt 3.0.4                       +
+-------------------------------------------------------+
+                   Rebulk 2.0.0                        +
+-------------------------------------------------------+
|      Please report any bug or feature request at      |
|     https://github.com/guessit-io/guessit/issues.     |
+-------------------------------------------------------+

▶ guessit "Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi"

For: Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi
GuessIt found: {
    "title": "Treme 1x03 Right Place, Wrong Time",
    "source": "HDTV",
    "video_codec": "Xvid",
    "release_group": "NoTV",
    "container": "avi",
    "mimetype": "video/x-msvideo",
    "type": "movie"
}
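Comparing the two outputs key by key makes the regression explicit: with rebulk 2.0.0 the `season`, `episode` and `episode_title` keys disappear, and `title` and `type` change. `diff_guesses` is a hypothetical helper written for this comparison; the two dicts are the outputs copied from the runs above.

```python
def diff_guesses(before, after):
    """Compare two guess dicts and report removed, added and changed keys."""
    removed = sorted(before.keys() - after.keys())
    added = sorted(after.keys() - before.keys())
    changed = {
        key: (before[key], after[key])
        for key in before.keys() & after.keys()
        if before[key] != after[key]
    }
    return removed, added, changed


# Outputs copied from the two runs above (rebulk 1.0.1 vs rebulk 2.0.0).
with_rebulk_1 = {
    "title": "Treme", "season": 1, "episode": 3,
    "episode_title": "Right Place, Wrong Time", "source": "HDTV",
    "video_codec": "Xvid", "release_group": "NoTV", "container": "avi",
    "mimetype": "video/x-msvideo", "type": "episode",
}
with_rebulk_2 = {
    "title": "Treme 1x03 Right Place, Wrong Time", "source": "HDTV",
    "video_codec": "Xvid", "release_group": "NoTV", "container": "avi",
    "mimetype": "video/x-msvideo", "type": "movie",
}

removed, added, changed = diff_guesses(with_rebulk_1, with_rebulk_2)
print(removed)          # ['episode', 'episode_title', 'season']
print(added)            # []
print(sorted(changed))  # ['title', 'type']
```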
Toilal commented 5 years ago

I need to release GuessIt 3.0.5 with the Rebulk version pinned to <2.
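Pinning means declaring the allowed range in the package metadata so that pip will not pair GuessIt with rebulk 2.x. The actual 3.0.5 `setup.py` is not quoted in this thread, so the fragment below is a hypothetical sketch: it shows only the rebulk requirement, and its lower bound of 1.0.1 is an assumption taken from the version reported as working above. Users still on 3.0.4 can apply the same constraint themselves with `pip install "rebulk<2"`.

```python
# Hypothetical excerpt from a setup.py: only the rebulk requirement is shown,
# other dependencies are omitted. The lower bound 1.0.1 is an assumption based
# on the version reported as working in this thread.
install_requires = [
    "rebulk>=1.0.1,<2",  # "<2" keeps pip from installing rebulk 2.x
]

# In a real setup.py this list would be passed on, for example:
# setuptools.setup(name="guessit", install_requires=install_requires)
```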

Toilal commented 5 years ago

This should be fixed in the 3.0.5 release.
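An application that cannot upgrade GuessIt right away can check at startup which rebulk version it received. The sketch below uses `importlib.metadata`, which is in the standard library from Python 3.8 (newer than the 3.6/3.7 interpreters in this report; the `importlib_metadata` backport offers the same `version` and `PackageNotFoundError` names). The helper names are hypothetical, and the "major version < 2" rule only reflects what this thread reports for GuessIt 3.0.4.

```python
from importlib.metadata import PackageNotFoundError, version


def rebulk_major(version_string):
    """Return the major component of a version string such as '2.0.0'."""
    return int(version_string.split(".")[0])


def rebulk_is_supported(version_string):
    """GuessIt 3.0.4 misbehaves with rebulk 2.x, so accept only major version < 2."""
    return rebulk_major(version_string) < 2


def check_installed_rebulk():
    """Return (installed_version, supported), or (None, None) if rebulk is absent."""
    try:
        installed = version("rebulk")
    except PackageNotFoundError:
        return None, None
    return installed, rebulk_is_supported(installed)


if __name__ == "__main__":
    installed, supported = check_installed_rebulk()
    if installed is None:
        print("rebulk is not installed")
    elif not supported:
        print(f"rebulk {installed} is installed; GuessIt 3.0.4 needs rebulk<2")
    else:
        print(f"rebulk {installed} should work with GuessIt 3.0.4")
```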