guessit-io / guessit

GuessIt is a Python library that extracts as much information as possible from a video filename.
https://guessit-io.github.io/guessit
GNU Lesser General Public License v3.0

Episode title not parsed when file is a repack #775

Open noxxusnx opened 6 months ago

noxxusnx commented 6 months ago

When a filename contains "REPACK", guessit adds "Proper" to the `other` property (and sets `proper_count`), but then, for some reason, it no longer parses the episode title.

You can reproduce it with the following setup:

```python
from guessit import guessit

file_name_original = "Total.Forgiveness.S00E01.Grant.OBrien.Its.All.Love.Full.Stand.up.Set.1080p.DRPO.WEBRip.x265.AAC.2.0-NOXXUS.mkv"
file_name_repack = "Total.Forgiveness.S00E01.REPACK.Grant.OBrien.Its.All.Love.Full.Stand.up.Set.1080p.DRPO.WEBRip.x265.AAC.2.0-NOXXUS.mkv"
```

Calling `guessit(file_name_original)` returns:

```
MatchesDict({'title': 'Total Forgiveness', 'season': 0, 'episode': 1, 'episode_title': 'Grant OBrien Its All Love Full Stand up Set', 'screen_size': '1080p', 'source': 'Web', 'other': 'Rip', 'video_codec': 'H.265', 'audio_codec': 'AAC', 'audio_channels': '2.0', 'release_group': 'NOXXUS', 'container': 'mkv', 'mimetype': 'video/x-matroska', 'type': 'episode'})
```

Calling `guessit(file_name_repack)` returns:

```
MatchesDict({'title': 'Total Forgiveness', 'season': 0, 'episode': 1, 'other': ['Proper', 'Rip'], 'proper_count': 1, 'screen_size': '1080p', 'source': 'Web', 'video_codec': 'H.265', 'audio_codec': 'AAC', 'audio_channels': '2.0', 'release_group': 'NOXXUS', 'container': 'mkv', 'mimetype': 'video/x-matroska', 'type': 'episode'})
```

Note that the second result no longer contains a value for `episode_title`.
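To make the difference explicit, a small helper can diff the two results. `diff_results` is hypothetical (not part of guessit), and the sketch pastes the printed outputs as plain dicts so it runs without guessit installed; with guessit available, passing `dict(guessit(file_name_original))` and `dict(guessit(file_name_repack))` should work the same way, assuming `MatchesDict` behaves like a regular mapping.

```python
# Results copied from the two guessit() calls above, as plain dicts.
original = {
    "title": "Total Forgiveness", "season": 0, "episode": 1,
    "episode_title": "Grant OBrien Its All Love Full Stand up Set",
    "screen_size": "1080p", "source": "Web", "other": "Rip",
    "video_codec": "H.265", "audio_codec": "AAC", "audio_channels": "2.0",
    "release_group": "NOXXUS", "container": "mkv",
    "mimetype": "video/x-matroska", "type": "episode",
}
repack = {
    "title": "Total Forgiveness", "season": 0, "episode": 1,
    "other": ["Proper", "Rip"], "proper_count": 1,
    "screen_size": "1080p", "source": "Web",
    "video_codec": "H.265", "audio_codec": "AAC", "audio_channels": "2.0",
    "release_group": "NOXXUS", "container": "mkv",
    "mimetype": "video/x-matroska", "type": "episode",
}


def diff_results(before, after):
    """Return (keys only in `before`, keys only in `after`,
    {key: (old, new)} for shared keys whose values differ)."""
    missing = sorted(before.keys() - after.keys())
    added = sorted(after.keys() - before.keys())
    changed = {
        key: (before[key], after[key])
        for key in sorted(before.keys() & after.keys())
        if before[key] != after[key]
    }
    return missing, added, changed


missing, added, changed = diff_results(original, repack)
print("missing:", missing)  # missing: ['episode_title']
print("added:", added)      # added: ['proper_count']
print("changed:", changed)  # changed: {'other': ('Rip', ['Proper', 'Rip'])}
```

Apart from the expected `proper_count`/`other` changes caused by `REPACK`, the only regression is the dropped `episode_title`.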

noxxusnx commented 2 months ago

I haven't found the cause yet, but the issue can be reproduced in the test suite by adding the following cases to `guessit/test/rules/episodes.yml`:

```yaml
? Some.Series.S00E01.Some.Episode.Title.1080p.WEBRip.x265.AAC.2.0-SOMEGROUP.mkv
: title: Some Series
  episode_title: Some Episode Title

? Some.Series.S00E01.REPACK.Some.Episode.Title.1080p.WEBRip.x265.AAC.2.0-SOMEGROUP.mkv
: title: Some Series
  episode_title: Some Episode Title
  proper_count: 1
```

The first case passes; the second fails because `episode_title` is missing.

noxxusnx commented 2 months ago

I'm still narrowing things down, but using breakpoints I found that the problem seems to occur in the `hole_filter` method of the `EpisodeTitleFromPosition` rule. In the passing test, the `episode` variable contains `<S00E01:(12, 18)+private+tags=['SxxExx']>`; in the failing test, it is empty.
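The breakpoint observation is consistent with one possible mechanism, sketched here as a toy model. This is not guessit's or rebulk's code: `Match`, `nearest_previous`, `hole_filter`, `hole_filter_skipping`, `PREVIOUS_NAMES` and `IGNORABLE_NAMES` are hypothetical simplifications. The assumption is that the filter looks only at the match(es) ending closest before the episode-title hole and then keeps those with an episode-like name; if so, the `REPACK` match sitting between `S00E01` and the title would hide the episode match entirely. Positions follow the `Some.Series...` test filenames, with `S00E01` at (12, 18) as in the breakpoint output.

```python
from dataclasses import dataclass


# Toy stand-in for a rebulk-style match (hypothetical, not guessit's API).
@dataclass(frozen=True)
class Match:
    name: str
    start: int  # inclusive
    end: int    # exclusive


# Hypothetical subset of names that may legitimately precede an episode title.
PREVIOUS_NAMES = ("season", "episode", "title")
# Hypothetical tag-like names (e.g. the match produced by REPACK) to look past.
IGNORABLE_NAMES = ("other",)


def nearest_previous(matches, position, predicate):
    """Walk back from `position` to the nearest index where some match ends,
    then return only the matches ending there that satisfy `predicate`."""
    pos = position
    while pos >= 0:
        ending_here = [m for m in matches if m.end == pos]
        if ending_here:
            return [m for m in ending_here if predicate(m)]
        pos -= 1
    return []


def hole_filter(matches, hole_start):
    """Accept the hole only if the closest preceding match is episode-like."""
    found = nearest_previous(matches, hole_start, lambda m: m.name in PREVIOUS_NAMES)
    return bool(found)


def hole_filter_skipping(matches, hole_start):
    """Variant that steps over tag-like matches before deciding."""
    pos = hole_start
    while pos >= 0:
        relevant = [m for m in matches
                    if m.end == pos and m.name not in IGNORABLE_NAMES]
        if relevant:
            return any(m.name in PREVIOUS_NAMES for m in relevant)
        pos -= 1
    return False


# "Some.Series.S00E01.Some.Episode.Title..." -> title hole starts at 19
original = [Match("title", 0, 11), Match("episode", 12, 18)]
# "Some.Series.S00E01.REPACK.Some.Episode.Title..." -> REPACK at (19, 25), hole at 26
repack = original + [Match("other", 19, 25)]

print(hole_filter(original, 19))         # True
print(hole_filter(repack, 26))           # False: nearest preceding match is REPACK
print(hole_filter_skipping(repack, 26))  # True
```

If the real lookup behaves like `hole_filter` here, one direction worth testing is to step over tag-like matches when searching backwards, as `hole_filter_skipping` does; this is unverified against guessit's actual rules and test suite.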