guessit-io / guessit

GuessIt is a Python library that extracts as much information as possible from a video filename.
https://guessit-io.github.io/guessit
GNU Lesser General Public License v3.0

overly eager matching of "cd" #742

Open jcfp opened 1 year ago

jcfp commented 1 year ago

Hi,

A recent PR of mine (https://github.com/sabnzbd/sabnzbd/pull/2461) had intermittent test failures. The function under test uses guessit to detect the cd and part properties in filenames, and for this test it was supplied with filenames that also contain a random hexadecimal string.

It turned out that guessit is very eager to match the cd property, even in the middle of such strings. The part property, on the other hand, behaves as expected, i.e. it is only matched when surrounded by some form of spacing or delimiter:

Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import guessit
>>> guessit.__version__
'3.7.1'
>>> guessit.api.guessit("My File 238ddcd5aff.mkv")
MatchesDict([('title', 'My File 238dd'), ('cd', 5), ('container', 'mkv'), ('mimetype', 'video/x-matroska'), ('type', 'movie')])
>>> guessit.api.guessit("My File 238ddpart5aff.mkv")
MatchesDict([('title', 'My File 238ddpart5aff'), ('container', 'mkv'), ('mimetype', 'video/x-matroska'), ('type', 'movie')])
>>> guessit.api.guessit("My File 238dd(part5)aff.mkv")
MatchesDict([('title', 'My File 238dd'), ('part', 5), ('container', 'mkv'), ('mimetype', 'video/x-matroska'), ('type', 'movie')])
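
For illustration, the difference between the two behaviours can be sketched with plain regular expressions. This is not guessit's actual rule set: the patterns and the `find_cd` helper below are hypothetical, and only show what "eager" versus "delimiter-bounded" matching of cd could look like.

```python
import re

# Hypothetical patterns for illustration only; these are NOT guessit's real rules.

# Eager: matches "cd<digits>" anywhere, even inside a longer alphanumeric run.
EAGER_CD = re.compile(r"cd(\d+)", re.IGNORECASE)

# Bounded: "cd<digits>" must not be directly preceded or followed by a letter
# or digit, so it only matches when set off by spacing or a delimiter.
BOUNDED_CD = re.compile(
    r"(?<![a-z0-9])cd[ ._-]?(\d+)(?![a-z0-9])", re.IGNORECASE
)


def find_cd(name, pattern):
    """Return the cd number found by `pattern` in `name`, or None."""
    match = pattern.search(name)
    return int(match.group(1)) if match else None


print(find_cd("My File 238ddcd5aff.mkv", EAGER_CD))    # 5 (inside the hex string)
print(find_cd("My File 238ddcd5aff.mkv", BOUNDED_CD))  # None
print(find_cd("My File 238dd(cd5)aff.mkv", BOUNDED_CD))  # 5
```

With the bounded pattern, the random hexadecimal string no longer produces a cd match, which mirrors how part already behaves in the session above.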