guessit-io / guessit

GuessIt is a python library that extracts as much information as possible from a video filename.
https://guessit-io.github.io/guessit
GNU Lesser General Public License v3.0

Dashes in filenames break title detection? #571

Closed by pannal 5 years ago

pannal commented 5 years ago

Somewhat similar to #537

Solo - A Star Wars Story (2018) [1080P BLURAY H264 YTS.AM].mp4

GuessIt found: {
    "title": "Solo",
    "alternative_title": "A Star Wars Story",
    "year": 2018,
    "screen_size": "1080p",
    "source": "Blu-ray",
    "video_codec": "H.264",
    "release_group": "YTS.AM",
    "container": "mp4",
    "mimetype": "video/mp4",
    "type": "movie"
}

Expected result:

GuessIt found: {
    "title": "Solo - A Star Wars Story",
    "year": 2018,
    "screen_size": "1080p",
    "source": "Blu-ray",
    "video_codec": "H.264",
    "release_group": "YTS.AM",
    "container": "mp4",
    "mimetype": "video/mp4",
    "type": "movie"
}

This is the same for guessit 2 and 3.

Toilal commented 5 years ago

This is the expected result. It has already been discussed in other issues.

The dash is a splitting separator in guessit, so titles are split on that token. Official titles most often use a colon (:) as the separator in this case (see the Solo IMDB page).

For: Solo: A Star Wars Story (2018) [1080P BLURAY H264 YTS.AM].mp4
GuessIt found: {
    "title": "Solo: A Star Wars Story",
    "year": 2018,
    "screen_size": "1080p",
    "source": "Blu-ray",
    "video_codec": "H.264",
    "release_group": "YTS.AM",
    "container": "mp4",
    "mimetype": "video/mp4",
    "type": "movie"
}

You can still concatenate title and alternative_title to handle those cases in your application, but this may fail in some other cases.
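
A minimal sketch of that workaround, for illustration only. The helper name merge_titles and its separator parameter are hypothetical and not part of guessit. It works on a plain dict shaped like the output shown above, and it assumes alternative_title may be either a single string or a list of strings. As noted, joining unconditionally can produce wrong titles for files where the part after the dash is not really part of the title.

```python
from typing import Any, Mapping


def merge_titles(result: Mapping[str, Any], separator: str = " - ") -> str:
    """Join title and alternative_title from a guessit-style result dict.

    Hypothetical helper, not part of guessit.
    """
    parts = []
    title = result.get("title")
    if title:
        parts.append(title)
    alternatives = result.get("alternative_title")
    if alternatives:
        # Assumption: the value may be one string or a list of strings.
        if isinstance(alternatives, str):
            alternatives = [alternatives]
        parts.extend(alternatives)
    return separator.join(parts)


# Values copied from the report above. With guessit installed, the dict
# would come from guessit.guessit("<filename>") instead.
found = {"title": "Solo", "alternative_title": "A Star Wars Story", "year": 2018}
print(merge_titles(found))  # Solo - A Star Wars Story
```

Passing separator=": " instead would rebuild the title in the form used by the official listing.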