platelminto / parse-torrent-title

Extract media information from a torrent-like filename
MIT License

Handle movie collections #20

Status: Open. platelminto opened this issue 4 years ago.

platelminto commented 4 years ago

Movie trilogies/collections often have releases containing all the movies. They tend to have the word 'collection' or similar at the end of the title, and have a year range. Examples:

Without a year range, it may be hard to tell whether a release is a collection, because the word could simply be part of the title. One approach: first look for an indicator that it is a collection ("collection", "trilogy"), then look for a confirmation (either a year range or a description of the collection, e.g. "4 Film", "1-3"). This might miss some collections, but we don't want to break normal titles. Alternatively, collection parsing could be requested as a type passed to the parse() function itself (implementing #1), so that searches that don't ask for it are unaffected.
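A minimal sketch of the two-step check (indicator first, then confirmation), assuming plain regular expressions are enough. The function name `looks_like_collection` and all the patterns are hypothetical illustrations, not part of the library, and the example titles are made up.

```python
import re

# Hypothetical indicator words; the issue mentions "collection" and "trilogy".
INDICATOR = re.compile(r"\b(collection|trilogy)\b", re.IGNORECASE)

# Hypothetical confirmation patterns (assumptions, not library behaviour):
# a year range such as "2014-2018", a film count such as "4 Film",
# or a part range such as "1-3".
YEAR_RANGE = re.compile(r"\b(?:19|20)\d{2}\s*-\s*(?:19|20)\d{2}\b")
FILM_COUNT = re.compile(r"\b\d+\s*films?\b", re.IGNORECASE)
PART_RANGE = re.compile(r"\b\d{1,2}\s*-\s*\d{1,2}\b")


def looks_like_collection(title: str) -> bool:
    """Return True only if an indicator word AND a confirmation are both present."""
    if not INDICATOR.search(title):
        return False
    return any(p.search(title) for p in (YEAR_RANGE, FILM_COUNT, PART_RANGE))


if __name__ == "__main__":
    print(looks_like_collection("Example.Movie.Trilogy.2014-2018.1080p.BluRay"))  # True
    print(looks_like_collection("The Collection 2012 1080p"))  # False: no confirmation
```

Requiring both signals is what keeps a title that merely contains the word "Collection" from being flagged, at the cost of missing collections that carry no range or count.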

This would add a boolean field, collection, plus a way to list what the collection contains: collectionContents? That would most likely be a list of integers. The Harry Potter examples above just say "Complete"/"All Movies", but even when no such word is present, it's implied that the release contains all the movies (as in the Maze Runner example), so the result would simply be collection set to true with no collectionContents. In any case, collectionContents seems like a fairly niche field to actually need; the boolean should be enough when you know you're looking for a collection.
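A rough sketch of how the proposed fields might be filled, assuming an explicit part range like "1-3" is the only source of collectionContents. The helper name `collection_fields` and its patterns are hypothetical; for brevity it checks only the indicator word and skips the confirmation step.

```python
import re

# Hypothetical patterns: a collection indicator and an explicit part range like "1-3".
INDICATOR = re.compile(r"\b(collection|trilogy)\b", re.IGNORECASE)
PART_RANGE = re.compile(r"\b(\d{1,2})\s*-\s*(\d{1,2})\b")


def collection_fields(title: str) -> dict:
    """Build the proposed `collection` / `collectionContents` fields for a title."""
    if not INDICATOR.search(title):
        return {}
    result = {"collection": True}
    match = PART_RANGE.search(title)
    if match:
        start, end = int(match.group(1)), int(match.group(2))
        if start <= end:
            # Expand "1-3" into the integer list [1, 2, 3].
            result["collectionContents"] = list(range(start, end + 1))
    # No explicit range ("Complete", "All Movies", or nothing at all) is taken to
    # mean the whole collection, so collectionContents is simply omitted.
    return result


if __name__ == "__main__":
    print(collection_fields("Example Saga 1-3 Collection 1080p"))
    print(collection_fields("Example Saga Complete Collection"))
```

Leaving collectionContents out, rather than guessing a count, matches the idea that the boolean alone is usually enough.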

I'm not sure how to handle the year range. Currently the last year found is used as year, so in the first example it would be 2018. We could add a collectionStart field, and perhaps duplicate year into a collectionEnd field for completeness while still keeping year.
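One possible shape for the year handling, assuming years are four digits starting with 19 or 20. The function `year_fields` is hypothetical; it keeps the "last year wins" behaviour for year and adds collectionStart/collectionEnd only when a range is present.

```python
import re

# Hypothetical patterns: a single year, and a year range such as "2001-2011".
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")
YEAR_RANGE = re.compile(r"\b((?:19|20)\d{2})\s*-\s*((?:19|20)\d{2})\b")


def year_fields(title: str) -> dict:
    """Keep `year` as the last year found; add collectionStart/collectionEnd for a range."""
    years = [int(y) for y in YEAR.findall(title)]
    if not years:
        return {}
    result = {"year": years[-1]}  # last year found, as described in the issue
    match = YEAR_RANGE.search(title)
    if match:
        result["collectionStart"] = int(match.group(1))
        # End of the range; equals `year` whenever the range is the last year in the title.
        result["collectionEnd"] = int(match.group(2))
    return result


if __name__ == "__main__":
    print(year_fields("Example.Trilogy.2014-2018.1080p"))
```

Keeping year unchanged means existing callers see the same value as before, while the new fields carry the full range.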

platelminto commented 4 years ago

More examples:

platelminto commented 11 months ago

More examples: