platelminto / parse-torrent-title

Extract media information from a torrent-like filename
MIT License

parse-torrent-title



Originally based on this JavaScript library.

Extract all possible media information from a filename. Multiple regex rules are applied to the filename, each extracting the relevant piece of information. When a rule matches, the matched part is removed from the filename, and whatever remains at the end is taken as the title of the content.
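The rule-then-remove approach can be sketched in a few lines of Python. This is a simplified illustration, not PTN's actual implementation: the rule table, the handful of fields it covers, and the `parse_sketch` function are hypothetical, and no standardisation (such as turning x264 into H.264) is attempted.

```python
import re

# Hypothetical, heavily simplified rule table: (field, compiled regex, converter).
# Order matters: each match is cut out of the string before the next rule runs.
RULES = [
    ("website", re.compile(r"\[([^\]]+)\]\s*$"), str),
    ("encoder", re.compile(r"-(\w+)\s*$"), str),
    ("codec", re.compile(r"\b(x26[45])\b", re.I), str),
    ("quality", re.compile(r"\b(HDTV|WEB-?DL|WEBRip|BluRay)\b", re.I), str),
    ("resolution", re.compile(r"\b(\d{3,4}p)\b", re.I), str),
    # Episode runs before season because its lookbehind needs the "Sxx" prefix.
    ("episode", re.compile(r"(?<=S\d\d)E(\d{2})", re.I), int),
    ("season", re.compile(r"\bS(\d{2})\b", re.I), int),
]


def parse_sketch(name):
    """Apply each rule once, remove what it matched, and treat the rest as the title."""
    result = {}
    for field, pattern, convert in RULES:
        match = pattern.search(name)
        if match:
            result[field] = convert(match.group(1))
            # Replace the matched span with a space so neighbouring words don't merge.
            name = name[: match.start()] + " " + name[match.end():]
    # Whatever is left, with dots/underscores/spaces collapsed, becomes the title.
    result["title"] = re.sub(r"[\s._]+", " ", name).strip()
    return result


print(parse_sketch("The Walking Dead S05E03 720p HDTV x264-ASAP[ettv]"))
```

For the Walking Dead name used in the examples, this yields the title 'The Walking Dead' with season 5 and episode 3. Rule order is the main design choice: because each match is removed, later rules see a shorter string, so the episode rule must run while the "S05" it depends on is still present.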

Install

PTN can be installed with pip:

$ pip install parse-torrent-title

Requirements

The requirements are optional. That said, the regex library improves performance on Python 2 by more than 10x, so it may be worth installing them with:

$ pip install -r requirements.txt

With Python 3, the standard re module is faster than regex, so it is always used, regardless of which requirements are installed.
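The engine choice described above can be expressed as a small import fallback. This is a minimal sketch assuming a plain try/except import; `pick_regex_module` is a hypothetical name and PTN's own code may make the choice differently.

```python
import sys


def pick_regex_module():
    """Return the regex engine to use (illustrative sketch).

    On Python 3 the standard-library ``re`` module is used unconditionally.
    On Python 2 the third-party ``regex`` package is preferred if installed.
    """
    if sys.version_info[0] >= 3:
        import re
        return re
    try:
        import regex  # optional dependency, faster on Python 2
        return regex
    except ImportError:
        import re
        return re


engine = pick_regex_module()
print(engine.__name__)  # "re" on any Python 3 interpreter
```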

Why?

Online APIs from providers like TMDb, TVDb, and OMDb return poor results for queries that include any extra information. To get proper results from these APIs, the search query should contain only the title of the content. Accuracy can be improved further by also passing the year, which this library can extract too.
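As a sketch of that workflow, the helper below reduces a parse result to just a title and an optional year before building a query string. `build_search_params` is hypothetical, the input is a hand-written dict shaped like PTN's output, and the parameter names `query` and `year` are assumptions modelled on TMDb-style search endpoints; check the API you target.

```python
from urllib.parse import urlencode


def build_search_params(parsed):
    """Keep only what a title search needs: the title and, if present, the year.

    The parameter names 'query' and 'year' are assumptions; adjust them
    for the search API you are calling.
    """
    params = {"query": parsed["title"]}
    if "year" in parsed:
        params["year"] = parsed["year"]
    return params


# Hand-written dict shaped like a PTN parse result.
parsed = {"title": "Vacancy", "year": 2007, "resolution": "720p", "codec": "H.264"}
print(urlencode(build_search_params(parsed)))  # query=Vacancy&year=2007
```

Everything else (resolution, codec, and so on) is deliberately dropped, since it is exactly the kind of extra information these APIs handle poorly.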

Examples

Movies, series (seasons & episodes), and subtitles can be parsed. All meaningful information is extracted and returned in a dictionary. Text which couldn't be parsed is returned in the excess field.

import PTN

PTN.parse('The Walking Dead S05E03 720p HDTV x264-ASAP[ettv]')
# {
#     'encoder': 'ASAP',
#     'title': 'The Walking Dead',
#     'season':  5,
#     'episode': 3,
#     'resolution': '720p',
#     'codec': 'H.264',
#     'quality': 'HDTV',
#     'website': 'ettv'
# }

PTN.parse('Vacancy (2007) 720p Bluray Dual Audio [Hindi + English] ⭐800 MB⭐ DD - 2.0 MSub x264 - Shadow (BonsaiHD)')
# {
#     'encoder': 'Shadow',
#     'title': 'Vacancy',
#     'resolution': '720p',
#     'codec': 'H.264',
#     'year':  2007,
#     'audio': 'Dolby Digital 2.0',
#     'quality': 'Blu-ray',
#     'language': ['Hindi', 'English'],
#     'subtitles': 'Available',
#     'size': '800MB',
#     'website': 'BonsaiHD',
#     'excess': '⭐⭐'
# }

PTN.parse('Deadliest.Catch.S00E66.No.Safe.Passage.720p.AMZN.WEB-DL.DDP2.0.H.264-NTb[TGx]')
# {
#     'encoder': 'NTb',
#     'title': 'Deadliest Catch',
#     'resolution': '720p',
#     'codec': 'H.264',
#     'audio' : 'Dolby Digital Plus 2.0',
#     'network': 'Amazon Studios',
#     'season':  0,
#     'episode': 66,
#     'quality': 'WEB-DL',
#     'episodeName': 'No Safe Passage',
#     'website': 'TGx'
# }

PTN.parse('Insecure.S04.COMPLETE.720p.AMZN.WEBRip.x264-GalaxyTV')
# {
#     'title': 'Insecure',
#     'encoder': 'GalaxyTV',
#     'codec': 'H.264',
#     'season': 4,
#     'resolution': '720p',
#     'network': 'Amazon Studios',
#     'quality': 'WEBRip',
# }

More examples (inputs and outputs) can be found in tests/files.
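A typical consumer turns these dictionaries into a short label. The `describe` helper below is hypothetical and assumes `season` and `episode` are single integers, as in the examples above; it checks for `None` rather than truthiness because season 0 (as in the Deadliest Catch example) is a valid value.

```python
def describe(parsed):
    """Build a short label from a parse result (hypothetical helper).

    Episodes become 'Title SxxEyy', season packs 'Title Sxx',
    and anything else 'Title (Year)' when a year is known.
    Assumes season/episode are single ints, not lists.
    """
    title = parsed["title"]
    season = parsed.get("season")
    episode = parsed.get("episode")
    # Compare against None: season 0 is falsy but meaningful.
    if season is not None and episode is not None:
        return "%s S%02dE%02d" % (title, season, episode)
    if season is not None:
        return "%s S%02d" % (title, season)
    if "year" in parsed:
        return "%s (%d)" % (title, parsed["year"])
    return title


print(describe({"title": "Deadliest Catch", "season": 0, "episode": 66}))  # Deadliest Catch S00E66
print(describe({"title": "Vacancy", "year": 2007}))  # Vacancy (2007)
```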

CLI

You can use PTN from your command line, where the output will be printed as JSON:

$ python cli.py 'Insecure.S04.COMPLETE.720p.AMZN.WEBRip.x264-GalaxyTV'

 {
     "title": "Insecure",
     "encoder": "GalaxyTV",
     "codec": "H.264",
     "season": 4,
     "resolution": "720p",
     "network": "Amazon Studios",
     "quality": "WEBRip"
 }

For help, use the -h or --help flag:

$ python cli.py --help

This will provide a brief overview of the available options and their usage.
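If you want similar behaviour inside your own tooling, a comparable wrapper can be built with argparse and json. This is a hypothetical re-creation of the documented CLI surface, not PTN's cli.py: it maps the --raw and --coherent-types flags (covered in the following sections) onto the `standardise` and `coherent_types` keyword arguments, and it assumes `PTN.parse` accepts both keywords in the same call. The parse function is injected so the sketch runs without PTN installed.

```python
import argparse
import json


def build_parser():
    # Hypothetical CLI definition; argparse adds -h/--help automatically.
    parser = argparse.ArgumentParser(description="Parse a torrent name and print JSON.")
    parser.add_argument("name", help="torrent-like filename to parse")
    parser.add_argument("--raw", action="store_true",
                        help="return matches as found, without standardisation")
    parser.add_argument("--coherent-types", action="store_true",
                        help="make field types consistent")
    return parser


def run(argv, parse):
    """Parse command-line arguments and return the result as a JSON string."""
    args = build_parser().parse_args(argv)
    result = parse(args.name, standardise=not args.raw,
                   coherent_types=args.coherent_types)
    return json.dumps(result, indent=4)


# With PTN installed you would call: print(run(sys.argv[1:], PTN.parse))
```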

Raw info

The matches in the torrent name are standardised into specific strings, according to scene rules where possible. For example, 'WEBDL', 'WEB DL', and 'HDRip' are all converted to 'WEB-DL', 'DDP51' becomes 'Dolby Digital Plus 5.1', and ['ita', 'eng'] becomes ['Italian', 'English']. To disable this and return just what was matched in the torrent name, run:

PTN.parse('A freakishly cool movie or TV episode', standardise=False)

In the CLI, you can use the --raw flag:

$ python cli.py --raw 'A freakishly cool movie or TV episode'
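Conceptually, standardisation is a lookup from raw matches to canonical strings. The sketch below uses hypothetical tables built only from the examples in this section; PTN's real rules are more extensive, and the `standardise` function here is illustrative, not part of PTN's API.

```python
# Hypothetical lookup tables built only from the examples in this README.
QUALITY = {"webdl": "WEB-DL", "web dl": "WEB-DL", "hdrip": "WEB-DL"}
AUDIO = {"ddp51": "Dolby Digital Plus 5.1"}
LANGUAGE = {"ita": "Italian", "eng": "English"}

TABLES = {"quality": QUALITY, "audio": AUDIO, "language": LANGUAGE}


def standardise(field, raw):
    """Map a raw match (or list of matches) to its standard form.

    Lookups are case-insensitive; values with no entry are returned
    unchanged, i.e. as the raw match.
    """
    table = TABLES.get(field, {})
    if isinstance(raw, list):
        return [table.get(item.lower(), item) for item in raw]
    return table.get(raw.lower(), raw)


print(standardise("language", ["ita", "eng"]))  # ['Italian', 'English']
```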

Types of parts

The types of parts can be strings, integers, booleans, or lists of the first two. To simplify this, you can enable the coherent_types flag. This will override the types described below according to these rules:

To enable this flag:

PTN.parse('An even cooler movie or TV episode', coherent_types=True)

In the CLI, you can use the --coherent-types flag:

$ python cli.py --coherent-types 'A freakishly cool movie or TV episode'
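If you would rather keep the default types, a small consumer-side helper can smooth over the mix of scalars and lists. `as_list` is hypothetical and does not reproduce the exact rules of coherent_types; it simply wraps single values so that membership checks with `in` work the same way for every field.

```python
def as_list(value):
    """Wrap a parsed value in a list (hypothetical helper).

    Lists pass through unchanged, a missing field (None) becomes [],
    and any single string, integer, or boolean becomes a one-item list.
    """
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# Hand-written dicts shaped like the example outputs above.
vacancy = {"title": "Vacancy", "language": ["Hindi", "English"]}
insecure = {"title": "Insecure", "season": 4}

print("English" in as_list(vacancy["language"]))  # True
print(4 in as_list(insecure.get("season")))  # True
print(as_list(insecure.get("episode")))  # []
```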

Parts extracted

Contributing

Submit a PR on the dev branch, including tests for anything newly matched (if applicable), after running the pre-commit hooks. Add the titles you want covered by tests to the add_titles() call in tests/test_generator's main method; it will automatically add what's needed to files/input.json, files/output_raw.json, and files/output_standard.json. The fields encoder, excess, website, and episodeName don't always have to be correct: if they're giving you issues or seem wrong, feel free to remove them from the output test files.

(What it does: add_titles() adds the input torrent names to tests/files/input.json and the full output JSON objects (with standardise=False) to tests/files/output_raw.json. It also adds the standardised output to tests/files/output_standard.json, including only the fields that standardisation changes, plus title.)
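The "only changed fields, plus title" rule for output_standard.json can be sketched as a dictionary diff. `standard_diff` is a hypothetical helper, not the actual test_generator code, and the raw and standardised dicts below use illustrative values.

```python
def standard_diff(raw, standard):
    """Keep the fields whose value changes under standardisation, plus 'title'.

    Hypothetical re-creation of the behaviour described for
    tests/files/output_standard.json.
    """
    diff = {key: value for key, value in standard.items()
            if raw.get(key) != value}
    diff["title"] = standard["title"]
    return diff


# Illustrative raw (standardise=False) and standardised outputs.
raw = {"title": "Insecure", "codec": "x264", "season": 4,
       "quality": "WEBRip", "network": "AMZN"}
standard = {"title": "Insecure", "codec": "H.264", "season": 4,
            "quality": "WEBRip", "network": "Amazon Studios"}
print(standard_diff(raw, standard))
# {'codec': 'H.264', 'network': 'Amazon Studios', 'title': 'Insecure'}
```

Storing only the differences keeps the standardised test file focused on what standardisation actually does, while title is always kept so each entry can be matched back to its input.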

Additions to parse-torrent-name

Below are the additions that have been made to /u/divijbindlish's original repo, including other contributors' work. parse-torrent-title was initially forked from that repo, but since a lot of extra work has been done and the original repo is inactive, it was unforked.

Updates on top of /u/roidayan's work

/u/roidayan's work on top of the original

License

MIT © 2015-2017 Divij Bindlish

MIT © 2020 Giorgio Momigliano