vikstrous / pirate-get

A command line interface for The Pirate Bay
GNU Affero General Public License v3.0

"NoneType" exception on search - API change required? #91

Closed ChrisTimperley closed 7 years ago

ChrisTimperley commented 8 years ago

This looks like a sweet tool, but I'm having a few problems trying to connect to TPB. I've tried various proxies in the config file, all to no avail. I get the following error upon connecting.

λ ~/ pirate-get -b 
Trying https://piratebaymirror.eu/... Traceback (most recent call last):
  File "/home/chris/.local/bin/pirate-get", line 11, in <module>
    sys.exit(main())
  File "/home/chris/.local/lib/python3.5/site-packages/pirate/pirate.py", line 416, in main
    pirate_main(args)
  File "/home/chris/.local/lib/python3.5/site-packages/pirate/pirate.py", line 319, in pirate_main
    results, site = search_mirrors(printer, args)
  File "/home/chris/.local/lib/python3.5/site-packages/pirate/pirate.py", line 256, in search_mirrors
    result = connect_mirror(mirror, printer, args)
  File "/home/chris/.local/lib/python3.5/site-packages/pirate/pirate.py", line 244, in connect_mirror
    mirror=mirror)
  File "/home/chris/.local/lib/python3.5/site-packages/pirate/torrent.py", line 147, in remote
    res_l += parse_page(res)
  File "/home/chris/.local/lib/python3.5/site-packages/pirate/torrent.py", line 106, in parse_page
    tag['href'].startswith('magnet'))['href']
TypeError: 'NoneType' object is not subscriptable

λ ~/ pirate-get stuff
same result!

Running Elementary 0.4 (Loki) and Python 3.5.

Update: pirate-get appears to fetch and parse the HTML for a given page correctly. The problem occurs somewhere in the row.find(...) call. Adding a simple print(row) at torrent.py:106 yields:

<center>
<a href="/browse/200" title="More from this category">Video</a><br>
                (<a href="/browse/207" title="More from this category">HD - Movies</a>)
            </br></center>
</td>
<td>
<div class="detName"> <a class="detLink" href="/torrent/15842962/War_Dogs_2016_720p_BrRip_x264_-_FoRM" title="Details for War Dogs 2016 720p BrRip x264 - FoRM">War Dogs 2016 720p BrRip x264 - FoRM</a>
</div>
<a href="/torrent/15842962/War_Dogs_2016_720p_BrRip_x264_-_FoRM" title="Download this torrent using magnet"><img alt="Magnet link" src="/static/img/icon-magnet.gif"/></a><img alt="This torrent has 2 comments." src="/static/img/icon_comment.gif" title="This torrent has 2 comments."><img src="/static/img/11x11p.png"><font class="detDesc">Uploaded Y-day 22:32, Size 760.99 MiB, ULed by <a class="detDesc" href="/user/MegaBanana/" title="Browse MegaBanana">MegaBanana</a></font>
</img></img></td>
<td align="right">10312</td>
<td align="right">1235</td>
</tr>

In the example above, the magnet link begins with "/torrent", rather than "magnet". Perhaps this line needs to be changed to reflect a change in the HTML output?
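To illustrate the failure: the traceback shows the lookup returning None and the trailing ['href'] subscripting it. Below is a minimal sketch of a guarded lookup that returns None instead of raising. It uses the standard library's html.parser rather than whatever parser pirate-get uses, and MagnetFinder, find_magnet, and the sample good_row are hypothetical names and data made up for illustration.

```python
from html.parser import HTMLParser


class MagnetFinder(HTMLParser):
    """Remember the first <a href> in a fragment that starts with 'magnet'."""

    def __init__(self):
        super().__init__()
        self.magnet = None

    def handle_starttag(self, tag, attrs):
        # Only anchors matter, and only the first magnet link is kept.
        if tag != "a" or self.magnet is not None:
            return
        href = dict(attrs).get("href") or ""
        if href.startswith("magnet"):
            self.magnet = href


def find_magnet(row_html):
    """Return the magnet URI found in row_html, or None if the row has none."""
    finder = MagnetFinder()
    finder.feed(row_html)
    finder.close()
    return finder.magnet


# Shaped like the row printed above: the magnet icon links to /torrent/...
bad_row = (
    '<a href="/torrent/15842962/War_Dogs_2016_720p_BrRip_x264_-_FoRM" '
    'title="Download this torrent using magnet">'
    '<img alt="Magnet link" src="/static/img/icon-magnet.gif"/></a>'
)
# Hypothetical row from a mirror that does expose magnet URIs.
good_row = (
    '<a href="magnet:?xt=urn:btih:0000000000000000000000000000000000000000" '
    'title="Download this torrent using magnet"></a>'
)

for row in (bad_row, good_row):
    magnet = find_magnet(row)
    print(magnet if magnet is not None else "no magnet link in this row")
```

With a check like this, the caller can decide what to do with a row that has no magnet link instead of crashing on it.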

Update 2: after modifying torrent.py:106 to reflect this change...

tag['title'] == 'Download this torrent using magnet')['href']

we successfully connect to the proxy, but fail at print.py:65

λ ~/git/ pirate-get cool                 
Trying https://piratebaymirror.eu/... Ok 
Traceback (most recent call last):
  File "/usr/local/bin/pirate-get", line 9, in <module>
    load_entry_point('pirate-get==0.2.9', 'console_scripts', 'pirate-get')()
  File "/usr/local/lib/python3.5/dist-packages/pirate_get-0.2.9-py3.5.egg/pirate/pirate.py", line 416, in main
    pirate_main(args)
  File "/usr/local/lib/python3.5/dist-packages/pirate_get-0.2.9-py3.5.egg/pirate/pirate.py", line 330, in pirate_main
    printer.search_results(results, local=args.source == 'local_tpb')
  File "/usr/local/lib/python3.5/dist-packages/pirate_get-0.2.9-py3.5.egg/pirate/print.py", line 65, in search_results
    torrent_name = parse.unquote_plus(name.group(1))
AttributeError: 'NoneType' object has no attribute 'group'

Update 3: The bug is still at torrent.py:106, and stems from a change in the reporting of magnet links. Now the link takes you to the HTML page for that particular torrent, rather than giving you a nice magnet link. You could add another step to go and fetch the magnet link from that URL (but that's adding another HTTP request per entry, which isn't going to be nice).
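A rough sketch of that extra step, to make the cost concrete. It assumes the torrent page contains an ordinary href="magnet:..." attribute, which I have not verified for this mirror. resolve_magnet and the fetch callable are hypothetical, and fetch is passed in so the sketch runs without network access.

```python
import html
import re
from urllib.parse import urljoin

# Assumption: the torrent page exposes the magnet as a plain href attribute.
MAGNET_RE = re.compile(r'href="(magnet:[^"]+)"')


def resolve_magnet(mirror, href, fetch):
    """Return a magnet URI for href, fetching the torrent page if needed.

    fetch is any callable that takes a URL and returns the page as a string.
    """
    if href.startswith("magnet:"):
        return href  # already a magnet link, no extra request
    page = fetch(urljoin(mirror, href))  # one additional HTTP request per entry
    match = MAGNET_RE.search(page)
    # Attribute values are HTML-escaped (&amp;), so unescape before returning.
    return html.unescape(match.group(1)) if match else None


def fake_fetch(url):
    """Stand-in for a real HTTP GET, returning a made-up torrent page."""
    return '<a href="magnet:?xt=urn:btih:abc&amp;dn=example">Get this torrent</a>'


print(resolve_magnet("https://piratebaymirror.eu/", "/torrent/1/example", fake_fetch))
```

Every result that lacks a magnet link costs one more round trip, so a page of 30 results would mean up to 30 additional requests.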

rnhmjoj commented 8 years ago

There are a few TPB mirrors that don't include the magnet URI in the search results: the magnet icon is just a link to the torrent page (no idea why they do this). This breaks pirate-get entirely. The exception could be handled, but the error isn't recoverable anyway, so we opted to blacklist these sites. We could in theory make another HTTP request and fetch the magnet, but this would make the search much slower, especially if you get a lot of results. So piratebaymirror.eu should be added to pirate/data/blacklist.json.
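For illustration only, blacklisting can be sketched as a filter over the mirror list. This assumes the blacklist is a JSON array of mirror URLs, which may not match the actual layout of pirate/data/blacklist.json. filter_mirrors is a hypothetical helper, not a function from pirate-get.

```python
import json


def filter_mirrors(mirrors, blacklist_json):
    """Drop every mirror whose URL appears in the blacklist.

    Assumption: blacklist_json is a JSON array of URL strings.
    Trailing slashes are ignored when comparing.
    """
    blacklist = {url.rstrip("/") for url in json.loads(blacklist_json)}
    return [m for m in mirrors if m.rstrip("/") not in blacklist]


blacklist_json = '["https://piratebaymirror.eu"]'
mirrors = ["https://piratebaymirror.eu/", "https://pirateproxy.red"]
print(filter_mirrors(mirrors, blacklist_json))
```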

ChrisTimperley commented 8 years ago

Ah, I see. That makes sense. Thanks for the quick reply! Maybe add an exception to check for the magnet link and just print "incompatible proxy detected"? Also, could the tool potentially go and fetch a (whitelisted) proxy automatically? (maybe using https://thepiratebay-proxylist.org/api/v1/proxies).

Edit: I switched over to https://pirateproxy.red and now it works a treat! :+1:

rnhmjoj commented 8 years ago

That would be the best thing to do, but it would also make testing the mirrors slower: pirate-get would have to run a search with a fixed term and check that the magnets are present. I didn't know about this API: it could indeed be used here. Anyway, if you need to use another proxy, you can set pirate-get to use your own list of mirrors with the --mirror option or in the config file.
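The check described here could look roughly like the sketch below. mirror_is_compatible, the search callable, and the probe term "ubuntu" are all hypothetical. search stands in for whatever performs the request and returns the href of each result's magnet icon.

```python
def mirror_is_compatible(mirror, search, probe_term="ubuntu"):
    """Probe a mirror with a fixed search and verify that magnets are present.

    search is a callable (mirror, term) -> list of hrefs, one per result.
    A mirror counts as compatible only if it returns at least one result
    and every result's href is a magnet URI.
    """
    try:
        hrefs = search(mirror, probe_term)
    except OSError:
        return False  # unreachable mirrors are treated as unusable
    return bool(hrefs) and all(h and h.startswith("magnet:") for h in hrefs)


def fake_search(mirror, term):
    """Stand-in for a real search, imitating one bad and one good mirror."""
    if "piratebaymirror.eu" in mirror:
        return ["/torrent/15842962/War_Dogs_2016_720p_BrRip_x264_-_FoRM"]
    return ["magnet:?xt=urn:btih:abc"]


for mirror in ("https://piratebaymirror.eu/", "https://pirateproxy.red"):
    if mirror_is_compatible(mirror, fake_search):
        print(mirror, "ok")
    else:
        print(mirror, "incompatible proxy detected")
```

The trade-off is the one mentioned above: each mirror test becomes a full search request rather than a simple connection check.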

rnhmjoj commented 7 years ago

I'm closing this since it seems you have solved the problem and I have updated the blacklist.