karan / TPB

:cloud: Python API for ThePirateBay.
MIT License
331 stars, 66 forks

.multipage() is looping #77

Open gl00ten opened 9 years ago

gl00ten commented 9 years ago

The multipage() function is looping from the first page

I couldn't really work out how the pagination is supposed to stop: https://github.com/karan/TPB/blob/master/tpb/tpb.py#L151

gl00ten commented 9 years ago

Figured out something else:

.page(n)

where n is greater than the last existing page, returns page(0). Maybe that's the problem?

umazalakain commented 9 years ago

From what you said, I can't tell where the problem is. What do you mean by "is looping from the first page"? That page "0" is never checked?

In multipage mode, items() returns the results for one page and then increments the page number in the URL by one. It keeps doing that until a response comes back with no items in it.
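A minimal sketch of that stop-on-empty loop, assuming a hypothetical `fetch_page(n)` callable that returns the list of items on page n (this is an illustration, not the library's actual `items()` code):

```python
from typing import Callable, Iterator, List


def iter_pages(fetch_page: Callable[[int], List[str]]) -> Iterator[str]:
    """Yield items page by page, stopping at the first empty page."""
    page = 0
    while True:
        items = fetch_page(page)
        if not items:
            # An empty page is the only stop signal in this scheme.
            return
        yield from items
        page += 1  # the next request asks for the following page number


# Hypothetical site with 3 pages that returns [] past the last page.
def fake_fetch(n: int) -> List[str]:
    pages = [["a", "b"], ["c"], ["d"]]
    return pages[n] if n < len(pages) else []


print(list(iter_pages(fake_fetch)))  # → ['a', 'b', 'c', 'd']
```

The loop only ends if some page comes back empty; if the site never serves an empty page, the loop never ends either.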

Could you log the page numbers and the results returned for each page number?

gl00ten commented 9 years ago

from tpb import TPB, ORDERS

t = TPB('https://thepiratebay.org')  # create a TPB object with default domain

search_terms = ['bananas']

for search_term in search_terms:
    # multipage() should walk every result page, ordered by seeders (descending)
    torrents = t.search(search_term).order(ORDERS.SEEDERS.DES).multipage()
    for torrent in torrents:
        print(torrent.title, torrent.seeders)

Try that and I think you'll see it. There should be 6 pages, but because page(x) for x > 5 returns page(0), I think it keeps looping forever.
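A small simulation of why that loops. `wrapping_fetch` is a hypothetical stand-in for a site that, as described in this thread, serves page 0 for any out-of-range page number. The stop-on-empty loop is restated here and capped with an assumed `max_pages` so the demo finishes:

```python
from typing import Callable, List

# 6 hypothetical result pages with 2 titles each.
PAGES = [[f"p{i}-t{j}" for j in range(2)] for i in range(6)]


def wrapping_fetch(n: int) -> List[str]:
    """Out-of-range page numbers get page 0 instead of an empty page."""
    return PAGES[n] if 0 <= n < len(PAGES) else PAGES[0]


def count_pages_until_empty(fetch: Callable[[int], List[str]], max_pages: int) -> int:
    """Return how many pages a stop-on-empty loop requests, up to max_pages."""
    page = 0
    while page < max_pages:
        if not fetch(page):
            break  # would stop here if the site ever returned an empty page
        page += 1
    return page


print(count_pages_until_empty(wrapping_fetch, max_pages=100))  # → 100
print(wrapping_fetch(6) == wrapping_fetch(0))  # → True
```

With only 6 real pages, the loop hits the cap instead of stopping, because page 6 and beyond look exactly like page 0.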

umazalakain commented 9 years ago

OK, so I discovered where the nasty behavior is coming from...

ThePirateBay has changed so that URLs pointing to pages outside the result range return the first page. Don't ask me why; I don't see how that can be useful. So these two URLs return the same document:

https://thepiratebay.se/search/bananas/0/7/0
https://thepiratebay.se/search/bananas/999/7/0
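The two URLs differ only in the page segment. A minimal sketch of how such search URLs are put together, assuming the path layout `/search/<query>/<page>/<order>/<category>` inferred from the examples above. Reading `7` as "seeders, descending" and `0` as "all categories" is an assumption, and `search_url` is a hypothetical helper:

```python
from urllib.parse import quote


def search_url(base: str, query: str, page: int, order: int = 7, category: int = 0) -> str:
    """Build a search URL; only the page segment changes between pages."""
    return f"{base.rstrip('/')}/search/{quote(query)}/{page}/{order}/{category}"


print(search_url("https://thepiratebay.se", "bananas", 0))
print(search_url("https://thepiratebay.se", "bananas", 999))
```

Since the server answers both URLs with the same document, the URL alone can't tell the client it has gone past the last page; the returned HTML has to.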

The API implementation would need to be changed so that the page list is parsed from TPB's pagination. Any implementation that does this is very welcome :-)
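One way to read the page list from the pagination, sketched with the standard library's `html.parser`: collect the page numbers from the pagination links and take the largest. The markup in `SAMPLE` is hypothetical (TPB's real pagination HTML isn't shown in this thread), and matching links by their `/search/<query>/<page>/` path is an assumption:

```python
import re
from html.parser import HTMLParser
from typing import List, Optional, Set

# Matches the page segment of a search link, e.g. /search/bananas/3/7/0
PAGE_HREF = re.compile(r"/search/[^/]+/(\d+)/")


class PaginationParser(HTMLParser):
    """Collect page numbers referenced by <a href="/search/..."> links."""

    def __init__(self) -> None:
        super().__init__()
        self.pages: Set[int] = set()

    def handle_starttag(self, tag: str, attrs: List[tuple]) -> None:
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""  # attribute values may be None
        match = PAGE_HREF.search(href)
        if match:
            self.pages.add(int(match.group(1)))


def last_page(html: str) -> Optional[int]:
    """Return the highest page number linked from the pagination, if any."""
    parser = PaginationParser()
    parser.feed(html)
    return max(parser.pages) if parser.pages else None


SAMPLE = """
<div align="center">
  <b>1</b>
  <a href="/search/bananas/1/7/0">2</a>
  <a href="/search/bananas/2/7/0">3</a>
  <a href="/search/bananas/5/7/0">6</a>
</div>
"""
print(last_page(SAMPLE))  # → 5
```

Read from the first page (whose own number is plain text, not a link), the largest linked index is the last page, provided the pagination lists every page. The pages to fetch would then be `range(last_page(first_html) + 1)`, or just page 0 when it returns `None`.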

Unfortunately I don't have time to work actively on this project, but I hope I'll be able to in the near future.

gl00ten commented 9 years ago

https://github.com/fullmooninu/TPB/blob/master/tpb/tpb.py

What I have there won't work, because "is_final_page" is only read once by the subclass (because of the yield?).

Also, the find('next.gif') I used doesn't seem to work.
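On the find('next.gif') part: if that `find()` is lxml's or BeautifulSoup's, it matches its argument against element (tag) names rather than attribute values, so it would look for an element literally named `next.gif`. Whether that's what happened in the fork isn't confirmed here. Below is a minimal standard-library sketch of the next-link idea, assuming (hypothetically) that the next link wraps an `<img>` whose `src` ends in `next.gif`, and that a hypothetical `fetch(n)` returns `(items, html)` for page n. The check runs inside the generator's loop, so it is re-evaluated for every page:

```python
from html.parser import HTMLParser
from typing import Callable, Iterator, List, Tuple


class NextLinkParser(HTMLParser):
    """Detect an <img> whose src ends in 'next.gif' inside an <a> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_link = 0  # depth of currently open <a> elements
        self.has_next = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link += 1
        elif tag == "img" and self._in_link:
            src = dict(attrs).get("src") or ""
            if src.endswith("next.gif"):
                self.has_next = True

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self._in_link -= 1


def has_next_link(html: str) -> bool:
    parser = NextLinkParser()
    parser.feed(html)
    return parser.has_next


def iter_pages(fetch: Callable[[int], Tuple[List[str], str]]) -> Iterator[str]:
    """Yield items page by page; stop when a page has no 'next' link."""
    page = 0
    while True:
        items, html = fetch(page)
        yield from items
        # Re-checked on every iteration, so each page's own HTML decides.
        if not has_next_link(html):
            return
        page += 1


NEXT = '<a href="/search/bananas/1/7/0"><img src="/static/img/next.gif"></a>'


def fake_fetch(n: int) -> Tuple[List[str], str]:
    # 3 hypothetical pages; out-of-range numbers wrap to page 0 like the site.
    pages = [["a", "b"], ["c"], ["d"]]
    n = n if n < len(pages) else 0
    html = NEXT if n < len(pages) - 1 else "<b>last</b>"
    return pages[n], html


print(list(iter_pages(fake_fetch)))  # → ['a', 'b', 'c', 'd']
```

Because the stop decision comes from the last page's own markup, the wraparound to page 0 is never reached.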

Still, I hope it makes this an easy job for you.

umazalakain commented 9 years ago

I hadn't thought about searching for the next link. Good idea!

gl00ten commented 9 years ago

Yep. Now to make it work XD