gl00ten opened this issue 10 years ago
Figured out something else:
.page(n), where n is greater than the last existing page, returns page(0). Maybe that's the problem?
From what you said, I couldn't figure out where the problem is. What do you mean by "is looping from the first page"? Do you mean that page "0" is never checked?
In multipage mode, items() returns the results for one page and then increments the page number in the URL by one. It keeps doing this until a response contains no items.
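A minimal sketch of the loop described above, assuming a hypothetical fetch_page(page) callable that stands in for the HTTP request and returns the items on one page. This is not the library's actual code, and max_pages is a safety cap added only for the sketch. The two fake sites show why "stop at the first empty page" breaks once out-of-range pages stop being empty.

```python
from typing import Callable, Iterator, List

# Hypothetical stand-in for the HTTP request: page number -> items on that page.
FetchPage = Callable[[int], List[str]]


def iterate_pages(fetch_page: FetchPage, max_pages: int = 100) -> Iterator[str]:
    """Yield items page by page until a page comes back empty.

    max_pages is a safety cap added for this sketch so it cannot loop forever.
    """
    page = 0
    while page < max_pages:
        items = fetch_page(page)
        if not items:  # an empty page is the only stop signal
            return
        yield from items
        page += 1  # i.e. the page segment of /search/<term>/<page>/...


def old_site(page: int) -> List[str]:
    # Old behaviour: pages past the end (here, past page 5) are empty.
    return [f"torrent-{page}"] if page < 6 else []


def new_site(page: int) -> List[str]:
    # New behaviour: pages past the end serve page 0 again.
    return [f"torrent-{page}"] if page < 6 else ["torrent-0"]


print(len(list(iterate_pages(old_site))))                # 6
print(len(list(iterate_pages(new_site, max_pages=20))))  # 20: only the cap stops it
```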
Could you log the page numbers and the results returned for each page number?
from tpb import TPB
from tpb import CATEGORIES, ORDERS

t = TPB('https://thepiratebay.org')  # create a TPB object with default domain

searchterms = ['bananas']
for searchterm in searchterms:
    # multipage() should stop after the last page of results
    torrents = t.search(searchterm).order(ORDERS.SEEDERS.DES).multipage()
    for torrent in torrents:
        print(torrent.title, torrent.seeders)
Try that; I think you'll see it. There should be 6 pages, but because page(x) for x > 5 returns page(0), I think it just keeps looping.
OK, so I discovered where the nasty behavior is coming from...
ThePirateBay changed its site so that URLs pointing to pages outside the result range return the first page. Don't ask me why; I don't see how this can be useful. So these two URLs return the same document:
https://thepiratebay.se/search/bananas/0/7/0
https://thepiratebay.se/search/bananas/999/7/0
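A possible stop-gap, assuming the same kind of hypothetical fetch_page(page) callable: since an out-of-range page now serves page 0 again, the loop could stop when a later page comes back identical to page 0. This is a heuristic of my own, not the pagination-parsing fix the maintainer suggests, and it is fragile: if the result order shifts between requests (seeder counts change), the repeat goes undetected and only max_pages ends the loop.

```python
from typing import Callable, Iterator, List

# Hypothetical stand-in for the HTTP request: page number -> items on that page.
FetchPage = Callable[[int], List[str]]


def iterate_pages_until_repeat(fetch_page: FetchPage, max_pages: int = 100) -> Iterator[str]:
    """Yield items page by page; stop on an empty page, or when a later
    page is identical to page 0 (what TPB now serves out of range)."""
    first = fetch_page(0)
    if not first:
        return
    yield from first
    for page in range(1, max_pages):
        items = fetch_page(page)
        if not items or items == first:
            return
        yield from items


def new_site(page: int) -> List[str]:
    # Simulated new behaviour: six real pages, then page 0 repeated.
    return [f"torrent-{page}"] if page < 6 else ["torrent-0"]


print(list(iterate_pages_until_repeat(new_site)))
# ['torrent-0', 'torrent-1', 'torrent-2', 'torrent-3', 'torrent-4', 'torrent-5']
```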
The API implementation would need to be changed so that the page list is parsed from TPB's pagination. Any implementation that does this is very welcome :-)
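A rough sketch of reading the page list from the pagination, assuming the search URL layout seen in the two example URLs (/search/<term>/<page>/<order>/<category>) and assuming the pagination consists of plain <a href> links. The sample markup is hypothetical, not copied from TPB, and last_page() is an illustrative helper, not part of the library. If TPB only links a window of pages around the current one, the highest linked page may not be the last page, so this would need checking against real result pages.

```python
import re
from html.parser import HTMLParser
from typing import Set
from urllib.parse import urlparse

# Assumed URL layout, taken from the example URLs in this thread:
#   /search/<term>/<page>/<order>/<category>
SEARCH_PATH = re.compile(r"^/search/[^/]+/(\d+)/\d+/\d+/?$")


class PageLinkCollector(HTMLParser):
    """Collects the page number of every search-result link in the HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.pages: Set[int] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = SEARCH_PATH.match(urlparse(href).path)
        if match:
            self.pages.add(int(match.group(1)))


def last_page(html_text: str, current_page: int = 0) -> int:
    """Highest page number linked from the HTML, or current_page if none."""
    collector = PageLinkCollector()
    collector.feed(html_text)
    return max(collector.pages | {current_page})


# Hypothetical pagination markup for a 6-page result set (pages 0..5).
sample = """
<div align="center">
  <b>1</b>
  <a href="/search/bananas/1/7/0">2</a>
  <a href="/search/bananas/5/7/0">6</a>
  <a href="/search/bananas/1/7/0"><img src="/static/img/next.gif" alt="Next"/></a>
</div>
"""
print(last_page(sample))  # 5
```

With the last page known up front, the multipage loop can iterate over range(last_page(first_html) + 1) instead of waiting for an empty page.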
Unfortunately I don't have time to work actively on this project, but I hope I'll be able to in the near future.
https://github.com/fullmooninu/TPB/blob/master/tpb/tpb.py
What I have there won't work, because "is_final_page" is only read once by the subclass (because of the yield?).
It also won't work because the find('next.gif') call I used doesn't seem to match anything.
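Two hedged guesses, plus a sketch. If tpb.py parses pages with lxml or ElementTree, find() takes an element path, so find('next.gif') looks for an element whose tag name is next.gif and will never match an <img src=".../next.gif">; BeautifulSoup's find() likewise matches on tag name first. (I haven't checked which parser tpb.py uses.) As for the yield: a generator body only runs as the caller iterates, so a flag set inside it and read once from outside is probably read before it has been updated. The sketch below sidesteps both by checking for the next link inside the generator after every page. It assumes the "next" button is an <img> whose src ends in next.gif inside a link (hypothetical markup) and a hypothetical fetch_page(page) that returns (items, html).

```python
from html.parser import HTMLParser
from typing import Callable, Iterator, List, Tuple


class NextLinkFinder(HTMLParser):
    """Sets has_next if an <img> whose src ends in 'next.gif' appears
    inside an <a href=...> element (assumed markup for the 'next' button)."""

    def __init__(self) -> None:
        super().__init__()
        self.has_next = False
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self._in_link = True
        elif tag == "img" and self._in_link:
            if (attributes.get("src") or "").endswith("next.gif"):
                self.has_next = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False


def has_next_page(html_text: str) -> bool:
    finder = NextLinkFinder()
    finder.feed(html_text)
    return finder.has_next


# Hypothetical fetcher: page number -> (items on the page, raw HTML of the page).
FetchPage = Callable[[int], Tuple[List[str], str]]


def iterate_pages(fetch_page: FetchPage) -> Iterator[str]:
    """The stop condition is re-evaluated inside the generator after each
    page is fetched, rather than read once from outside."""
    page = 0
    while True:
        items, html_text = fetch_page(page)
        yield from items
        if not has_next_page(html_text):
            return
        page += 1


def fake_fetch(page: int) -> Tuple[List[str], str]:
    # Six pages; every page except the last carries a 'next' button.
    next_button = f'<a href="/search/bananas/{page + 1}/7/0"><img src="/static/img/next.gif"/></a>'
    return [f"torrent-{page}"], (next_button if page < 5 else "")


print(list(iterate_pages(fake_fetch)))
# ['torrent-0', 'torrent-1', 'torrent-2', 'torrent-3', 'torrent-4', 'torrent-5']
```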
Still, I hope it makes the job easier for you.
I hadn't thought about looking for the next link. Good idea!
Yep. Now to make it work XD
The multipage() function is looping from the first page
I couldn't really understand how you stop the pagination: https://github.com/karan/TPB/blob/master/tpb/tpb.py#L151