Closed antonsoroko closed 9 months ago
I wasn't able to replicate what you are reporting, however skipping rows must be done at the "rows" xpath and not afterwards. Its the "rows" xpath who decides how many entries are returned.
Since python's ElementTree is very limited, one way to do what you want is:
python3 provider_test.py xpath --rows ".//table/tr/td[2]/a[@href]/../.." "td[2]/a/@href" "https://thepiratebay10.org/search/Video%202023/1/99/200"
i am not that good with xpath, so i have not even thought that it is possible to make specific qualifier and then use ../../
:-)
but it works, thanks!
i have not wanted to use some real video name thus i used a placeholder "video 2023". you would have seen the fail if you would have used any "hot" name - then there will be several pages, thus there will be footer with pages numbers, example:
page can have rows without a way to identify if row has all needed info. for example, page below has last row as footer, thus it does not have necessary info, but at the same time
<tr>
tags do not have any good attributes for filtering.$ python3 provider_test.py xpath --rows ".//table/tr" "td[2]/a/@href" "https://thepiratebay10.org/search/Video%202023/1/99/200"
it would make sense to simply skip such rows.
providers.json
``` [ { "name": "thepiratebay", "base_url": "https://thepiratebay10.org", "results_parser": { "url": "/search/{query:q}/1/99/200", "rows": ".//table/tr", "data": { "magnet": "td[2]/a/@href", "title": "td[2]/div/a/text()", "seeds": "td[3]/text()", "leeches": "td[4]/text()", "size": "td[2]/font/text()" } }, "keywords": { "movie": "{title} {year}", "show": "{title}", "season": "{title} S{season:02}", "episode": "{title} S{season:02}E{episode:02}" }, "attributes": { "color": "FFF14E13", "icon": "provider_icons/thepiratebay.png" } } ] ```