i96751414 / script.flix.magneto

Provider for Flix, using Torrest

Skip row if parsing failed #12

Closed by antonsoroko 9 months ago

antonsoroko commented 9 months ago

A page can contain rows that lack the required data, with no way to tell them apart from valid rows. For example, the last row of the page below is a footer, so it does not have the necessary info, and the <tr> tags have no attributes that could be used for filtering.

$ python3 provider_test.py xpath --rows ".//table/tr" "td[2]/a/@href" "https://thepiratebay10.org/search/Video%202023/1/99/200"

Traceback (most recent call last):
  File "/home/user/.kodi/addons/script.flix.magneto/provider_test.py", line 282, in <module>
    main()
  File "/home/user/.kodi/addons/script.flix.magneto/provider_test.py", line 278, in main
    args.func(args)
  File "/home/user/.kodi/addons/script.flix.magneto/provider_test.py", line 99, in xpath
    logging.info(parser._xpath_element(element, args.xpath))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.kodi/addons/script.flix.magneto/lib/parsers.py", line 54, in _xpath_element
    return self._xpath_find(element, attr_match.group(1)).attrib[attr_match.group(2)]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'attrib'

It would make sense to simply skip such rows.
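To make the proposal concrete, here is a minimal sketch of what "skip the row" could look like using only the standard library's ElementTree. It is not the add-on's code: `PAGE`, `ATTR_RE` and `find_attr` are hypothetical names, and the markup is a simplified stand-in for the real results page (two result rows plus a pagination footer).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified results page: two result rows and a footer row.
PAGE = """<html><body><table>
<tr><td>Video</td><td><a href="magnet:?xt=urn:btih:aaa">First</a></td></tr>
<tr><td>Video</td><td><a href="magnet:?xt=urn:btih:bbb">Second</a></td></tr>
<tr><td colspan="9"><a href="/search/Video%202023/2/99/200">2</a></td></tr>
</table></body></html>"""

# Splits an expression such as "td[2]/a/@href" into element path and attribute.
ATTR_RE = re.compile(r"^(.+)/@([\w:-]+)$")


def find_attr(row, xpath):
    """Return the attribute value, or None if the element or attribute is absent."""
    match = ATTR_RE.match(xpath)
    if match is None:
        raise ValueError("expected an xpath ending in /@attribute: " + xpath)
    target = row.find(match.group(1))
    if target is None:
        # The footer row has no td[2]/a, so find() returns None here.
        return None
    return target.attrib.get(match.group(2))


root = ET.fromstring(PAGE)
magnets = []
for row in root.findall(".//table/tr"):
    magnet = find_attr(row, "td[2]/a/@href")
    if magnet is None:
        continue  # skip rows that lack the required data
    magnets.append(magnet)

print(magnets)
```

Without the `None` check, the footer row would trigger the same `AttributeError` as in the traceback above, because `find()` returns `None` when nothing matches.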

providers.json
```
[
  {
    "name": "thepiratebay",
    "base_url": "https://thepiratebay10.org",
    "results_parser": {
      "url": "/search/{query:q}/1/99/200",
      "rows": ".//table/tr",
      "data": {
        "magnet": "td[2]/a/@href",
        "title": "td[2]/div/a/text()",
        "seeds": "td[3]/text()",
        "leeches": "td[4]/text()",
        "size": "td[2]/font/text()"
      }
    },
    "keywords": {
      "movie": "{title} {year}",
      "show": "{title}",
      "season": "{title} S{season:02}",
      "episode": "{title} S{season:02}E{episode:02}"
    },
    "attributes": {
      "color": "FFF14E13",
      "icon": "provider_icons/thepiratebay.png"
    }
  }
]
```

i96751414 commented 9 months ago

I wasn't able to reproduce what you are reporting. However, skipping rows must be done in the "rows" XPath and not afterwards: it is the "rows" XPath that determines how many entries are returned.

Since Python's ElementTree supports only a limited subset of XPath, one way to do what you want is to select the link inside each row and then walk back up to the row with `../..`:

$ python3 provider_test.py xpath --rows ".//table/tr/td[2]/a[@href]/../.." "td[2]/a/@href" "https://thepiratebay10.org/search/Video%202023/1/99/200"
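The effect of that "rows" expression can be checked in isolation with ElementTree. The snippet below is only an illustration under assumed markup: `PAGE` is a hypothetical, simplified page with two result rows and one pagination footer row, not the real site's HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified results page: two result rows and a footer row.
PAGE = """<html><body><table>
<tr><td>Video</td><td><a href="magnet:?xt=urn:btih:aaa">First</a></td></tr>
<tr><td>Video</td><td><a href="magnet:?xt=urn:btih:bbb">Second</a></td></tr>
<tr><td colspan="9"><a href="/search/Video%202023/2/99/200">2</a></td></tr>
</table></body></html>"""

root = ET.fromstring(PAGE)

# The plain expression returns every <tr>, including the footer.
all_rows = root.findall(".//table/tr")

# Select <a href=...> inside the second <td>, then climb two levels
# (a -> td -> tr), which yields only rows that actually contain such a link.
result_rows = root.findall(".//table/tr/td[2]/a[@href]/../..")

# Every remaining row now has the element the "magnet" expression needs.
hrefs = [row.find("td[2]/a").attrib["href"] for row in result_rows]

print(len(all_rows), len(result_rows))
print(hrefs)
```

The footer row drops out because it has no second `<td>`, so `td[2]/a[@href]` never matches inside it and there is nothing to climb back up from.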

antonsoroko commented 9 months ago

I am not that good with XPath, so it had not even occurred to me that one could write a specific qualifier and then use ../../ :-)

But it works, thanks!


I did not want to use a real video name, so I used the placeholder "Video 2023". You would have seen the failure with any "hot" name: the results then span several pages, so the table ends with a footer row containing the page numbers.