rhadamanthe / host-grabber-pp

A web extension, originally designed for Firefox, to find and download media files from various hosts.
MIT License

Support partial patterns on web pages #16

Closed rhadamanthe closed 6 years ago

rhadamanthe commented 6 years ago

Consider the following URL pattern: https?://(www\.)?toto\.com/media/[^"]+ It sometimes happens that pages reference links by absolute or relative path rather than by full URL. As an example, a page on toto.com may only contain /media/test.jpg or ../../media/test.jpg.

In such cases, HG ++ does not find anything, because the search pattern never matches a link that lacks the scheme and host.
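The failure can be reproduced with a small sketch. The page fragments below are made up for illustration, and Python's `re` module stands in for whatever matching the extension actually performs.

```python
import re

# Pattern quoted in this issue: it only matches fully qualified URLs.
ABSOLUTE_PATTERN = re.compile(r'https?://(www\.)?toto\.com/media/[^"]+')

# Hypothetical page fragments (not taken from a real site).
samples = [
    '<img src="http://www.toto.com/media/test.jpg">',  # full URL
    '<img src="/media/test.jpg">',                     # absolute path
    '<img src="../../media/test.jpg">',                # relative path
]

# True when the pattern finds something in the fragment.
results = [ABSOLUTE_PATTERN.search(s) is not None for s in samples]
print(results)  # [True, False, False]
```

Only the first fragment matches; the two path-only links are silently skipped.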


There are 3 solutions to this.

  1. Introduce a new property in the dictionary to specify that the search pattern is restricted to a given host. Here, the restriction would be to the toto.com host, and the search pattern would become (.*/)?media/[^"]+.

  2. Implement a smart guess to deduce relative patterns from a global one. Depending on the search pattern, it might be complicated.

  3. Rework the dictionary: add a domain property and rename the URL pattern to a path pattern.


With the third option, it becomes much simpler to handle all these cases.
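A minimal sketch of what the third option could look like, assuming a dictionary entry with a `domain` and a `path_pattern` key. Both key names, the `find_media_links` helper and the sample page are hypothetical and are not HG ++'s actual dictionary format or code. The idea is to resolve every link against the page URL first, then check the host and the path separately.

```python
import re
from urllib.parse import urljoin, urlparse

# Hypothetical dictionary entry: key names are assumptions for illustration.
ENTRY = {"domain": "toto.com", "path_pattern": r'/media/[^"]+'}

# Naive extraction of href/src attribute values; enough for a sketch.
LINK_ATTR = re.compile(r'(?:href|src)="([^"]+)"')


def find_media_links(page_url, html, entry):
    """Return absolute URLs on the entry's domain whose path matches."""
    path_re = re.compile(entry["path_pattern"])
    found = []
    for raw in LINK_ATTR.findall(html):
        # Resolve /media/x.jpg and ../../media/x.jpg against the page URL.
        absolute = urljoin(page_url, raw)
        parts = urlparse(absolute)
        host = parts.hostname or ""
        if host.startswith("www."):
            host = host[4:]  # treat www.toto.com like toto.com
        if host == entry["domain"] and path_re.fullmatch(parts.path):
            found.append(absolute)
    return found


page_url = "http://www.toto.com/a/b/page.html"
html = (
    '<img src="http://www.toto.com/media/one.jpg">'
    '<img src="/media/two.jpg">'
    '<img src="../../media/three.jpg">'
    '<img src="http://other.com/media/four.jpg">'
)
links = find_media_links(page_url, html, ENTRY)
print(links)
```

With this split, full URLs, absolute paths and relative paths all end up as the same kind of absolute URL before the pattern is applied, and links on other hosts are rejected by the domain check rather than by the regular expression.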

The code modifications are not the most complicated part. The main cost is that we will have to rewrite the dictionary.