rhadamanthe / host-grabber-pp

A web extension, originally designed for Firefox, to find and download media files from various hosts.
MIT License

new link extraction strategy: cssquery #71

Closed: ghost closed this issue 4 years ago

ghost commented 4 years ago

For some hosts, the easiest way to extract the image link is with a CSS query.

The cssquery strategy could be configured with a CSS query string. The query would return the img element, whose src attribute can then be used as the download link.

CSS queries are very versatile.

For example, this code snippet, used in Image Host Grabber Classic, extracts the image from the new imagevenue page format:

// Select the first image inside the main content column.
const img = doc.querySelector("div.col-md-12 img");
if (img) {
    return {url: img.src, filename: img.alt, status: "OK"};
}

Example (family safe and creative commons license): http://imagevenue.com/MENWU00
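
The proposal above could be generalized roughly as follows. This is only a minimal sketch, not HG++'s actual implementation: the function name `extractByCssQuery` and the returned object shape (borrowed from the Image Host Grabber Classic snippet) are assumptions made for illustration.

```javascript
// Hypothetical helper: applies any CSS query to a parsed document and
// builds a download descriptor from the first matching element.
function extractByCssQuery(doc, cssQuery) {
  const img = doc.querySelector(cssQuery);
  if (img && img.src) {
    // The alt text is used as a file name hint, as in the snippet above.
    return { url: img.src, filename: img.alt || '', status: 'OK' };
  }
  // No matching element (or no src attribute): nothing to download.
  return null;
}
```

Any object that exposes `querySelector` would do as `doc`, for instance a document produced by `DOMParser` from the downloaded page source.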

rhadamanthe commented 4 years ago

The next version will be released soon, with some bug fixes.
I could add this strategy before the release. :)

ghost commented 4 years ago

> The next version will be released soon, with some bug fixes. I could add this strategy before the release. :)

Very cool indeed! But if it delays the next release, put it in the following one.

Thank you very much for your excellent work!

rhadamanthe commented 4 years ago

Done. I still have to confirm it works with Imagevenue.

ghost commented 4 years ago

I just did a quick test on a forum page with two imagevenue images. This works:

<host id="x">
  <domain>
  imagevenue.com
  </domain>
  <path-pattern>
  <![CDATA[[A-Z]+]]>
  </path-pattern>
  <search-pattern>
  <![CDATA[CSS query: div.col-md-12 img]]>
  </search-pattern>
</host>
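
To illustrate how a search pattern such as the one in this entry might be interpreted, here is a rough sketch. It assumes that the `CSS query:` prefix selects the strategy and that the rest of the string is the selector; the function name `parseSearchPattern` and the returned fields are hypothetical, not taken from the HG++ source.

```javascript
// Hypothetical parser: splits a search pattern into a strategy name
// and the CSS selector that follows the "CSS query:" prefix.
function parseSearchPattern(rawPattern) {
  const prefix = 'css query:';
  const trimmed = rawPattern.trim();
  if (trimmed.toLowerCase().startsWith(prefix)) {
    return { strategy: 'cssquery', query: trimmed.slice(prefix.length).trim() };
  }
  // Other strategies are out of scope for this sketch.
  return { strategy: 'unknown', query: trimmed };
}
```

With the entry above, `parseSearchPattern('CSS query: div.col-md-12 img')` would yield the selector `div.col-md-12 img`, which could then be passed to `querySelector`.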

I am not sure how stable this imagevenue CSS query will be. It might be sensitive to layout changes. But in general I prefer CSS queries over XPath or regexes.

Thank you very much!

rhadamanthe commented 4 years ago

I have added an entry for imagevenue in my dictionary. I also confirmed that the old one still works. So HG++ will support both strategies, with the path pattern deciding which entry applies.
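
The idea of two entries for the same domain, told apart by their path pattern, could be sketched like this. Everything here is an assumption for illustration: the entry ids, the old-format pattern `img\.php\?.+`, the prefix-style matching, and the function name `pickEntry` are hypothetical; only the domain and the `[A-Z]+` pattern come from this thread.

```javascript
// Hypothetical dictionary entries for the same domain.
const entries = [
  { id: 'imagevenue-old', domain: 'imagevenue.com', pathPattern: 'img\\.php\\?.+' }, // assumed old format
  { id: 'imagevenue-new', domain: 'imagevenue.com', pathPattern: '[A-Z]+' }          // pattern from the thread
];

// Returns the first entry whose domain and path pattern match the link.
function pickEntry(link, candidates) {
  const url = new URL(link);
  // Path without the leading slash, plus the query string if any.
  const path = url.pathname.slice(1) + url.search;
  for (const entry of candidates) {
    const hostOk = url.hostname === entry.domain || url.hostname.endsWith('.' + entry.domain);
    // Assumption: the pattern must match from the start of the path.
    const pathOk = new RegExp('^(?:' + entry.pathPattern + ')').test(path);
    if (hostOk && pathOk) {
      return entry;
    }
  }
  return null;
}
```

Under these assumptions, a link like `http://imagevenue.com/MENWU00` would select the new entry, while a link whose path starts with `img.php?` would select the old one.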