jaimeiniesta / metainspector

Ruby gem for web scraping purposes. It scrapes a given URL and returns its title, meta description, meta keywords, links, images...
https://github.com/metainspector/metainspector
MIT License
1.03k stars, 165 forks

Improve detection of non-standard images on modern web pages #269

Open · ishields opened this issue 4 years ago

ishields commented 4 years ago

Currently, image detection mostly looks at the src attribute of img tags. Modern web pages often define images in other ways, such as srcset, data-img, or data-bg. This ticket proposes expanding image detection to pick up images defined in these other ways, so we can return the best images possible.

Note: A very rough attempt at this is in progress. I will attach a PR, but it will need feedback.
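
As a rough sketch of what the issue describes, the attributes it names could be collected from a single img element as shown below. This assumes the element's attributes are already available as a Ruby Hash (for example, from an HTML parser). `image_candidates`, `srcset_urls`, and the `include_data:` flag are hypothetical names and are not part of MetaInspector. The data-* list covers only the two attributes mentioned above.

```ruby
# Hypothetical: non-standard lazy-loading attributes named in the issue.
# Not exhaustive; sites can invent any data-* attribute.
DATA_ATTRIBUTES = %w[data-img data-bg].freeze

# Extract only the URLs from a srcset value such as "a.jpg 1x, b.jpg 2x".
# Simplification: splits on commas, so URLs containing commas are not handled.
def srcset_urls(value)
  value.to_s.split(",").map { |candidate| candidate.strip.split(/\s+/).first }.compact
end

# Hypothetical helper: collect candidate image URLs from one <img> element,
# given its attributes as a Hash of name => value.
# Pass include_data: false to consider only the standard src/srcset attributes.
def image_candidates(attributes, include_data: true)
  urls = []
  urls << attributes["src"] if attributes["src"]
  urls.concat(srcset_urls(attributes["srcset"]))
  if include_data
    DATA_ATTRIBUTES.each { |name| urls << attributes[name] if attributes[name] }
  end
  # Drop blanks and duplicates while keeping document order.
  urls.map(&:strip).reject(&:empty?).uniq
end
```

For an element with src="placeholder.gif", srcset="small.jpg 480w, large.jpg 1080w", and data-img="real.jpg", `image_candidates` returns all four URLs. With `include_data: false`, it leaves out real.jpg.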

jschwindt commented 4 years ago

Hi @ishields! I agree with improving image detection, but I think we should only use standard attributes like srcset. As for data-* attributes, it's almost impossible to know every user-defined attribute that anyone might use.
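
If detection is limited to standard attributes, srcset alone already carries enough information to pick the largest image. The sketch below assumes a simplified srcset syntax: comma-separated candidates, each a URL with an optional `w` or `x` descriptor. It does not handle URLs that contain commas. `parse_srcset` and `best_srcset_url` are hypothetical helpers.

```ruby
# One srcset entry: either width (from "800w") or density (from "2x") is set.
SrcsetCandidate = Struct.new(:url, :width, :density)

# Parse a srcset string into SrcsetCandidate entries.
# Simplification: splits on commas; entries with unrecognised descriptors are dropped.
def parse_srcset(value)
  value.to_s.split(",").filter_map do |entry|
    url, descriptor = entry.strip.split(/\s+/, 2)
    next if url.nil? || url.empty?

    case descriptor
    when nil then SrcsetCandidate.new(url, nil, 1.0) # no descriptor means 1x
    when /\A(\d+)w\z/ then SrcsetCandidate.new(url, Regexp.last_match(1).to_i, nil)
    when /\A(\d+(?:\.\d+)?)x\z/ then SrcsetCandidate.new(url, nil, Regexp.last_match(1).to_f)
    end
  end
end

# Return the URL of the largest candidate, or nil if there is none.
def best_srcset_url(value)
  candidates = parse_srcset(value)
  return nil if candidates.empty?

  # Prefer width descriptors when present; otherwise compare pixel densities.
  with_width = candidates.select(&:width)
  if with_width.any?
    with_width.max_by(&:width).url
  else
    candidates.max_by(&:density).url
  end
end
```

For example, `best_srcset_url("small.jpg 480w, large.jpg 1080w")` picks large.jpg. A srcset with no descriptors falls back to the first 1x entry.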

dkam commented 3 years ago

Thanks for this gem. I think that <picture> should be supported too. I'd be happy to submit a pull request if there's interest.
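
A <picture> element wraps zero or more <source> elements, each with its own srcset, followed by a fallback <img>. Below is a minimal sketch of collecting its URLs. It assumes the children's attributes have already been extracted into Hashes. `picture_candidates` is a hypothetical name, and the srcset handling uses the same comma-splitting simplification as above.

```ruby
# Extract only the URLs from a srcset value.
# Simplification: splits on commas, so URLs containing commas are not handled.
def srcset_urls(value)
  value.to_s.split(",").map { |candidate| candidate.strip.split(/\s+/).first }.compact
end

# Hypothetical helper: gather candidate URLs from a <picture> element.
# sources: attribute Hashes of the <source> children, in document order.
# img:     attribute Hash of the fallback <img> (empty if missing).
def picture_candidates(sources, img = {})
  urls = sources.flat_map { |source| srcset_urls(source["srcset"]) }
  urls.concat(srcset_urls(img["srcset"]))
  urls << img["src"] if img["src"]
  # Drop blanks and duplicates while keeping document order.
  urls.map(&:strip).reject(&:empty?).uniq
end
```

Sources are listed before the fallback img because a browser uses the first <source> whose media and type match, and falls back to the <img> only when none do.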

jaimeiniesta commented 3 years ago

@dkam Sure, that sounds great!