jasonaibrahim / scraper

thumbnail scraper
https://www.npmjs.org/package/scraper-js
MIT License

Does this package still work #4

Atharane closed this issue 1 year ago

Atharane commented 1 year ago

Hi Jason, Atharva here. I am trying to build a getpocket.com clone. Once a URL is saved, I want to display the blog's thumbnail along with its title, but I am unable to get the URL of the thumbnail. Can your package do that for me? Also, could you please guide me on how to tackle this issue? I am new to web development.

jasonaibrahim commented 1 year ago

Hi @Atharane

I have not made any updates to this package in quite a long time. I'm glad there is still some interest. I would be happy to give you guidance and see you implement the rest as a contributor.

Roughly speaking, this library should do the following:

Given a URL:

  a. visit the URL
  b. parse the page content using a library such as cheerio
  c. search for images, rank them, and sort them by rank

Let's drill into the last point a bit more. To find images on a page, we have a few different options available to us:

There could be other approaches I don't know about or failed to mention. Please do your own research if you're interested.
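To make the "find images" step concrete, here is a minimal sketch that collects OpenGraph and Twitter Card image URLs from a page's `<meta>` tags. It is not scraper-js code: the names `MetaImage`, `readAttr`, and `extractMetaImages` are hypothetical, and it uses regular expressions only to stay dependency-free. A real implementation would more likely use an HTML parser such as cheerio, and fetching the page is left out here.

```typescript
// Hypothetical types and helpers for illustration; not part of scraper-js.
type MetaSource = "opengraph" | "twittercard";

interface MetaImage {
  url: string;
  source: MetaSource;
}

// Read one quoted attribute value (e.g. content="...") from a single tag string.
function readAttr(tag: string, name: string): string | undefined {
  const match = new RegExp(`\\b${name}\\s*=\\s*["']([^"']*)["']`, "i").exec(tag);
  return match ? match[1] : undefined;
}

// Collect og:image and twitter:image URLs, resolved against the page URL.
// Assumption: the tags are written as <meta property|name="..." content="...">.
function extractMetaImages(html: string, pageUrl: string): MetaImage[] {
  const images: MetaImage[] = [];
  const tags = html.match(/<meta\b[^>]*>/gi) || [];
  for (const tag of tags) {
    const key = (readAttr(tag, "property") || readAttr(tag, "name") || "").toLowerCase();
    const content = readAttr(tag, "content");
    if (!content) continue;
    if (key === "og:image") {
      images.push({ url: new URL(content, pageUrl).href, source: "opengraph" });
    } else if (key === "twitter:image") {
      images.push({ url: new URL(content, pageUrl).href, source: "twittercard" });
    }
  }
  return images;
}
```

Resolving each value against the page URL means relative paths such as `/img/cover.png` come back as absolute URLs that can be used directly as a thumbnail `src`.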

Now for ranking. This is a bit subjective, but I would rank as follows:

  1. rank LinkedData images with the highest weight
  2. rank OpenGraph images with the second highest weight
  3. rank TwitterCard images with the third highest weight
  4. rank all other images by a combination of precedence and resolution - i.e. images that show up earlier in the document and meet a minimum size threshold should be given a higher weight than images that are exceedingly small (e.g. favicons) or appear further down in the document tree (e.g. avatars or other content).
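The ranking above could be sketched roughly as follows. This is only one possible reading of it, not the library's implementation: the type and function names, the numeric weights, and the 100-pixel minimum size are all assumptions chosen for illustration.

```typescript
// Hypothetical shape for an image found on the page; not part of scraper-js.
type CandidateSource = "linkeddata" | "opengraph" | "twittercard" | "other";

interface ImageCandidate {
  url: string;
  source: CandidateSource;
  position: number; // index in document order, 0 = first image found
  width?: number;   // pixel dimensions, when known
  height?: number;
}

// Assumed weights: metadata sources always outrank plain images.
const SOURCE_WEIGHT: Record<CandidateSource, number> = {
  linkeddata: 400,
  opengraph: 300,
  twittercard: 200,
  other: 0,
};

// Assumed minimum side length, meant to filter out favicons and small avatars.
const MIN_SIDE = 100;

function scoreImage(c: ImageCandidate): number {
  if (c.source !== "other") return SOURCE_WEIGHT[c.source];
  const shortestSide = Math.min(c.width || 0, c.height || 0);
  if (shortestSide < MIN_SIDE) return 0; // too small or size unknown
  return 100 / (1 + c.position); // earlier in the document scores higher, capped at 100
}

// Return a new array sorted from best to worst candidate.
function rankImages(candidates: ImageCandidate[]): ImageCandidate[] {
  return candidates.slice().sort((a, b) => scoreImage(b) - scoreImage(a));
}
```

With this scoring, the first element of `rankImages(candidates)` would be the thumbnail to display. Images of unknown size are treated as too small, which is a conservative choice that may be worth revisiting.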

I will revitalize this library since it hasn't been touched in some time: I'll update it to TypeScript and provide a test harness with some empty tests. I'll leave the rest up to you to implement. What do you think?

Atharane commented 1 year ago

Hey @jasonaibrahim, thanks for taking the time to read about my issue. Currently, my project is in the design & analysis phase, but I'll build a rough prototype in a few days. I'll get back to you once that's done to solve any issues or suggest improvements. Is that okay? Have a great weekend!

jasonaibrahim commented 1 year ago

Check out the new 2.0.0 version.