jean-roland / adobe_stock_scraper

A Python/Selenium scraper for Adobe Stock image thumbnails
GNU General Public License v3.0

Scraping color palettes #1

Open moebiussurfing opened 1 year ago

moebiussurfing commented 1 year ago

Hello, I am looking for a guide to scrape color palettes from sites like Adobe and Coolors... Do you think this repo could be useful as inspiration for that? Regards.

jean-roland commented 1 year ago

Hello dear person,

I'm not entirely sure what your goal is, so let me briefly break down how my scraper works and hopefully you'll know if it can be useful to you.

The goal: I wanted to automatically grab thousands of stock image thumbnails on a specific topic from the Adobe Stock website, for machine learning purposes.

The issue: the images are loaded dynamically. A browser has to visit the results pages and scroll through them before the images appear, so I couldn't just parse the webpage.

The parts:
1) Selenium is a piece of software that lets you automate user interactions in a browser (in this case, scrolling).
2) BeautifulSoup uses an HTML/XML parser to extract elements from a webpage.

In this case, I ask Selenium to go to a page and scroll so that the images load, and then I use BeautifulSoup to retrieve them from the loaded page.
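To make that flow more concrete, here is a minimal sketch of the scroll-then-parse idea. It is not the actual code from the repo: the function names, the Firefox driver, the pause length and the choice of `<img src=...>` as the thing to collect are all assumptions for illustration.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll down until the page height stops growing or max_rounds is hit.

    `driver` is expected to be a Selenium WebDriver; only execute_script()
    is used. Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for round_number in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load more results
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return round_number  # nothing new was loaded, stop here
        last_height = new_height
    return max_rounds


def extract_image_urls(html):
    """Return the src attribute of every <img> tag that has one."""
    # Imported here so the scroll helper can be used without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    return [img["src"] for img in soup.find_all("img") if img.get("src")]


def scrape_image_urls(url):
    """Hypothetical end-to-end helper: open, scroll, then parse the page."""
    from selenium import webdriver

    driver = webdriver.Firefox()  # assumes Firefox and its driver are available
    try:
        driver.get(url)
        scroll_to_bottom(driver)
        return extract_image_urls(driver.page_source)
    finally:
        driver.quit()
```

Calling `scrape_image_urls()` with a results page URL would return the image URLs found after scrolling. A real page will probably need a more specific selector than "every `<img>` tag", and possibly a longer pause between scrolls.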

If what you want to scrape (color palettes in your case) is not statically available and requires user actions to load, which seems to be the case on Coolors, then you definitely need to dig into Selenium.
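A quick way to check whether something is statically available is to download the raw HTML without a browser and see whether the elements you care about are already in it. The sketch below uses only the standard library; the function names and the User-Agent value are assumptions, and some sites may refuse or alter non-browser requests.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TagCounter(HTMLParser):
    """Count how many times a given start tag appears in a document."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == self.tag:
            self.count += 1


def count_tags(html, tag):
    """Return the number of `tag` elements present in the raw HTML string."""
    counter = TagCounter(tag.lower())
    counter.feed(html)
    counter.close()
    return counter.count


def fetch_raw_html(url, timeout=10):
    """Download a page as plain text, without running any JavaScript."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

If `count_tags(fetch_raw_html(url), "img")` is zero or far lower than what you see in the browser, the content is most likely loaded by JavaScript and a tool like Selenium is needed.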

Now, my scraper only uses Selenium to scroll, while you might need it for clicks, so I'm unsure whether it will be helpful to you as a stepping stone.
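For clicks, a rough sketch could look like the following. None of this is in my scraper: the CSS selector you pass in is hypothetical and has to come from inspecting the target page, and treating palettes as six-digit hex codes found in the page source is an assumption that may not hold for a given site.

```python
import re

# Six-digit hex colors such as #FF5733 (an assumed palette representation).
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}\b")


def extract_hex_colors(text):
    """Return the unique hex color codes in `text`, uppercased, in order."""
    found = (match.upper() for match in HEX_COLOR.findall(text))
    return list(dict.fromkeys(found))  # dict keys keep first-seen order


def click_when_ready(driver, css_selector, timeout=10):
    """Wait until the element is clickable, then click it."""
    # Imported here so the regex helper works without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    element = WebDriverWait(driver, timeout).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, css_selector))
    )
    element.click()
```

After a click, `extract_hex_colors(driver.page_source)` would list the hex codes present in the updated page. Note that the regex also matches things like `href="#abcdef"`, so narrowing the search to the relevant elements first is probably wise.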

Hope you succeed in your endeavour,

Jean-Roland

On Mon, 21 Nov 2022 at 10:47, moebiusSurfing @.***> wrote:

Hello, I am looking for a guide to scrape color palettes from sites like Adobe and Coolors... Do you think this repo could be useful as inspiration for that? Regards.
