rdemarqui / perfume_recommender

A perfume recommendation system
https://huggingface.co/spaces/rdemarqui/perfume_recommender
20 stars, 2 forks

Fragrantica Scraping #2

Open · Kageyoshi7777 opened this issue 6 months ago

Kageyoshi7777 commented 6 months ago

Hi, how did you manage to scrape data from Fragrantica with all those limits? I've tried different approaches, but without good results. I'd like to scrape the whole database, so around 90+ perfumes.

rdemarqui commented 6 months ago

Hi. Have you tried restarting Selenium after every n pages scraped? For example:

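        # Every 70 requests: save progress and restart the browser
        if index % 70 == 0 and index != 0:
            print(f"Sample: {index}")
            temp_dataframe.to_csv(temp_name, index=False)  # checkpoint rows scraped so far
            driver.quit()  # close the current Chrome session
            driver = webdriver.Chrome(options=options)  # open a fresh session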

Here is the full code I used in another project.

Generally it works.
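A minimal, self-contained sketch of the restart-and-checkpoint pattern from the snippet above, assuming the scraping loop can be wrapped in a function. The names `scrape_with_restarts`, `write_checkpoint`, `parse_page`, and `FakeDriver` are hypothetical. It writes the checkpoint with the standard `csv` module instead of a pandas DataFrame, and uses a fake driver in place of Selenium, so it runs without a browser or extra dependencies.

```python
import csv
import os
import tempfile
from typing import Callable, Iterable, Protocol


class Driver(Protocol):
    """Minimal interface assumed for a browser driver (e.g. a Selenium WebDriver)."""

    def get(self, url: str) -> None: ...
    def quit(self) -> None: ...


def write_checkpoint(rows: list[dict], path: str) -> None:
    """Overwrite the checkpoint CSV with every row scraped so far (like DataFrame.to_csv)."""
    if not rows:
        return
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def scrape_with_restarts(
    urls: Iterable[str],
    make_driver: Callable[[], Driver],
    parse_page: Callable[[Driver], dict],
    checkpoint_path: str,
    restart_every: int = 70,
) -> list[dict]:
    """Scrape each URL, checkpointing and restarting the driver every `restart_every` pages."""
    rows: list[dict] = []
    driver = make_driver()
    try:
        for index, url in enumerate(urls):
            # Same condition as the snippet: skip index 0, then act every N requests
            if index % restart_every == 0 and index != 0:
                print(f"Sample: {index}")
                write_checkpoint(rows, checkpoint_path)
                driver.quit()            # drop the old browser session
                driver = make_driver()   # and start a fresh one
            driver.get(url)
            rows.append(parse_page(driver))
    finally:
        driver.quit()  # always close the last session, even on errors
    write_checkpoint(rows, checkpoint_path)
    return rows


class FakeDriver:
    """Hypothetical stand-in for webdriver.Chrome so the sketch runs offline."""

    instances = 0  # counts how many "browser sessions" were started

    def __init__(self) -> None:
        FakeDriver.instances += 1
        self.current_url = ""

    def get(self, url: str) -> None:
        self.current_url = url

    def quit(self) -> None:
        pass


demo_path = os.path.join(tempfile.gettempdir(), "perfumes_temp_demo.csv")
demo_urls = [f"https://example.com/perfume/{i}" for i in range(150)]
demo_rows = scrape_with_restarts(
    demo_urls, FakeDriver, lambda d: {"url": d.current_url}, demo_path, restart_every=70
)
print(len(demo_rows), FakeDriver.instances)

# With Selenium instead (requires a local Chrome install):
# from selenium import webdriver
# options = webdriver.ChromeOptions()
# rows = scrape_with_restarts(urls, lambda: webdriver.Chrome(options=options),
#                             lambda d: {"title": d.title}, "perfumes_temp.csv")
```

Passing a driver factory rather than a driver keeps the restart logic independent of Selenium: with 150 URLs and `restart_every=70`, the session is recreated at indices 70 and 140, so three sessions are opened in total.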

ktamuly commented 1 month ago

Cool project! Is it legal to scrape Fragrantica, though? I'm not entirely sure either. But here's a discussion on the topic: https://www.fragrantica.com/board/viewtopic.php?id=185064