sushil-rgb / AmazonMe

Introducing the AmazonMe webscraper - a powerful tool for extracting data from Amazon.com using the Requests and Beautifulsoup library in Python. This scraper allows users to easily navigate and extract information from Amazon's website.
GNU General Public License v3.0
51 stars · 22 forks

Fetch offers in categories #7

Open EliaTolin opened 7 months ago

EliaTolin commented 7 months ago

Hi, is it possible to fetch offers within a category? For example, fetching the "Electronics" category's offers of the day.

sushil-rgb commented 7 months ago

Can you paste the link here? I will look into it and update you.

EliaTolin commented 7 months ago

For example, using the category node:

https://www.amazon.it/b?node=2454160031

Or, for example, searching with category keywords: https://www.amazon.it/s?k=offerte+elettronica

I created this web server, PAAPI Amazon Webserver, but now I need the scraper.

I suggest making this more dynamic, for example with two new methods:

1) get_category_offers(category_name: str)

2) search_products(keywords: list[str])

without requiring a URL.
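Such URL-free entry points could be sketched as thin wrappers that build the search URL internally. This is only an illustration of the idea: the function names mirror the suggestion above, while `BASE_URL` and the URL-building helpers are assumptions, not part of the AmazonMe repo.

```python
# Hypothetical sketch: URL-free entry points that build the URL internally.
# The real AmazonMe internals may differ; treat this as a design illustration.
from urllib.parse import quote_plus

BASE_URL = "https://www.amazon.it"  # assumption: Italian storefront


def build_category_offers_url(category_name: str) -> str:
    """Build a search URL for daily offers in a given category."""
    return f"{BASE_URL}/s?k={quote_plus('offerte ' + category_name)}"


def build_search_url(keywords: list[str]) -> str:
    """Build a search URL from a list of keywords."""
    return f"{BASE_URL}/s?k={quote_plus(' '.join(keywords))}"


print(build_category_offers_url("elettronica"))
# → https://www.amazon.it/s?k=offerte+elettronica
```

The actual scraping methods would then accept a category name or keyword list and pass the generated URL to the existing scraper.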

sushil-rgb commented 7 months ago

Noted. I will look into this and update you.

EliaTolin commented 7 months ago

> Noted. I will look into this and will update you.

Thank you, I'll wait for your update.

sushil-rgb commented 7 months ago

Hey @EliaTolin, I hope you're doing well. I've created a new .py file called offers.py, where I've implemented a function called get_category_offer. This function is designed to scrape offers based on the category name. Here's the snippet:

async def get_category_offer(category_name):
    categ_url = f"https://www.amazon.it/s?k=offerte+{category_name}"
    offers = await Amazon(categ_url, None).export_csv()
    return offers

Call and run the method using the snippet below:

from scrapers.offers import get_category_offer
import asyncio

async def main():
    return await get_category_offer('elettronica')

if __name__ == '__main__':
    print(asyncio.run(main()))

Can you explain more about the method search_products(keywords: list[str])?

EliaTolin commented 7 months ago

Thank you very much. After I try it, I will open a PR if anything needs changing.

Let me give you some advice.

If you integrate this scraper into other software, you are currently forced to make changes, because the URL depends on the country you are operating in.

As you can see in the search keywords, the word "offerte" ("offers") is in Italian. You should figure out how to make it as dynamic as possible, that is, independent of the country in which you search for offers and independent of the keywords you enter.

For example, you could reason in terms of category nodes.

I don't have many ideas, but I think it is necessary.
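One way to act on this suggestion is to key everything off numeric browse-node IDs, which are language-independent, and map country codes to domains. A minimal sketch of the idea (the domains are real Amazon TLDs, but the function and mapping are hypothetical, not part of the repo):

```python
# Illustrative sketch: country-independent offer URLs via category node IDs.
# Browse-node IDs are numeric and do not depend on the storefront language.
AMAZON_DOMAINS = {
    "IT": "https://www.amazon.it",
    "DE": "https://www.amazon.de",
    "FR": "https://www.amazon.fr",
    "US": "https://www.amazon.com",
}


def category_node_url(country: str, node_id: str) -> str:
    """Build a browse-node URL that works regardless of language."""
    base = AMAZON_DOMAINS[country.upper()]
    return f"{base}/b?node={node_id}"


# The electronics node from the example earlier in this thread:
print(category_node_url("IT", "2454160031"))
# → https://www.amazon.it/b?node=2454160031
```

The scraper could then accept a country code plus a node ID instead of a localized keyword string.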

Another piece of advice: currently it is not a library but a series of scraping "examples". It would be nice to turn it into a library.

Example of use:

amazon_me = AmazonMe(country=CountryAmazon.IT, proxy=True)

# Fetch offers for a category
offers_electronics = amazon_me.get_category_offers(CategoryAmazon.electronics)

# Fetch information from an ASIN list
products_asin_list = ['B00PQY7TJA', 'B00PQY7SJA', 'B00PQY7343', '343EHFEJHFE', ...]
products_list = amazon_me.get_product_from_asin(products_asin_list)

I'm giving you these examples to help improve your scraper. It could become the best Amazon scraper on GitHub.
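The library facade proposed above could be sketched as follows. The class and method names come from the example in this thread, but the entire implementation is hypothetical; the real scraping logic would replace the URL-building placeholder:

```python
# Hypothetical sketch of the proposed AmazonMe library facade.
# Names follow the usage example above; the internals are illustrative only.
from enum import Enum


class CountryAmazon(Enum):
    IT = "https://www.amazon.it"
    US = "https://www.amazon.com"


class AmazonMe:
    """Facade wrapping the scraper behind a country-aware interface."""

    def __init__(self, country: CountryAmazon, proxy: bool = False):
        self.base_url = country.value
        self.proxy = proxy

    def get_product_from_asin(self, asins: list[str]) -> list[str]:
        # Sketch: build product-page URLs; the real method would scrape them.
        return [f"{self.base_url}/dp/{asin}" for asin in asins]


amazon_me = AmazonMe(country=CountryAmazon.IT, proxy=True)
print(amazon_me.get_product_from_asin(["B00PQY7TJA"]))
# → ['https://www.amazon.it/dp/B00PQY7TJA']
```

Packaging the scraper behind a class like this (and publishing it on PyPI) is what would turn the current collection of scripts into a reusable library.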

sushil-rgb commented 7 months ago

Thank you for the feedback. The domain name is already dynamic for the main scraper; I will try to make the offer category dynamic as well, using a regex pattern. I will look into this more later. I appreciate the feedback.
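A regex along the lines mentioned here might extract the country TLD from any Amazon URL, so the offer keyword can be localized per storefront. A sketch under stated assumptions: the regex is illustrative, and the keyword table is an assumption, not taken from the repo.

```python
# Illustrative sketch: derive the storefront TLD with a regex, then pick a
# localized "offers" keyword. Neither piece is from the AmazonMe codebase.
import re

DOMAIN_RE = re.compile(r"https?://www\.amazon\.([a-z.]+)/")

# Assumed localization table for the word "offers":
OFFER_KEYWORD = {"it": "offerte", "de": "angebote", "com": "deals"}


def offer_search_url(any_amazon_url: str, category: str) -> str:
    """Build a localized offer-search URL from any URL on the same storefront."""
    tld = DOMAIN_RE.match(any_amazon_url).group(1)
    word = OFFER_KEYWORD.get(tld, "deals")
    return f"https://www.amazon.{tld}/s?k={word}+{category}"


print(offer_search_url("https://www.amazon.it/b?node=2454160031", "elettronica"))
# → https://www.amazon.it/s?k=offerte+elettronica
```

The character class `[a-z.]+` also handles multi-part TLDs such as `co.uk`.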

EliaTolin commented 7 months ago

> Thank you for the feedback. The domain name is dynamic for the main scraper, i will try to make it dynamic by using regex pattern for the offer category. I will look into this more later. Thank you for the feedback, i appreciate it.

Thanks. I suggest you focus on turning this scraper into a library.