sushil-rgb / AmazonMe

Introducing AmazonMe, a Python-based web scraper designed to extract data from amazon.com using the requests and BeautifulSoup libraries. It simplifies navigation and makes it easy to gather information from Amazon’s website efficiently.
GNU General Public License v3.0
56 stars 22 forks

Request new functions and clarification of doubts #15

Open Alex87a opened 7 months ago

Alex87a commented 7 months ago

Good evening! I came across this repository because I am interested in Amazon scraping. I haven't had the chance to test what your program can do yet, so I apologize if my questions are obvious.

I read the description of what this project can do, and I have a doubt regarding the MongoDB implementation. From what I understand, it is simply a database in which the data scraped from Amazon is stored. The question at this point is, once the information has been extracted from Amazon, does the actual scraping take place in the database or does it continue to do so on the official website? Because I would like to understand if the IP address could be banned (even if you have implemented the user-agent).

Next, I wanted to ask you if you have ever considered the possibility of implementing Telegram API to build a Bot, through which scrap offers can be posted on a channel or in private. Maybe it's time-consuming and laborious to implement, but I just wanted to know if you've ever considered the idea.

Thank you in advance, and have a good evening!

sushil-rgb commented 7 months ago

Hey @Alex87a, sorry for the late response, and thank you for reaching out. To answer your questions:

The question at this point is, once the information has been extracted from Amazon, does the actual scraping take place in the database or does it continue to do so on the official website?

The scraping always happens live on Amazon's website. The script stores the datasets in MongoDB only after the live scraping has finished; the database is just storage and is never scraped itself.
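To make the order of operations concrete, here is a minimal sketch of "scrape live first, store afterwards". It is not the actual AmazonMe code: `scrape_then_store`, `scrape_product`, and the database and collection names in the comments are hypothetical, and the only MongoDB call assumed is pymongo's `insert_many`.

```python
from typing import Any, Callable, Dict, Iterable, List


def scrape_then_store(
    urls: Iterable[str],
    scrape_product: Callable[[str], Dict[str, Any]],
    collection: Any,
) -> int:
    """Scrape every URL live, then store the results in one batch.

    `scrape_product` is a hypothetical function that requests a product
    page and returns a dict of fields. `collection` is anything with an
    `insert_many` method, e.g. a pymongo collection.
    """
    # Step 1: live scraping. Every request here goes to the website.
    documents: List[Dict[str, Any]] = [scrape_product(url) for url in urls]

    # Step 2: storage. pymongo rejects an empty list, so skip it.
    if documents:
        collection.insert_many(documents)
    return len(documents)


# Usage with pymongo (assumed connection string and names):
# from pymongo import MongoClient
# collection = MongoClient("mongodb://localhost:27017")["amazon"]["products"]
# scrape_then_store(product_urls, scrape_product, collection)
```

Reading the stored data back later only queries MongoDB, so it sends no traffic to Amazon and carries no ban risk.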

Because I would like to understand if the IP address could be banned (even if you have implemented the user-agent).

It's possible for your IP to be banned, so I have implemented a random time interval between each request to reduce the chance of this happening. So far, my IP has not been banned by Amazon, but I do occasionally receive a 503 error from their server.
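A rough sketch of the random-interval idea, including a retry when the server answers 503, could look like the following. The function name, the delay bounds, and the retry count are assumptions for illustration, not the values AmazonMe uses; `fetch` is a hypothetical callable that returns a `(status_code, body)` pair.

```python
import random
import time
from typing import Callable, Iterable, List, Tuple


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], Tuple[int, str]],
    min_delay: float = 2.0,
    max_delay: float = 6.0,
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, int, str]]:
    """Fetch URLs one by one with a random pause between requests."""
    results: List[Tuple[str, int, str]] = []
    for url in urls:
        status, body = 0, ""
        for attempt in range(max_retries + 1):
            status, body = fetch(url)
            if status != 503:
                break
            # Server is busy or throttling: wait longer on each retry.
            sleep(random.uniform(min_delay, max_delay) * (attempt + 1))
        # The last status is recorded even if it is still 503.
        results.append((url, status, body))
        # Random pause so requests do not arrive at a fixed rhythm.
        sleep(random.uniform(min_delay, max_delay))
    return results
```

With the requests library, `fetch` could be something like `lambda url: (lambda r: (r.status_code, r.text))(requests.get(url, headers=headers))`, where `headers` carries the user-agent. A random delay lowers the chance of a ban but does not rule it out.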

Next, I wanted to ask you if you have ever considered the possibility of implementing Telegram API to build a Bot, through which scrap offers can be posted on a channel or in private

I have been thinking about creating a bot that can be called through a webhook and lets you download the scraped data in spreadsheet format. However, I am also considering using the Discord API, as I have already created a Discord bot that fetches product information from an Amazon product. Unfortunately, I haven't had the time to create a fully-fledged bot app yet.

I'd encourage you to try the scraper and run it yourself. It's not perfect, but it does the job accurately.