feature request - Githubissues

sushil-rgb / AmazonMe

Introducing the AmazonMe webscraper - a powerful tool for extracting data from Amazon.com using the Requests and BeautifulSoup libraries in Python. This scraper allows users to easily navigate and extract information from Amazon's website.

GNU General Public License v3.0

feature request #1

Open Svalbard92 opened 1 year ago

Svalbard92 commented 1 year ago

Hi, the code works flawlessly. Could you please modify it to also get the results from https://www.amazon.com/gp/goldbox?

Thanks.

sushil-rgb commented 1 year ago

@Svalbard92 Thank you, I appreciate it. I will look into this; please wait for the update.

Svalbard92 commented 1 year ago

Thanks for your prompt response. On a closer look, some items on that page are nested inside others, which may be why it is returning null data.

One more feature request:

Currently only one image of the product is fetched and stored, and the product description (the "About this item" section, under id feature-bullets) is not fetched or stored at all.

Can you modify the code to store all the available images (links) and the features (the text of "About this item") of the item?

Thanks.
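As an illustration of this request, here is a minimal BeautifulSoup sketch, not the project's actual code. The id feature-bullets comes from the comment above. The sample markup, the helper names, and the altImages container id are assumptions made for the example, and the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified product-page markup (illustration only).
SAMPLE_HTML = """
<div id="altImages">
  <img src="https://example.com/img/1.jpg">
  <img src="https://example.com/img/2.jpg">
</div>
<div id="feature-bullets">
  <ul>
    <li><span> Water resistant </span></li>
    <li><span>Two-year warranty</span></li>
  </ul>
</div>
"""


def extract_features(soup):
    """Return the text of every bullet under the element with id 'feature-bullets'."""
    container = soup.find(id="feature-bullets")
    if container is None:
        return []
    texts = (li.get_text(strip=True) for li in container.find_all("li"))
    return [text for text in texts if text]


def extract_image_links(soup, container_id="altImages"):
    """Return the src of every <img> inside the given container.

    The default container id is an assumption, not taken from the thread.
    """
    container = soup.find(id=container_id)
    if container is None:
        return []
    return [img["src"] for img in container.find_all("img") if img.get("src")]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
features = extract_features(soup)
image_links = extract_image_links(soup)
print(features)
print(image_links)
```

Both helpers return an empty list when the container is missing, so a page without these sections produces empty fields rather than an exception.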

sushil-rgb commented 1 year ago

Certainly, I will look into this and will update the code as soon as possible.

Svalbard92 commented 1 year ago

Hi,

Any update on the same?

sushil-rgb commented 1 year ago

I haven't had a chance to look further; I am stuck on pagination. The goldbox page is JS-rendered, so I am using Playwright. You can see the newly added Goldbox method in my scraper class; however, the script currently extracts only the next page's URL. I expect to finish the script by next week (an estimate).
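The pagination problem described here can be pictured as a loop that, for every page, collects the items and then follows the next-page URL. The sketch below is not the Goldbox method from the repository and does not use Playwright. `crawl_pages`, `fetch_page`, and `FAKE_SITE` are hypothetical names, and the fake site stands in for a real browser session.

```python
def crawl_pages(start_url, fetch_page, max_pages=50):
    """Yield (url, items) for each page, following next-page links.

    fetch_page is a hypothetical callable that takes a URL and returns
    (items_on_that_page, next_page_url_or_None).
    """
    seen = set()
    url = start_url
    # Stop on a missing next link, a repeated URL (loop), or the page limit.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        items, next_url = fetch_page(url)
        yield url, items
        url = next_url


# Stand-in for a real site: each "page" maps to its items and its next page.
FAKE_SITE = {
    "page1": (["a", "b"], "page2"),
    "page2": (["c"], "page3"),
    "page3": (["d"], None),
}

collected = [
    item
    for _, items in crawl_pages("page1", lambda url: FAKE_SITE[url])
    for item in items
]
print(collected)
```

Keeping the fetching step behind a callable means the same loop could wrap a Playwright page object or a plain HTTP request; the `seen` set guards against a next link that points back to an earlier page.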

Svalbard92 commented 1 year ago

I saw that you had made some changes and committed them to the master branch, so I was not sure whether the modifications worked for you, as it was returning null data for me.

Thanks for the update.

sushil-rgb commented 1 year ago

Hello @Svalbard92, I have made some updates to the script. The new method, called concurrent_scraping_gb, is responsible for scraping Amazon deals. However, there are several errors occurring. One of the issues is that the script is skipping certain URLs during the scraping process. This problem arises because some of the URLs directly lead to a product within the deal URLs. I still need to investigate this issue. I would appreciate it if you could try running the script and see the results for yourself. The current method successfully scrapes various product information, including product breakdown, description, saved deals, and a list of images, and stores them in an Excel database. Please keep in mind that these fields only work for goldbox URLs. Cheers!!
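One possible way to handle deal URLs that lead directly to a product is to separate them from deal listing pages before scraping. The sketch below is not how concurrent_scraping_gb works. It assumes product pages can be recognized by a /dp/ or /gp/product/ path segment followed by a 10-character ASIN; the function name and the sample URLs are hypothetical.

```python
import re
from urllib.parse import urlparse

# Assumption: a product path contains /dp/<ASIN> or /gp/product/<ASIN>,
# where an ASIN is 10 uppercase alphanumeric characters.
PRODUCT_PATH = re.compile(r"/(?:dp|gp/product)/[A-Z0-9]{10}(?:/|$)")


def split_deal_urls(urls):
    """Split URLs into (direct product pages, everything else)."""
    products, listings = [], []
    for url in urls:
        path = urlparse(url).path  # query string and fragment are ignored
        if PRODUCT_PATH.search(path):
            products.append(url)
        else:
            listings.append(url)
    return products, listings


# Hypothetical URLs for illustration only.
SAMPLE_URLS = [
    "https://www.amazon.com/dp/B000000000",
    "https://www.amazon.com/deal/abcd1234",
    "https://www.amazon.com/Some-Name/dp/B000000001/ref=xyz",
]

products, listings = split_deal_urls(SAMPLE_URLS)
print(products)
print(listings)
```

With the two groups separated, direct product links could be sent to the single-product scraper instead of being skipped, while listing pages are expanded first.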

Svalbard92 commented 1 year ago

Thanks for the update. While running the code, I get the following error in scraper.py:

raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

It is raised after this output:

Crawling page | 436.
Content loading error beyond this page. Error message | Element is not attached to the DOM
=========================== logs ===========================
attempting click action
  waiting for element to be visible, enabled and stable
============================================================.
The extraction process has begun and is currently in progress. 
The web scraper is scanning through all the links and collecting relevant information. 
Please be patient while the data is being gathered.
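For context, pandas raises this ValueError when concat receives an empty sequence, so one plausible reading is that no page produced any data after the click failed. The guard below is a sketch under that assumption, not the code in the repository; `combine_frames` is a hypothetical helper name.

```python
import pandas as pd


def combine_frames(frames):
    """Concatenate scraped DataFrames, tolerating an empty result set.

    Returns an empty DataFrame instead of raising
    "ValueError: No objects to concatenate" when nothing was scraped.
    """
    usable = [frame for frame in frames if frame is not None and not frame.empty]
    if not usable:
        return pd.DataFrame()
    return pd.concat(usable, ignore_index=True)


nothing = combine_frames([])
combined = combine_frames([
    pd.DataFrame({"name": ["a"]}),
    None,  # e.g. a page that failed to load
    pd.DataFrame({"name": ["b"]}),
])
print(nothing.empty)
print(combined)
```

A guard like this only hides the symptom. The underlying failure in the log is the click on an element that is no longer attached to the DOM, which still needs its own fix.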

Svalbard92 commented 1 year ago

Hi @sushil-rgb, could you find time to look into the issue?

sushil-rgb commented 1 year ago

Hey @Svalbard92. I will look into this today and will update the codebase as soon as possible.

Svalbard92 commented 10 months ago

Hi, I tried the code today. Unfortunately, it is giving me an error.