sushil-rgb / AmazonMe

Introducing the AmazonMe webscraper - a powerful tool for extracting data from Amazon.com using the Requests and BeautifulSoup libraries in Python. This scraper allows users to easily navigate and extract information from Amazon's website.
GNU General Public License v3.0

Suggestion for Enhancing Project Functionality #8

Open pyccino opened 9 months ago

pyccino commented 9 months ago

Hello, I recently came across your project and found it to be quite impressive!

While analyzing packets from the Amazon iOS app today, I discovered that it uses the endpoint below to fetch per-ASIN information such as the price and Prime eligibility (see the example output further down).

What's particularly intriguing is that each request accepts up to 100 ASINs, so far fewer requests are needed, which should make it less prone to bans (fingers crossed). The endpoint for this functionality is: "https://www.amazon.it/gp/twister/dimension?isDimensionSlotsAjax=1&asinList=B0BG8F7PCX&vs=1"

While the app employs a few other parameters, I have yet to find any of them particularly interesting.

To request several ASINs at once, pass them in the "asinList" parameter separated by commas, as demonstrated here: "https://www.amazon.it/gp/twister/dimension?isDimensionSlotsAjax=1&asinList=B0BG8F7PCX,B0CG7JG7N3&vs=1"

It's worth noting that the other parameters ("isDimensionSlotsAjax" and "vs") are required: removing them or changing their values results in an empty response (I'm still trying to figure out why).

Although the provided endpoint is for Amazon.it, I believe it could potentially work for other Amazon countries as well.
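
A minimal sketch of the URL construction described above: it splits a list of ASINs into batches of at most 100 and builds one URL per batch. The names build_dimension_urls and BATCH_SIZE are made up for illustration, the 100-ASIN limit is only the one observed above, and any domain other than www.amazon.it is an untested assumption.

```python
from urllib.parse import urlencode

BATCH_SIZE = 100  # per-request ASIN limit reported above; not officially documented


def build_dimension_urls(asins, domain="www.amazon.it", batch_size=BATCH_SIZE):
    """Return one endpoint URL per batch of at most `batch_size` ASINs."""
    urls = []
    for start in range(0, len(asins), batch_size):
        batch = asins[start:start + batch_size]
        query = urlencode(
            {
                "isDimensionSlotsAjax": 1,  # required, value taken from the captured request
                "asinList": ",".join(batch),
                "vs": 1,  # required, value taken from the captured request
            },
            safe=",",  # keep the comma separator literal instead of %2C
        )
        urls.append(f"https://{domain}/gp/twister/dimension?{query}")
    return urls


print(build_dimension_urls(["B0BG8F7PCX", "B0CG7JG7N3"])[0])
```

Passing a different domain (for example "www.amazon.de") only changes the host in the URL; whether the endpoint behaves the same there would still need to be checked.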

Here is an example output:

{
    "ASIN": "B0BG8F7PCX",
    "Type": "JSON",
    "sortOfferInfo": "",
    "isPrimeEligible": "false",
    "Value": {
        "content": {
            "twisterSlotJson": {"price": "49.49"},
            "twisterSlotDiv": "<span id=\"_price\" class=\"a-color-secondary twister_swatch_price unified-price\"><span class=\"a-size-mini twisterSwatchPrice\"> 49,49 € </span></span>"
        }
    }
}
&&&
{
    "ASIN": "B0CG7JG7N3",
    "Type": "JSON",
    "sortOfferInfo": "",
    "isPrimeEligible": "false",
    "Value": {
        "content": {
            "twisterSlotJson": {"price": "69.99"},
            "twisterSlotDiv": "<span id=\"_price\" class=\"a-color-secondary twister_swatch_price unified-price\"><span class=\"a-size-mini twisterSwatchPrice\"> 69,99 € </span></span>"
        }
    }
}

If a non-discounted price is present, it is embedded in the HTML of "twisterSlotDiv" inside "content".
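
A rough sketch of how a response like the one above could be parsed: the body is split on the "&&&" separator, each chunk is decoded as JSON, and the price shown to the user is recovered by stripping the tags from "twisterSlotDiv". The function name parse_dimension_response and the output keys are hypothetical, the sample HTML is trimmed, and treating "true" as the Prime-eligible value is an assumption, since only "false" appears in the output above.

```python
import json
import re
from decimal import Decimal

SEPARATOR = "&&&"


def parse_dimension_response(body):
    """Split the raw response text on '&&&' and return one record per ASIN."""
    records = []
    for chunk in body.split(SEPARATOR):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate blank chunks around the separators
        item = json.loads(chunk)
        content = item["Value"]["content"]
        # Strip the tags from twisterSlotDiv to get the price text as displayed.
        display = re.sub(r"<[^>]+>", "", content["twisterSlotDiv"]).strip()
        records.append(
            {
                "asin": item["ASIN"],
                "price": Decimal(content["twisterSlotJson"]["price"]),
                # Assumption: the flag is the string "true" when eligible.
                "prime": item["isPrimeEligible"] == "true",
                "display_price": display,
            }
        )
    return records


# Trimmed copy of the example output (fewer HTML attributes).
sample = (
    r'{"ASIN": "B0BG8F7PCX", "Type": "JSON", "isPrimeEligible": "false", '
    r'"Value": {"content": {"twisterSlotJson": {"price": "49.49"}, '
    r'"twisterSlotDiv": "<span id=\"_price\"><span> 49,49 € </span></span>"}}}'
    "\n&&&\n"
    r'{"ASIN": "B0CG7JG7N3", "Type": "JSON", "isPrimeEligible": "false", '
    r'"Value": {"content": {"twisterSlotJson": {"price": "69.99"}, '
    r'"twisterSlotDiv": "<span id=\"_price\"><span> 69,99 € </span></span>"}}}'
)

for record in parse_dimension_response(sample):
    print(record)
```

Using Decimal rather than float avoids rounding surprises when prices are later compared, for instance in a price-alert check.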

sushil-rgb commented 9 months ago

Hey @pyccino. I appreciate your insights. I've been thinking about implementing price alert functionalities, and it looks like the endpoint you found will be beneficial for that. Thanks once again for your valuable feedback and suggestions.

pyccino commented 9 months ago

I propose the addition of a new method, api_scraping, to leverage an API endpoint for scraping ASIN and price information. This solution aims to enhance the efficiency and reliability of retrieving data compared to other methods.

Code Proposal:

import asyncio
import json

async def api_scraping(self, asin_list):
    # The endpoint expects all ASINs in one comma-separated "asinList" value,
    # so join them with commas and append them to the configured base URL
    api_url = self.config["Ascraper"]['api_url'] + ",".join(str(asin) for asin in asin_list)

    for retry in range(self.max_retries):
        try:
            # Download the raw response body (bytes) from the API endpoint
            content = await Response(api_url).content()

            # The body holds one JSON object per ASIN, separated by '&&&'; skip empty chunks
            items = [json.loads(chunk) for chunk in content.split(b'&&&') if chunk.strip()]

            # Build one dictionary per ASIN with its price and Prime eligibility
            return [
                {
                    'ASIN': item['ASIN'],
                    'Price': item['Value']['content']['twisterSlotJson']['price'],
                    'isPrimeEligible': item['isPrimeEligible'],
                }
                for item in items
            ]
        except ConnectionResetError as se:
            print(f"Connection lost: {str(se)}. Retrying... ({retry + 1} / {self.max_retries})")
            if retry < self.max_retries - 1:
                await asyncio.sleep(5)

    # Every attempt failed: return an empty list so callers can still iterate
    return []
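
The retry logic in the proposal can be tried out without touching the network. The sketch below pulls it into a standalone helper and drives it with a fake fetcher that fails a fixed number of times. Both fetch_with_retries and make_flaky_fetch are hypothetical names and not part of AmazonMe; the delay is set to 0 only to keep the demo fast.

```python
import asyncio


async def fetch_with_retries(fetch, max_retries=3, delay=5.0):
    """Await the zero-argument coroutine function `fetch`, retrying on ConnectionResetError."""
    for attempt in range(1, max_retries + 1):
        try:
            return await fetch()
        except ConnectionResetError as exc:
            print(f"Connection lost: {exc}. Retrying... ({attempt} / {max_retries})")
            if attempt < max_retries:
                await asyncio.sleep(delay)
    return None  # every attempt failed


def make_flaky_fetch(failures):
    """Hypothetical stand-in for the network call: fails `failures` times, then succeeds."""
    state = {"calls": 0}

    async def fetch():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ConnectionResetError("simulated reset")
        return b'{"ASIN": "B0BG8F7PCX"}'

    return fetch


result = asyncio.run(fetch_with_retries(make_flaky_fetch(2), max_retries=3, delay=0))
print(result)
```

With two simulated failures and three allowed attempts the helper returns the fake body; with more failures than attempts it returns None, which the caller has to handle.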

Configuration File:

Ascraper:
    max_retries: 3
    api_url: https://www.amazon.it/gp/twister/dimension?isDimensionSlotsAjax=1&vs=1&asinList=

Implementation Notes:

While this code may not fully align with the intended use of the API, it provides a simple starting point that can be expanded and customized as requirements evolve.

I welcome your feedback on this proposal and am available for further discussions or clarifications.

Thank you.