igorsimb / mp-monitor

Django app for scraping Wildberries

Implement logic for non-existing SKUs when scraping #20

Closed igorsimb closed 9 months ago

igorsimb commented 1 year ago

Somewhat related to https://github.com/igorsimb/mp-monitor/issues/18

If we try to scrape a non-existing SKU, we get IndexError: list index out of range and, as a result, none of the entered SKUs are scraped.

We need to check each SKU, scrape the existing ones, and then show an error like "The following SKUs do not exist: ". So there needs to be a way to store the non-existing SKUs and show them on screen.
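A minimal sketch of building that message, assuming the non-existing SKUs are collected as strings in a list. format_invalid_skus_message is a hypothetical helper for illustration, not part of mp-monitor.

```python
def format_invalid_skus_message(invalid_skus: list[str]) -> str:
    """Build the user-facing error for SKUs that could not be scraped.

    Returns an empty string when every SKU was valid, so the caller can
    skip showing a message in that case.
    """
    if not invalid_skus:
        return ""
    return "The following SKUs do not exist: " + ", ".join(invalid_skus)


print(format_invalid_skus_message(["123", "99999999"]))
# The following SKUs do not exist: 123, 99999999
```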

Add tests for:

igorsimb commented 9 months ago

Here's the current flow:

  1. form:ScrapeForm
  2. view:scrape_items creates a string of SKUs
  3. util:scrape_items_from_skus loops over SKUs sending each SKU to
  4. util:scrape_item that returns a dict with item info back to
  5. util:scrape_items_from_skus, which collects all the scraped items into a big list of dicts and returns it to
  6. view:scrape_items, which updates the DB with the info from this big list.
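To illustrate why a single non-existing SKU stops the whole batch, here is a simplified sketch of steps 3 to 5. It assumes the SKU string from step 2 is whitespace-separated and that the API response looks like {"data": {"products": [...]}} (the shape used in the snippets further down). fetch_product_data and FAKE_API are hypothetical stand-ins for the real HTTP request, not the actual mp-monitor code.

```python
from typing import Any

# Hypothetical stand-in for the Wildberries API: only SKU "12345678" exists.
FAKE_API = {"12345678": {"data": {"products": [{"name": "Item"}]}}}


def fetch_product_data(sku: str) -> dict[str, Any]:
    # Non-existing SKUs come back with an empty products list
    return FAKE_API.get(sku, {"data": {"products": []}})


def scrape_item(sku: str) -> dict[str, Any]:
    data = fetch_product_data(sku)
    # Raises IndexError when "products" is empty (non-existing SKU)
    item = data.get("data", {}).get("products")[0]
    return {"sku": sku, "name": item["name"]}


def scrape_items_from_skus(skus: str) -> list[dict[str, Any]]:
    item_data = []
    for sku in skus.split():
        # A single IndexError escapes the loop, so nothing is returned
        item_data.append(scrape_item(sku))
    return item_data


try:
    scrape_items_from_skus("12345678 00000")
except IndexError as exc:
    print("Whole batch failed:", exc)
# Whole batch failed: list index out of range
```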

[Diagram: scrape_item_flow]

We need an additional step between 3 and 4: step 3.5, util:check_for_valid_sku, called within the scrape_item util. If the SKU is valid, proceed to step 4; if it is not, return invalid: {sku}. scrape_items_from_skus should keep an invalid_skus list: if scrape_item returns invalid: {sku}, add the SKU to that list and DO NOT add it to the item_data list.

Then we display the invalid_skus list to the user somehow.
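A minimal sketch of the proposed step 3.5 and the invalid_skus bookkeeping. It assumes whitespace-separated SKUs and an API response shaped like {"data": {"products": [...]}}. fetch_product_data, FAKE_API, and the data-taking signature of check_for_valid_sku are hypothetical, chosen so the sketch runs offline.

```python
from typing import Any, Union

INVALID_PREFIX = "invalid: "

# Hypothetical stand-in for the Wildberries API: only SKU "12345678" exists.
FAKE_API = {"12345678": {"data": {"products": [{"name": "Item"}]}}}


def fetch_product_data(sku: str) -> dict[str, Any]:
    # Non-existing SKUs come back with an empty products list
    return FAKE_API.get(sku, {"data": {"products": []}})


def check_for_valid_sku(data: dict[str, Any]) -> bool:
    # Step 3.5: the SKU exists only if the products list is non-empty
    return bool(data.get("data", {}).get("products"))


def scrape_item(sku: str) -> Union[dict[str, Any], str]:
    data = fetch_product_data(sku)
    if not check_for_valid_sku(data):
        return f"{INVALID_PREFIX}{sku}"
    item = data["data"]["products"][0]
    return {"sku": sku, "name": item["name"]}


def scrape_items_from_skus(skus: str) -> tuple[list[dict[str, Any]], list[str]]:
    item_data: list[dict[str, Any]] = []
    invalid_skus: list[str] = []
    for sku in skus.split():
        result = scrape_item(sku)
        if isinstance(result, str):
            # "invalid: {sku}" marker: record the SKU, keep it out of item_data
            invalid_skus.append(result.removeprefix(INVALID_PREFIX))
        else:
            item_data.append(result)
    return item_data, invalid_skus


items, invalid = scrape_items_from_skus("12345678 00000 99999")
print(items)    # [{'sku': '12345678', 'name': 'Item'}]
print(invalid)  # ['00000', '99999']
```

Passing the already-fetched response into check_for_valid_sku avoids a second request per SKU. Matching on a string prefix works, but returning a tuple or raising a dedicated exception would make the invalid case harder to confuse with real data.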

igorsimb commented 9 months ago

There are 2 ways of solving this:

  1. Use a regex to check that the SKU is valid before making the API request. That way we don't make API requests for invalid SKUs at all. Create a function (see below) and call it at the very beginning of the scrape_item util as a guard rail.

    import re


    def is_sku_valid(sku: str) -> bool:
        """Check if the SKU is valid.

        Args:
            sku (str): The SKU to check.

        Returns:
            bool: True if the SKU is valid, False otherwise.
        """
        # Only digits, between 5 and 12 characters long
        # (fullmatch also rejects a trailing newline, which "^...$" with re.match allows)
        return re.fullmatch(r"[0-9]{5,12}", sku) is not None

At the start of the scrape_item util:

    # Call the function; the bare function object is always truthy
    if not is_sku_valid(sku):
        logger.error("Invalid SKU: %s", sku)
        return {}
  2. Implement a check for an empty products list before this line: item = data.get("data", {}).get("products")[0]

    # Requests for non-existing SKUs return an empty products list
    logger.info("Checking if SKU '%s' is valid...", sku)
    if not data.get("data", {}).get("products"):
        logger.error("SKU '%s' is invalid!", sku)
        return {}
    
    item = data.get("data", {}).get("products")[0]

I think we should do either number 1 or both checks.

igorsimb commented 9 months ago

Both checks need to be implemented:
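A minimal sketch of scrape_item with both guards: the regex check from option 1 runs first so malformed input never reaches the API, and the empty-products check from option 2 catches well-formed SKUs that don't exist. The fetch function is passed in as a parameter (a hypothetical signature) so the sketch runs without network access; fake_fetch is likewise a made-up stub, not the real Wildberries call.

```python
import logging
import re
from typing import Any, Callable

logger = logging.getLogger(__name__)

FetchFn = Callable[[str], dict[str, Any]]


def is_sku_valid(sku: str) -> bool:
    # Only digits, between 5 and 12 characters long
    return re.fullmatch(r"[0-9]{5,12}", sku) is not None


def scrape_item(sku: str, fetch: FetchFn) -> dict[str, Any]:
    # Check 1: reject malformed SKUs without making an API request
    if not is_sku_valid(sku):
        logger.error("Invalid SKU: %s", sku)
        return {}

    data = fetch(sku)

    # Check 2: well-formed but non-existing SKUs return an empty products list
    products = data.get("data", {}).get("products")
    if not products:
        logger.error("SKU '%s' is invalid!", sku)
        return {}

    item = products[0]
    return {"sku": sku, "name": item.get("name")}


def fake_fetch(sku: str) -> dict[str, Any]:
    # Hypothetical API stub: only SKU "12345678" exists
    if sku == "12345678":
        return {"data": {"products": [{"name": "Item"}]}}
    return {"data": {"products": []}}


print(scrape_item("abc", fake_fetch))       # {}
print(scrape_item("99999999", fake_fetch))  # {}
print(scrape_item("12345678", fake_fetch))  # {'sku': '12345678', 'name': 'Item'}
```

Both guards return {} here, mirroring the snippets above; scrape_items_from_skus would still need to tell an empty result apart from a scraped item before adding it to item_data.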

Next step: how do we handle the incorrect SKUs and the messaging to the user?