igorsimb / mp-monitor

Django app for scraping Wildberries

Sleep between scrapes in `scrape_items_from_skus` #164

Open: igorsimb opened this issue 3 months ago

igorsimb commented 3 months ago

To avoid an IP ban, a pause of somewhere between 2 and 20 seconds is needed between scrapes (`time.sleep(random.randint(2, 20))`).

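import logging
import random
import re
import time
from typing import Any

# scrape_item() and InvalidSKUException are defined elsewhere in mp-monitor.

logger = logging.getLogger(__name__)


def scrape_items_from_skus(skus: str, is_parser_active: bool = False) -> tuple[list[dict[str, Any]], list[str]]:

    ...

    # Split on whitespace and/or commas; filter(None, ...) drops the empty
    # strings left by leading, trailing or mixed separators such as "123 ,456".
    for sku in filter(None, re.split(r"[\s,]+", skus)):
        logger.info("Scraping item: %s", sku)
        try:
            item_data = scrape_item(sku)
            if is_parser_active:
                item_data["is_parser_active"] = True
            items_data.append(item_data)
            # Random 2-20 s pause between requests to avoid an IP ban.
            time.sleep(random.randint(2, 20))
        except InvalidSKUException as e:
            invalid_skus.append(e.sku)
    return items_data, invalid_skus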
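
One possible refinement, sketched below under some assumptions: the pause is taken only *between* requests, so a single-SKU submission does not wait up to 20 seconds for nothing. The helper name `scrape_with_pauses` is hypothetical, the `scrape` callable stands in for the project's `scrape_item()`, and `random.uniform` is swapped in for `random.randint` so pauses are not limited to whole seconds. Invalid-SKU handling is left out to keep the sketch short.

```python
import random
import time
from collections.abc import Callable, Iterable
from typing import Any, Optional


def scrape_with_pauses(
    skus: Iterable[str],
    scrape: Callable[[str], dict[str, Any]],
    min_pause: float = 2.0,
    max_pause: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> list[dict[str, Any]]:
    """Scrape each SKU in turn, pausing a random interval between requests.

    `scrape` stands in for the project's scrape_item(); `sleep` and `rng`
    are injectable so the pauses can be checked without actually waiting.
    """
    rng = rng or random.Random()
    results: list[dict[str, Any]] = []
    for index, sku in enumerate(skus):
        if index > 0:
            # Sleep only *between* requests: nothing is gained by pausing
            # after the last SKU has been scraped.
            sleep(rng.uniform(min_pause, max_pause))
        results.append(scrape(sku))
    return results
```

For example, `scrape_with_pauses(["111", "222", "333"], scrape=lambda s: {"sku": s}, sleep=pauses.append)` collects exactly two pause durations in `pauses` instead of three, without sleeping.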

Problem

It looks very ugly on the frontend. When the user clicks the "Add" button, there is no visible feedback: the page stays exactly as it was and only a loading animation plays in the browser tab. Horrible UX. At least the user can freely leave or refresh the page, and scraping will still complete, since it runs in the background as a separate process.

What can we do?

When the "Add" button is clicked:

Minimum:

  1. All the contents of the form disappear
  2. The form collapses (leaving only the + sign)
  3. A message is shown: " items are being processed. It may take some time depending on the number of items."
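
A minimal sketch of building that message on the server, assuming the blank before "items" is meant to hold the number of SKUs submitted. `parse_skus` and `processing_message` are hypothetical helpers, and the split rule (whitespace and/or commas) mirrors the one used in `scrape_items_from_skus`.

```python
import re


def parse_skus(raw: str) -> list[str]:
    """Split user input on whitespace and/or commas, dropping empty pieces."""
    return [sku for sku in re.split(r"[\s,]+", raw) if sku]


def processing_message(raw: str) -> str:
    """Build the 'items are being processed' notice from the raw form input."""
    count = len(parse_skus(raw))
    return (
        f"{count} items are being processed. "
        "It may take some time depending on the number of items."
    )
```

Counting from the parsed list rather than the raw string keeps stray commas or trailing newlines from inflating the number shown to the user.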

Better:

  1. Placeholders appear in the items table (one per submitted SKU, up to a certain limit)
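
The placeholder rows could be prepared as plain template context, sketched here under stated assumptions: the cap of 10 is a hypothetical value (the issue only says "up to a certain amount"), and the row shape `{"sku": ..., "status": "pending"}` is invented for illustration.

```python
import re

# Hypothetical cap; the issue only says "up to a certain amount".
MAX_PLACEHOLDERS = 10


def placeholder_rows(raw: str, limit: int = MAX_PLACEHOLDERS) -> list[dict[str, str]]:
    """Return one 'pending' row per submitted SKU, capped at `limit`."""
    skus = [sku for sku in re.split(r"[\s,]+", raw) if sku]
    return [{"sku": sku, "status": "pending"} for sku in skus[:limit]]
```

Keying each row by SKU leaves room for the later polling step to replace a specific placeholder once that item has been scraped.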

Best:

  1. Placeholders are gradually replaced by the real items with a fade-in animation. This could use HTMX polling or something similar. Polling with HTMX: https://www.youtube.com/watch?v=N9HEV1a_kd8 (written version: https://www.photondesigner.com/articles/polling-htmx)
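
The server side of such a poll could look roughly like the sketch below. It relies on one documented htmx behaviour: when a polling request receives HTTP status 286, htmx cancels further polling. Everything else is assumed for illustration: the names `PollResult` and `poll_scrape_progress` are hypothetical, and "scraped" is modelled as a set of SKUs that already exist in the database.

```python
from dataclasses import dataclass

# htmx cancels an `every Ns` poll when the response has this status code.
HTMX_STOP_POLLING = 286


@dataclass
class PollResult:
    status: int
    ready: list[str]    # SKUs whose placeholder can be swapped for the real row
    pending: list[str]  # SKUs that should keep their placeholder for now


def poll_scrape_progress(requested: list[str], scraped: set[str]) -> PollResult:
    """Split requested SKUs into ready/pending and decide whether to keep polling."""
    ready = [sku for sku in requested if sku in scraped]
    pending = [sku for sku in requested if sku not in scraped]
    status = HTMX_STOP_POLLING if not pending else 200
    return PollResult(status=status, ready=ready, pending=pending)
```

A Django view would render the `ready` rows (with a fade-in CSS class) and return `status` as the response code, so the browser stops asking once every placeholder has been filled.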

igorsimb commented 3 months ago

The Minimum can be done by just refreshing the page and displaying the message, right?
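
If the form is submitted through `hx-post`, a plain redirect will not reload the page by itself; htmx documents an `HX-Refresh: true` response header that makes the client do a full page refresh, and it marks its own requests with an `HX-Request: true` header. A minimal sketch of choosing between the two paths, assuming the message itself is stored beforehand (for example with Django's messages framework) and shown after the reload. The function name and the `/` redirect target are hypothetical.

```python
def add_items_response(request_headers: dict[str, str]) -> tuple[int, dict[str, str]]:
    """Return (status, response headers) for the 'Add items' POST.

    Note: a plain dict lookup is case-sensitive, unlike Django's request.headers.
    """
    if request_headers.get("HX-Request") == "true":
        # htmx request: ask the client to reload the whole page.
        return 200, {"HX-Refresh": "true"}
    # Classic form POST: Post/Redirect/Get to a hypothetical items page.
    return 303, {"Location": "/"}
```

Either way the page is reloaded after the POST, which is all the Minimum variant needs.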

igorsimb commented 3 months ago

Some useful info: this is how we can hook into the before-request and after-request events using htmx.

add_items_form.html

...

<!-- <form method="POST" action="{% url 'scrape_item' skus=form.skus.value %}" id="addItemsForm"> -->
                    <form hx-post="{% url 'scrape_item' skus=form.skus.value %}" id="addItemsForm"
                          hx-on-htmx-before-request="alert('Making a request!')"
                          hx-on-htmx-after-request="alert('Done making a request!')">

...

More info: https://htmx.org/attributes/hx-on/
Confirmation with sweetalert2: https://htmx.org/examples/confirm/