igorsimb / mp-monitor

Django app for scraping Wildberries

Interval update should keep the schedule (not create new one) #102

Closed igorsimb closed 5 months ago

igorsimb commented 5 months ago

A new view should be created.

igorsimb commented 5 months ago

General logic

The cleanest way to achieve this functionality with minimal changes is to modify the existing create_scrape_interval_task view. Here's how:

  1. Update existing logic:

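Instead of always creating a new schedule when the form is valid, first check whether the user already has one, and create a schedule only as a fallback. Note that this assumes a schedule model with a user relation; the stock IntervalSchedule in django-celery-beat has only the every and period fields, so the lookup below would need a custom model or field: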

if request.method == "POST":
    scrape_interval_form = ScrapeIntervalForm(request.POST)

    if scrape_interval_form.is_valid():
        logger.info("Updating the task")
        # Check for existing schedule
        try:
            schedule = IntervalSchedule.objects.get(user=request.user)
        except IntervalSchedule.DoesNotExist:
            # Fallback to creating a new schedule if none exists
            schedule, created = IntervalSchedule.objects.get_or_create(
                every=scrape_interval_form.cleaned_data["interval_value"],
                period=getattr(IntervalSchedule, scrape_interval_form.cleaned_data["period"].upper()),
            )
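  2. Update selected items:

Since skus_list has already been extracted from the form, use it to replace the schedule's related items. Assuming items is a many-to-many relation, the related manager's clear() and add() methods are enough here; no m2m_changed signal handler is needed (Django sends that signal itself when the relation changes):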

# Update selected items
schedule.items.clear()  # Clear existing items
schedule.items.add(*skus_list)
schedule.save()
  3. Handle no item selection:

Keep the existing check that at least one item is selected, and redirect if not.

Benefits:

  - This approach avoids creating duplicate schedules with the same interval.
  - It reuses the existing logic for form validation and interval handling.
  - It updates the related items through Django's built-in related-manager methods.

Complete modified view:

def create_scrape_interval_task(
    request: WSGIRequest,
) -> HttpResponse | HttpResponseRedirect:
    """Takes interval from the form data (in seconds) and updates main.tasks.scrape_interval_task

    The task itself prints all items belonging to this tenant every {{ interval }} seconds.
    """

    if request.method == "POST":
        scrape_interval_form = ScrapeIntervalForm(request.POST)

        if scrape_interval_form.is_valid():
            logger.info("Updating the task")
            skus = request.POST.getlist("selected_items")
            # getlist() already returns a list of strings, so there is nothing to split
            skus_list = [int(sku) for sku in skus]

            if not is_at_least_one_item_selected(request, skus):
                return redirect("item_list")

            uncheck_all_boxes(request)

            try:
                schedule = IntervalSchedule.objects.get(user=request.user)
            except IntervalSchedule.DoesNotExist:
                # Fallback to creating a new schedule if none exists
                schedule, created = IntervalSchedule.objects.get_or_create(
                    every=scrape_interval_form.cleaned_data["interval_value"],
                    period=getattr(IntervalSchedule, scrape_interval_form.cleaned_data["period"].upper()),
                )

            schedule.items.clear()  # Clear existing items
            schedule.items.add(*skus_list)
            schedule.save()
            # ... rest of your logic to initiate the task
    # ... rest of your logic for GET request

This modified view replaces the items on the user's existing schedule while leaving its interval unchanged; a new schedule is created only when none exists.
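The update-or-create flow above can be tried out without Django. The following is a minimal sketch, not the project's code: Schedule, parse_skus, and upsert_schedule are hypothetical names, and a plain dict stands in for the database table. It assumes one schedule per user and that the selected items arrive as strings, as request.POST.getlist() returns them.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Schedule:
    """Hypothetical in-memory stand-in for a per-user schedule row."""

    every: int
    period: str
    items: set[int] = field(default_factory=set)


def parse_skus(raw_values: list[str]) -> list[int]:
    """Convert submitted strings to integer SKUs.

    Each value may hold one SKU or several separated by whitespace.
    """
    return [int(token) for value in raw_values for token in value.split()]


def upsert_schedule(
    schedules: dict[str, Schedule],
    user: str,
    every: int,
    period: str,
    skus: list[int],
) -> tuple[Schedule, bool]:
    """Reuse the user's schedule if one exists, else create it; then replace its items."""
    created = user not in schedules
    if created:
        schedules[user] = Schedule(every=every, period=period)
    schedule = schedules[user]
    # Equivalent in effect to items.clear() followed by items.add(*skus)
    schedule.items = set(skus)
    return schedule, created
```

Calling upsert_schedule a second time for the same user returns the same object, with every and period untouched and only items replaced.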

igorsimb commented 5 months ago

What if we create a separate update_existing_interval_task view?

  1. existing_task = PeriodicTask.objects.get(name=f"scrape_interval_task_{request.user}")
  2. get the list of selected_items
  3. set the new args on existing_task and call existing_task.save(), rather than PeriodicTask.objects.update(...), which would update every task

The task must keep the same interval and period.
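The steps above could look roughly like this. It is a sketch with an in-memory stand-in rather than the real model: FakePeriodicTask and update_task_items are hypothetical names, and the dict plays the role of the PeriodicTask table. It assumes the task's positional arguments are stored as a JSON string, which is how django-celery-beat's PeriodicTask.args field works.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class FakePeriodicTask:
    """Hypothetical stand-in for a periodic task row."""

    name: str
    every: int
    period: str
    args: str = "[]"  # JSON-encoded positional arguments


def update_task_items(
    tasks: dict[str, FakePeriodicTask],
    username: str,
    tenant_id: int,
    skus: list[int],
) -> FakePeriodicTask:
    """Replace the item list in the user's task and leave its schedule alone."""
    # Step 1: look the task up by its per-user name (KeyError if it is missing)
    task = tasks[f"scrape_interval_task_{username}"]
    # Steps 2-3: only args changes; every and period are never touched
    task.args = json.dumps([tenant_id, skus])
    return task
```

Because only args is reassigned, the interval and period survive the update, which is the requirement stated above.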

igorsimb commented 5 months ago

Gemini

Here's the new update_existing_interval_task view based on the provided create_scrape_interval_task view:

def update_existing_interval_task(
    request: WSGIRequest,
) -> HttpResponse | HttpResponseRedirect:
    """Takes the new list of selected items and updates the existing scraping task.

    This view updates the existing task identified by the user and the interval, 
    adding or removing items from the scraping list.
    """

    if request.method == "POST":
        scrape_interval_form = ScrapeIntervalForm(request.POST)

        if scrape_interval_form.is_valid():
            logger.info("Updating existing task")
            skus = request.POST.getlist("selected_items")
            # getlist() already returns a list of strings, so there is nothing to split
            new_skus_list = [int(sku) for sku in skus]

            if not is_at_least_one_item_selected(request, skus):
                return redirect("item_list")

            uncheck_all_boxes(request)

            try:
                # Get the existing schedule for the user
                schedule = IntervalSchedule.objects.get(user=request.user)
            except IntervalSchedule.DoesNotExist:
                # Message: "No schedule found to update."
                messages.error(request, "Не найдено расписание для обновления.")
                return redirect("item_list")

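            # Check if the set of items actually changed (a QuerySet never equals a list,
            # so compare sets of ids instead)
            if set(schedule.items.values_list("pk", flat=True)) == set(new_skus_list):
                # Message: "The item list has not changed."
                messages.info(request, "Список товаров не изменился.")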
                return redirect("item_list")

            # Update the existing schedule's items
            schedule.items.clear()
            schedule.items.add(*new_skus_list)
            schedule.save()

            # Update the existing PeriodicTask with the new item list
            try:
                scrape_interval_task = PeriodicTask.objects.get(
                    name=f"scrape_interval_task_{request.user}"
                )
                # args is stored as JSON text, so encode it (requires "import json")
                scrape_interval_task.args = json.dumps([request.user.tenant.id, new_skus_list])
                scrape_interval_task.save()
                logger.info(
                    "Existing interval task '%s' updated with new item list",
                    scrape_interval_task.name,
                )

            except PeriodicTask.DoesNotExist:
                # Message: "No task found to update."
                messages.error(request, "Не найдена задача для обновления.")
                return redirect("item_list")

            # Update 'scrape_interval_task' session variable (optional)
            # ... Update session variable based on updated scrape_interval_task

            # Update items' "is_parser_active" field in a single query
            Item.objects.filter(
                tenant_id=request.user.tenant.id, sku__in=new_skus_list
            ).update(is_parser_active=True)

            return redirect("item_list")

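    # Reached on a non-POST request or when the form is invalid.
    # Message: "Something went wrong. Try again or contact the administrator."
    messages.error(
        request,
        "Что-то пошло не так. Попробуйте еще раз или обратитесь к администратору.",
    )
    return redirect("item_list")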

This view achieves the desired functionality:

  - Retrieves the existing schedule: it fetches the IntervalSchedule for the user using IntervalSchedule.objects.get(user=request.user).
  - Checks for list change: it compares the schedule's current items with the new list. If they are the same, it informs the user and exits.
  - Updates the existing schedule: it clears the existing items and adds the new ones using schedule.items.clear() and schedule.items.add().
  - Updates the PeriodicTask: it retrieves the existing PeriodicTask by name and updates its args with the new item list.
  - Updates the session variable (optional): you can update the session variable based on the updated scrape_interval_task.
  - Updates the items' field: as in the original view, it sets the is_parser_active field for the selected items.

This approach preserves the existing schedule and interval while updating the list of items for scraping.
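The "has the list changed?" check is easy to get wrong: comparing a list with another container type, or with the same items in a different order, reports a change that is not there. Below is a minimal sketch of an order-insensitive check. diff_items is a hypothetical helper name, not part of the project.

```python
from __future__ import annotations

from collections.abc import Iterable


def diff_items(current: Iterable[int], new: Iterable[int]) -> tuple[set[int], set[int]]:
    """Return (to_add, to_remove) needed to turn the current items into the new ones."""
    current_set, new_set = set(current), set(new)
    return new_set - current_set, current_set - new_set
```

If both returned sets are empty, nothing changed and the view can return early. Otherwise the two sets say exactly which items to add and which to remove.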