openzim / zimfarm

Farm operated by bots to grow and harvest new zim files
https://farm.openzim.org
GNU General Public License v3.0
84 stars, 25 forks

Schedule column always empty under certain conditions #720

Closed by kelson42 1 year ago

kelson42 commented 2 years ago

Recipe view looks good with pagination at 10 (screenshot)

But with pagination at 50, the schedule column is empty (screenshot)

rgaudin commented 2 years ago

Good catch! It's due to incorrect handling of the limit (your filter has fewer entries than the limit)

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

benoit74 commented 1 year ago

Isn't this already fixed? See https://farm.openzim.org/recipes?category=ifixit for instance.

rgaudin commented 1 year ago

No it's not. @benoit74 we're looking at the column with a clock that contains a tick if the schedule is requested (has a requested-task).

ATM, there are only wikihow recipes in the pipe

Screenshot 2023-09-29 at 08 06 51

In the recipes list, with a pagination of 20, I do get the ticks

Screenshot 2023-09-29 at 08 07 04

But if I change to 50, the ticks are gone

Screenshot 2023-09-29 at 08 07 12

benoit74 commented 1 year ago

This is linked to an underlying HTTP error at the uWSGI level. By default, uWSGI limits the request size (request line and headers, body excluded) to 4096 bytes.

When there are a lot of schedules, the UI sends a pretty big query to retrieve all requested tasks, because every schedule name is added to it: https://api.farm.openzim.org/v1/requested-tasks/?limit=50&schedule_name=wikihow_ar_maxi&schedule_name=wikihow_cs_maxi&schedule_name=wikihow_de_maxi&schedule_name=wikihow_en_endless&schedule_name=wikihow_en_endless_arts-and-entertainment&schedule_name=wikihow_en_endless_cars-and-other-vehicles&schedule_name=wikihow_en_endless_computers-and-electronics&schedule_name=wikihow_en_endless_education-and-communications&schedule_name=wikihow_en_endless_food-and-entertaining&schedule_name=wikihow_en_endless_hobbies-and-crafts&schedule_name=wikihow_en_endless_holidays-and-traditions&schedule_name=wikihow_en_endless_home-and-garden&schedule_name=wikihow_en_endless_personal-care-and-style&schedule_name=wikihow_en_endless_sports-and-fitness&schedule_name=wikihow_en_endless_work-world&schedule_name=wikihow_en_endless_youth&schedule_name=wikihow_en_maxi&schedule_name=wikihow_es_maxi&schedule_name=wikihow_fa_maxi&schedule_name=wikihow_fr_maxi&schedule_name=wikihow_hi_maxi&schedule_name=wikihow_id_maxi&schedule_name=wikihow_it_maxi&schedule_name=wikihow_ja_maxi&schedule_name=wikihow_ko_maxi&schedule_name=wikihow_nl_maxi&schedule_name=wikihow_pt_maxi&schedule_name=wikihow_ru_maxi&schedule_name=wikihow_th_maxi&schedule_name=wikihow_tr_maxi&schedule_name=wikihow_vi_maxi&schedule_name=wikihow_zh_maxi
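To get a feel for how this query grows with the page size, here is a minimal Python sketch. It only rebuilds the path and query string and compares its length to the 4096-byte default; request headers (auth token, cookies, user agent) count against the same buffer and are not modelled. The function names and the sample schedule names are hypothetical; only the /v1/requested-tasks/ path and the limit and schedule_name parameters come from the URL above.

```python
from urllib.parse import urlencode

# uWSGI's default `buffer-size`; it caps the request line plus headers, not the body.
UWSGI_DEFAULT_BUFFER_SIZE = 4096


def requested_tasks_path(schedule_names, limit):
    """Build the path and query string for a requested-tasks lookup."""
    params = [("limit", limit)]
    # One repeated `schedule_name` parameter per schedule shown on the page.
    params += [("schedule_name", name) for name in schedule_names]
    return "/v1/requested-tasks/?" + urlencode(params)


def buffer_share(path, buffer_size=UWSGI_DEFAULT_BUFFER_SIZE):
    """Fraction of the buffer used by the path alone (headers come on top)."""
    return len(path.encode("utf-8")) / buffer_size


if __name__ == "__main__":
    # Hypothetical schedule names, roughly as long as the wikihow ones above.
    names = [f"wikihow_en_endless_sample-category-{i:02d}" for i in range(50)]
    for page_size in (10, 20, 50):
        path = requested_tasks_path(names[:page_size], page_size)
        print(page_size, len(path), f"{buffer_share(path):.0%}")
```

The query length grows linearly with the number of schedules on the page, which is consistent with the ticks showing at 10 or 20 items and disappearing at 50.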

I see three options:

  1. adapt the /schedules endpoint to also return the info about the presence of a requested_task: a boolean property task_requested
  2. in the UI, fetch all requested-tasks to populate this view and filter them client-side: loop over pages of 100 items until every requested-task has been fetched, and keep in memory only the info which interests us (is there a requested task for the given schedule?) rather than the full list; the list of requested-tasks is expected to always be small, so filtering them on the fly should be OK
  3. adapt UWSGI configuration to increase maximum request size
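For comparison, option 2 could be sketched as below. This is an illustration, not the UI's actual code (written in Python here for brevity). It assumes a paginated response shaped like {"meta": {"count": ...}, "items": [...]} where each item carries a schedule_name; that shape, the fetch_page callable and the function name are all assumptions made for the example.

```python
from typing import Callable, Set


def requested_schedule_names(
    fetch_page: Callable[[int, int], dict], page_size: int = 100
) -> Set[str]:
    """Collect the names of all schedules that currently have a requested task.

    `fetch_page(skip, limit)` is a hypothetical helper returning one page of
    requested-tasks as {"meta": {"count": total}, "items": [...]}.
    """
    names: Set[str] = set()
    skip = 0
    while True:
        page = fetch_page(skip, page_size)
        items = page["items"]
        # Keep only the schedule names, not the requested-task objects.
        names.update(item["schedule_name"] for item in items)
        skip += len(items)
        # Stop on an empty page or once every item has been seen.
        if not items or skip >= page["meta"]["count"]:
            return names


if __name__ == "__main__":
    # In-memory stand-in for the API, for demonstration only.
    tasks = [{"schedule_name": f"wikihow_{lang}_maxi"} for lang in ("ar", "cs", "de")]

    def fake_fetch(skip: int, limit: int) -> dict:
        return {"meta": {"count": len(tasks)}, "items": tasks[skip : skip + limit]}

    requested = requested_schedule_names(fake_fetch, page_size=2)
    print("wikihow_de_maxi" in requested, "wikihow_fr_maxi" in requested)
```

The request size stays constant whatever the page size of the recipes list, at the cost of one or more extra round trips.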

I prefer option 1 because it is the most straightforward solution. The performance impact is probably minimal because we have an index at hand and the number of requested tasks is expected to be small. The endpoint is then maybe a bit less "RESTful", but who cares?

I don't like option 3 because this is the kind of setting which has been chosen for good reasons, and changing it just for a "not-adapted" HTTP endpoint is not a good practice IMHO.

benoit74 commented 1 year ago

If we choose option 1, we should probably adapt GET /schedules and GET /schedules/{schedule_name} for consistency

rgaudin commented 1 year ago

I agree changing it in uwsgi would not be wise. 4KiB is a lot and browsers may strip it anyway.

I am also in favor of option 1. An is_requested: bool property is schedule-related info, and the data model should not dictate the API. It's semantically different from requested_task_id, for instance.
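As a rough illustration of option 1, the API could attach the boolean while serializing schedules, so the UI never has to send the list of schedule names. The sketch below is not zimfarm's actual code: the with_is_requested helper and the name key are hypothetical, and only the is_requested property name comes from the comment above.

```python
from typing import Dict, Iterable, List


def with_is_requested(
    schedules: Iterable[Dict], requested_names: Iterable[str]
) -> List[Dict]:
    """Return copies of the schedules with an added `is_requested` boolean.

    `requested_names` holds the names of schedules that currently have a
    requested task; how it is obtained (e.g. an indexed lookup) is left out.
    """
    requested = set(requested_names)
    return [
        # Copy each schedule so the input dictionaries are left untouched.
        {**schedule, "is_requested": schedule["name"] in requested}
        for schedule in schedules
    ]


if __name__ == "__main__":
    schedules = [{"name": "wikihow_en_maxi"}, {"name": "wikihow_fr_maxi"}]
    for schedule in with_is_requested(schedules, ["wikihow_fr_maxi"]):
        print(schedule["name"], schedule["is_requested"])
```

The same helper could serve both GET /schedules and GET /schedules/{schedule_name} by passing a single-element list for the latter, which keeps the two responses consistent.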