Closed HussainTaj-arbisoft closed 1 month ago
@gumaerc can you confirm if we really need to add a separate queue. Because assigning rate_limits and low priority to tasks should be enough
@umar8hassan I think that's a sensible approach, and at the end of the day if it works then it works. My thought process behind potentially having a separate queue for this task is that since there are a large amount of links that would have to be checked, the queue would get clogged with tasks whenever this is running. Even if the tasks don't have high CPU usage, the large amount of them could hold up more important tasks. If we can adjust priority to make sure that tasks such as course publishing have absolute priority and kick everything else out of the way, then I think that would be fine. Really the test will be running this in RC and making sure we can still publish courses unimpeded while link processing is happening. If you're confident that rate limits and priority can guarantee that, then go for it.
Tests are failing; it looks like it's because the constants are now in a different file.
Tests are failing; it looks like it's because the constants are now in a different file.
Sorry about that. I had fixed that for api_test.py
My change might not have been staged properly. I'll check and push again.
As suggested by @gumaerc, I think it would be helpful to add a feature flag to disable the celery task if necessary, since this is a lower-priority task than publishing, syncing, the video workflow, etc.
The actual Django objects for the external resources don't appear to be getting updated for me. To reproduce, take a particular external resource and change the external_url
field in Django admin to a URL that doesn't exist. Then, run the check as in this PR's testing instructions. The is_broken
field does not appear to have been updated.
The actual Django objects for the external resources don't appear to be getting updated for me. To reproduce, take a particular external resource and change the
external_url
field in Django admin to a URL that doesn't exist. Then, run the check as in this PR's testing instructions. Theis_broken
field does not appear to have been updated.
I have tested and it seems to be working fine for me. Here is the attached video of my test.
https://github.com/mitodl/ocw-studio/assets/71461724/eda992a2-93fd-4434-b9f3-131d5e0b8a0c
However, there are cases where it won't update the field:
external_url
field to google.com
without scheme/protocol
or any invalid url
. It'd have resulted in an error raised by the requests module, resulting to last check failed error
. Thus retaining the last state of is_broken
field.is_broken
field has been set to true
for a broken resource. manually updating the external_url
field to a valid url will not automatically update the is_broken
to false
or ""
.backup_url
will also not update the is_broken
field.
What are the relevant tickets?
Closes https://github.com/mitodl/hq/issues/3922
Description (What does it do?)
This PR implements the plan described in https://github.com/mitodl/hq/issues/3922.
What still needs to be done (this was my plan, you may alter it as needed):
external_resources
app.How can this be tested?
docker-compose run web ./manage.py migrate
check-broken-external-urls
. Task is set to repeat every week. Frequency can be modified by changingCHECK_EXTERNAL_RESOURCE_STATUS_FREQUENCY
in main/settings.pyAdditional Testing
To manually test the functionality locally with a batch of urls
docker-compose exec web ./manage.py shell
from external_resources.tasks import check_external_resources
from websites.models import WebsiteContent
from websites.constants import CONTENT_TYPE_EXTERNAL_RESOURCE
external_resources = list(WebsiteContent.objects.filter(type=CONTENT_TYPE_EXTERNAL_RESOURCE).values_list("id", flat=True))
len(external_resources)
andexternal_resources = external_resources[:1]
check_external_resources(external_resources)
resources = WebsiteContent.objects.filter(id__in=external_resources).select_related("external_resource_state")
resource = next(resources.iterator())
str(resource.external_resource_state.last_checked)
To manually execute task in celery
docker-compose exec web ./manage.py shell
from external_resources.tasks import check_external_resources_for_breakages
results = check_external_resources_for_breakages.apply()
** This might produce large number of urls to check and may take some time to complete.