Closed pt2302 closed 2 months ago
@pt2302 I think the flow may not be working as expected. Can you please share some test cases to try?
Here's the simple case:
This is our normal OCW-Studio flow.
Output:
Processing website: ibrahims-cat-course
No file found at https://drive.google.com/uc?id=xyz&export=download for resource courses/ibrahims-cat-course/cat9.jpeg. Deleting DriveFile and continuing.
Downloading file courses/ibrahims-cat-course/cat9.jpeg from S3 bucket ol-ocw-studio-app.
courses/ibrahims-cat-course/cat9.jpeg uploaded to Google Drive folder.
No file found at https://drive.google.com/uc?id=xyz_&export=download for resource courses/ibrahims-cat-course/meow6.jpeg. Deleting DriveFile and continuing.
Downloading file courses/ibrahims-cat-course/meow6.jpeg from S3 bucket ol-ocw-studio-app.
courses/ibrahims-cat-course/meow6.jpeg uploaded to Google Drive folder.
It always seems to fail to find the files in Gdrive, even though they exist and their URLs are correct. When it fails to find a resource file in Gdrive, it deletes the respective DriveFile (and creates another), which triggers this signal:
@receiver(pre_delete, sender=DriveFile)
def delete_from_s3(sender, **kwargs):  # pylint:disable=unused-argument  # noqa: ARG001
    """
    Delete the drive file from S3
    """
    drive_file = kwargs["instance"]
    delete_s3_objects.delay(drive_file.s3_key)
Our files are then deleted from S3, even though they were not problematic in the first place. At this point, Gdrive contains duplicates of those files, so there are 4 files in total there, while our Minio is empty.
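To make the sequence above easier to reason about, here is a minimal, framework-free sketch that replays it with plain dictionaries. It is not the project's code: `simulate_backfill`, `link_is_valid`, and the `-copy` file IDs are hypothetical names used only for illustration, and the `pending_deletes` list is an assumed stand-in for the queued `delete_s3_objects.delay(...)` task, which is assumed here to run only after the re-upload has finished.

```python
from dataclasses import dataclass


@dataclass
class DriveFile:
    """Simplified stand-in for the DriveFile model (illustrative only)."""
    file_id: str
    s3_key: str


def simulate_backfill(s3, gdrive, drive_files, link_is_valid):
    """Replay the reported flow and return the final (s3, gdrive, drive_files)."""
    s3, gdrive = dict(s3), dict(gdrive)
    pending_deletes = []  # assumed stand-in for delete_s3_objects.delay(...)
    result = []
    for drive_file in drive_files:
        if link_is_valid(drive_file):
            result.append(drive_file)
            continue
        # Deleting the DriveFile fires pre_delete, which queues an S3 deletion.
        pending_deletes.append(drive_file.s3_key)
        # The file is then downloaded from S3 and uploaded to Drive again.
        new_id = f"{drive_file.file_id}-copy"  # hypothetical new Drive ID
        gdrive[new_id] = s3[drive_file.s3_key]
        result.append(DriveFile(new_id, drive_file.s3_key))
    # The queued deletion tasks run afterwards and remove the S3 objects.
    for key in pending_deletes:
        s3.pop(key, None)
    return s3, gdrive, result


cat_key = "courses/ibrahims-cat-course/cat9.jpeg"
meow_key = "courses/ibrahims-cat-course/meow6.jpeg"
s3 = {cat_key: b"cat", meow_key: b"meow"}
gdrive = {"xyz": b"cat", "xyz_": b"meow"}
drive_files = [DriveFile("xyz", cat_key), DriveFile("xyz_", meow_key)]

# A validity check that always fails mirrors the behaviour reported above.
final_s3, final_gdrive, _ = simulate_backfill(s3, gdrive, drive_files, lambda df: False)
print(len(final_s3), len(final_gdrive))  # prints: 0 4
```

With a check that always fails, the sketch ends with an empty S3 mapping and four Drive entries, matching the state observed in Minio and Gdrive.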
These are the lines of code that are doing this.
I tried this using my Gdrive credentials and also RC's Gdrive credentials, and got the same result.
What are the relevant tickets?
Closes https://github.com/mitodl/hq/issues/4054.
Description (What does it do?)
This PR updates the Google Drive backfill command to be able to handle courses with non-empty Google Drive folders. It checks whether a DriveFile exists for a given resource and, if so, whether the download link is valid. If the DriveFile exists but has no valid download link, the command deletes the old DriveFile, creates a new DriveFile, and uploads the file to Google Drive. If no DriveFile exists, it simply creates the new DriveFile and uploads the file to Google Drive (as before).
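The branching described above can be summarized as a small decision function. This is only a sketch of the described behaviour, not the command's actual implementation: `Action` and `plan_backfill_action` are hypothetical names, and the "skip" outcome for a DriveFile with a valid download link is an assumption, since the description does not state what happens in that case.

```python
from enum import Enum


class Action(Enum):
    SKIP = "skip"  # assumed outcome; not stated in the description
    CREATE_AND_UPLOAD = "create DriveFile and upload"
    RECREATE_AND_UPLOAD = "delete old DriveFile, create new one, and upload"


def plan_backfill_action(drive_file_exists: bool, download_link_valid: bool) -> Action:
    """Pick the action for one resource, following the PR description."""
    if not drive_file_exists:
        # No DriveFile yet: same behaviour as before this PR.
        return Action.CREATE_AND_UPLOAD
    if download_link_valid:
        # Assumption: an existing, valid DriveFile is left untouched.
        return Action.SKIP
    # DriveFile exists but its download link is not valid.
    return Action.RECREATE_AND_UPLOAD


for exists, valid in [(False, False), (True, True), (True, False)]:
    print(exists, valid, plan_backfill_action(exists, valid).name)
```

Keeping the decision separate from the side effects (deleting, creating, uploading) makes each branch easy to test in isolation.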
How can this be tested?
The pre-requisites should be set up, including the relevant .env variables.
1. Go to https://ocw.mit.edu/courses/<course name>/download/ and download the course ZIP.
2. Run docker compose up.
3. Upload the static_resources subfolder to ol-ocw-studio-app/courses/<course name> on Minio (navigate to http://localhost:9001 and then use the Minio UI to get there).
4. Run docker compose exec web ./manage.py backfill_gdrive_folder --filter <course name or short-id>.