nucleogenesis opened this issue 2 years ago
Note that I think the most sustainable way to do this would be to use a DjangoStorage class to handle any file uploads in Kolibri - then it can be swapped out for a different class that supports the appropriate backend for the environment.
This is similar to https://github.com/learningequality/kolibri/issues/5698, except that this is for all non-content file operations - we have worked around content in remote settings by not having to import content at all, which seems better!
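For illustration, a minimal sketch of that approach, assuming file writes go through Django's storage API (the function name is hypothetical):

```python
from django.core.files.base import ContentFile
from django.core.files.storage import default_storage

def save_generated_report(filename, data):
    # default_storage resolves to whatever DEFAULT_FILE_STORAGE names:
    # FileSystemStorage for a local install, a GCS-backed class on BCK.
    return default_storage.save(filename, ContentFile(data))
```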
@rtibbles some thoughts & questions on this
Looks like we'll need to set up a BCK env so that we can authenticate to it w/ the google-cloud-storage lib.
I found this gcloud backend in a lib called django-storages (which is BSD-3-Clause fwiw, in case we want to try to vendor the single backend to avoid WHL bloat?).
If I'm reading the Django docs correctly and understanding well, the short list of things to do here is:

1) Authenticate to GCP (I've only used the gcloud utility locally, but I imagine there are some automagical parts of this when connecting from one part of GCP to another?)
2) Point DEFAULT_FILE_STORAGE at the storages.backends.gcloud module from django-storages (sketch below).

Are there any other things I should be considering here w/ regard to how Kolibri works on BCK? (cc @DXCanas @anguyen1234)
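For reference, step 2 would look roughly like this in Django settings; a sketch assuming django-storages is installed, with an illustrative bucket name:

```python
# Hypothetical settings sketch: GoogleCloudStorage is django-storages'
# documented GCS backend, and GS_BUCKET_NAME is its bucket setting.
DEFAULT_FILE_STORAGE = "storages.backends.gcloud.GoogleCloudStorage"
GS_BUCKET_NAME = "kolibri-bck-files"  # illustrative name
```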
The main work is updating how we interact with files to use a DjangoStorage backend; currently we just deal with files on disk for the generated reports.
We don't need to add the google cloud backend to Kolibri's dependencies (I imagine that would cause a lot of bloat), so instead we just need to make the default storage backend configurable. We can check that the right things are installed in the same way that we verify our Redis configuration, by trying to import the appropriate package: https://github.com/learningequality/kolibri/blob/develop/kolibri/utils/options.py#L287
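A rough sketch of what that check might look like, modeled on the linked Redis validation (the function name and the "gcs" backend value are assumptions):

```python
import logging

logger = logging.getLogger(__name__)

def validate_file_storage_backend(backend):
    # Mirror the Redis approach: only require the optional dependency when
    # the matching backend is actually configured.
    if backend == "gcs":
        try:
            import storages.backends.gcloud  # noqa: F401
        except ImportError:
            logger.error(
                "A GCS file storage backend was configured, "
                "but django-storages is not installed."
            )
            raise
```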
The env var that would be set on BCK would then be mediated via the options.py machinery - it would presumably need more options, much like the Redis cache does, to configure the bucket, permissions, etc.
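Sketching what those options might look like, by analogy with how the Redis cache options are declared in option_spec (every name here is an assumption, not an existing Kolibri option):

```python
# Hypothetical option_spec entries, modeled on the existing Cache/Redis
# options; none of these names exist in Kolibri today.
option_spec = {
    "FileStorage": {
        "FILE_STORAGE_BACKEND": {
            "type": "option",
            "options": ("file_system", "gcs"),
            "default": "file_system",
            "envvars": ("KOLIBRI_FILE_STORAGE_BACKEND",),
        },
        "FILE_STORAGE_BUCKET": {
            "type": "string",
            "default": "",
            "envvars": ("KOLIBRI_FILE_STORAGE_BUCKET",),
        },
    },
}
```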
What Richard said. No gcloud utility. That'd be kinda insane. Because it's running on Google "hardware", it has ways of figuring out perms.
We've typically relied on the default behavior up to this point.
To learn more: https://cloud.google.com/docs/authentication/provide-credentials-adc
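For anyone following along, a minimal sketch of ADC from Python using the google-auth library; on GCP the attached service account is picked up automatically, with no gcloud CLI involved:

```python
import google.auth

# On GCP this resolves the instance's attached service account; locally it
# falls back to GOOGLE_APPLICATION_CREDENTIALS or gcloud user credentials.
credentials, project_id = google.auth.default()
```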
Observed behavior
On instances running in the cloud using BCK, Kolibri is unable to provide features that make use of temporary storage. Two examples were discovered during NCC testing on the Vodafone BCK pentesting instance.
1) Cannot upload a CSV to import users
2) When generating logs, the links to download the successfully generated logs return a 404
A path toward solving this is to store user-uploaded files and pod-generated files in a GCS bucket, and to reference that location rather than the local file system when generating or storing files.
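As a hypothetical sketch of the second failure above: the 404 happens because the download link assumes a file on the pod's local disk, whereas routing through the storage API yields a link that works on either backend (the function name is illustrative, not Kolibri's actual view):

```python
from django.core.files.storage import default_storage

def get_log_download_url(filename):
    # Broken on BCK: building a URL that assumes the exported file sits on
    # local disk, e.g. "/downloads/" + filename.
    # Portable: ask the configured backend (filesystem or GCS) for a URL.
    return default_storage.url(filename)
```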
Note that there may be more instances where this is a problem; it should be considered for all future Kolibri features that involve temporary file storage or user file uploads.
Expected behavior
All Kolibri features work in the cloud instances as expected.
User-facing consequences
Cloud Kolibri instances have broken features.
Steps to reproduce
On a BCK-deployed Kolibri instance, try to generate logs or import users via CSV.
Context
Kolibri 0.15.2
BCK VF Pentesting instance