octue / django-gcp

Everything required to run Django on GCP (storage, pubsub events, tasks, logging, errors)

Consider addition of endpoint to get direct upload URLs #28

Open thclark opened 1 year ago

thclark commented 1 year ago

Feature request

Current state

When using BlobField to upload blobs to GCS, the upload is made to a temporary blob with a fixed content type (application/octet-stream). Then, on successful commit of the transaction (i.e. once the corresponding row is saved in the database), the temporary blob is assigned its metadata and moved to its ultimate destination. This is good because:

However, this mechanism limits you to uploading files using BlobField.

Use Case

I want to upload files from another service directly to GCS, using django-gcp as the permissions manager to sign URLs, but without registering the files in a BlobField.

Proposed Solution

Create an endpoint for signing URLs that is accessible to the frontend, given a signing token. Thus the frontend can

Add a view like the following to storage/views.py:

import datetime
import random
import string
import time

import django.core.signing
from django.http import HttpResponseBadRequest, JsonResponse
from django.utils import baseconv, timezone
from django.views.decorators.http import require_POST
from google.cloud.storage import Blob, Bucket

from .bucket_registry import _bucket_registry

URLSAFE_CHARACTERS = string.ascii_letters + string.digits + "-._~"
REQUIRED_PARAMS = ["token", "filename", "content_type"]

signer = django.core.signing.Signer()


@require_POST
def get_direct_upload_url(request):
    """Responds with a pre-signed URL enabling the client to upload an object to the bucket"""

    # Reject the request early if any required form parameter is missing or empty
    for p in REQUIRED_PARAMS:
        if not request.POST.get(p):
            return HttpResponseBadRequest(f"'{p}' is a required parameter.")

    # Check the signature; the payload is "<gs://bucket/prefix>:<timestamp flag>:<base62 expiry>"
    try:
        token: str = signer.unsign(request.POST["token"])
    except django.core.signing.BadSignature:
        return HttpResponseBadRequest("Invalid token.")

    try:
        bucket_and_path, include_timestamp_indicator, exptime = token.rsplit(":", 2)
        expires_at: int = baseconv.base62.decode(exptime)
        # Strip the leading "gs://", then separate the bucket name from the path prefix
        bucketname, path_prefix = bucket_and_path[5:].split("/", 1)
    except ValueError:
        # Raised when the payload has too few fields or the expiry is not valid base62
        return HttpResponseBadRequest("Malformed token.")

    if time.time() > expires_at:
        return HttpResponseBadRequest("Timeout expired.")

    bucket: Bucket = _bucket_registry.get("gs://" + bucketname)
    if not bucket:
        return HttpResponseBadRequest(f"Unknown bucket identifier 'gs://{bucketname}'.")

    filename: str = request.POST["filename"]
    content_type: str = request.POST["content_type"]

    # Object path layout: <prefix>[<timestamp>/]<24 random characters>/<filename>
    timestring: str = f"{timezone.now():%Y-%m-%d_%H-%M-%S/}" if include_timestamp_indicator == "1" else ""
    randomstring: str = "".join(random.choices(URLSAFE_CHARACTERS, k=24))
    path: str = f"{path_prefix}{timestring}{randomstring}/{filename}"
    blob: Blob = bucket.blob(path)

    # JsonResponse serialises the dict and sets the application/json content type
    return JsonResponse(
        {
            "url": blob.generate_signed_url(
                expiration=timezone.now() + datetime.timedelta(minutes=60),
                method="PUT",
                content_type=content_type,
            ),
            "path": path,
        }
    )
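The object path built by the view can be sketched in isolation, without Django or GCS, to make the layout easier to see. This is only an illustration under stated assumptions: `build_object_path` is a hypothetical helper (not part of django-gcp), and it swaps `random.choices` for the standard library's `secrets.choice`, which may be preferable if the random segment is meant to be hard to guess.

```python
import datetime
import secrets
import string

URLSAFE_CHARACTERS = string.ascii_letters + string.digits + "-._~"


def build_object_path(path_prefix, filename, include_timestamp, now=None):
    """Hypothetical helper mirroring the path layout used in the view above."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    # Optional timestamp folder, e.g. "2024-01-02_03-04-05/"
    timestring = f"{now:%Y-%m-%d_%H-%M-%S}/" if include_timestamp else ""
    # 24 URL-safe characters so that two uploads of the same filename do not collide
    randomstring = "".join(secrets.choice(URLSAFE_CHARACTERS) for _ in range(24))
    return f"{path_prefix}{timestring}{randomstring}/{filename}"
```

Note that the prefix is concatenated directly, so it needs its own trailing slash (for example "uploads/") if it is meant to act as a folder.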

Then use this code snippet to generate the token and URL enabling the frontend to call the signing endpoint (in storage/utils.py):

import posixpath
import time

from django.core.signing import Signer
from django.urls import reverse
from django.utils import baseconv

signer = Signer()


def get_signing_token_and_url(bucket_name, path_prefix, include_timestamp, submit_timeout):
    """Return a (signing_token, signing_url) pair for the frontend

    path_prefix should end with "/" because the view appends to it directly.
    include_timestamp is a bool; submit_timeout is the token lifetime in seconds.
    """
    bucket_identifier = f"gs://{bucket_name}"

    # Get signing url and a token to pass to it, allows the frontend to sign on demand
    # NOTE: These are currently not used but are taken from the DDCU library and could be
    include_timestamp_indicator = "1" if include_timestamp else "0"
    valid_until = baseconv.base62.encode(int(time.time()) + submit_timeout)
    # posixpath keeps forward slashes on every platform
    signing_path = posixpath.join(bucket_identifier, path_prefix)
    to_sign = f"{signing_path}:{include_timestamp_indicator}:{valid_until}"

    signing_token = signer.sign(to_sign)
    signing_url = reverse("gcp-storage-get-direct-upload-url")

    return signing_token, signing_url
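Both snippets depend on django.utils.baseconv, which, as far as I know, was deprecated in Django 4.0 and removed in 5.0. The sketch below shows the token payload round trip using only the standard library. The helper names are hypothetical, the alphabet is assumed to match Django's base62 alphabet, and signing itself (Django's Signer) is deliberately left out.

```python
import time

# Assumed to match django.utils.baseconv.BASE62_ALPHABET
BASE62_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(number: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if number < 0:
        raise ValueError("Only non-negative integers are supported in this sketch.")
    if number == 0:
        return BASE62_ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(BASE62_ALPHABET[remainder])
    return "".join(reversed(digits))


def base62_decode(text: str) -> int:
    """Decode a base62 string; raises ValueError on empty or invalid input."""
    if not text:
        raise ValueError("Empty base62 string.")
    number = 0
    for character in text:
        number = number * 62 + BASE62_ALPHABET.index(character)
    return number


def build_payload(bucket_name, path_prefix, include_timestamp, timeout_seconds, now=None):
    """Build the unsigned payload "<gs://bucket/prefix>:<flag>:<base62 expiry>"."""
    now = int(time.time()) if now is None else int(now)
    flag = "1" if include_timestamp else "0"
    return f"gs://{bucket_name}/{path_prefix}:{flag}:{base62_encode(now + timeout_seconds)}"


def parse_payload(payload):
    """Split a payload into (bucket_name, path_prefix, include_timestamp, expires_at)."""
    # rsplit from the right, so the colon in "gs://" is left untouched
    bucket_and_path, flag, expiry = payload.rsplit(":", 2)
    if not bucket_and_path.startswith("gs://"):
        raise ValueError("Payload must start with 'gs://'.")
    bucket_name, path_prefix = bucket_and_path[len("gs://"):].split("/", 1)
    return bucket_name, path_prefix, flag == "1", base62_decode(expiry)
```

Every malformed input surfaces as a ValueError, which a view could catch and turn into a single 400 response.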

Finally, add the corresponding URL (urls.py):


from django.urls import path

from django_gcp.storage.views import get_direct_upload_url
# ...

urlpatterns = [
    # ...
    path(r"storage/get-direct-upload-url", get_direct_upload_url, name="gcp-storage-get-direct-upload-url"),
]
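With the three pieces above in place, a client would make two requests: a form-encoded POST to the signing endpoint, then a PUT of the file bytes to the returned URL. The sketch below only builds those requests with the standard library and does not send them. The function names and any URLs passed in are hypothetical. Two assumptions are worth checking: that the PUT must carry the same Content-Type that was signed, and that the POST satisfies CSRF protection if the project has it enabled (this view is not exempted as written).

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_signing_request(signing_url, token, filename, content_type):
    """Build the POST asking the endpoint for a signed upload URL.

    The body is form-encoded because the view reads request.POST.
    """
    body = urlencode(
        {"token": token, "filename": filename, "content_type": content_type}
    ).encode("utf-8")
    return Request(
        signing_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def build_upload_request(signed_url, data, content_type):
    """Build the PUT that uploads the bytes straight to GCS."""
    # The content type here should equal the one sent to the signing endpoint
    return Request(
        signed_url,
        data=data,
        method="PUT",
        headers={"Content-Type": content_type},
    )
```

Each request can be sent with urllib.request.urlopen; the response to the first is JSON with "url" (the target of the PUT) and "path" (the object's location in the bucket).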