Open amks1 opened 2 years ago
Hello @amks1, thank you for your contribution.
As you have noticed, in 2.022.08, attachments are served by KPI and no longer by KoBoCAT.
Behind the scenes, though, KPI still reads attachments from the KoBoCAT bucket.
So your code should work as expected. Something you may not know is that NGINX is used to serve the attachment content, because it does a better job than Django/Python, and maybe that's where it fails.
https://github.com/kobotoolbox/kpi/blob/cdd172b2bd4898d5d1afa2e0bc3320f82b9c25fe/kpi/views/v2/attachment.py#L132-L137
Please have a look at: https://blog.horejsek.com/nginx-x-accel-explained/
In NGINX configuration, there is a special directive to serve those files from S3 directly. https://github.com/kobotoolbox/kobo-docker/blob/master/nginx/kobo-docker-scripts/include.protected_directive.conf#L7-L36
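To make the mechanism concrete, here is a minimal sketch of what the application side of an `X-Accel-Redirect` hand-off looks like. The header names come from the KPI view linked above; the helper name `accel_redirect_headers` and the example path and filename are hypothetical, chosen only for illustration.

```python
def accel_redirect_headers(protected_path, filename):
    """Build response headers that hand the download over to NGINX."""
    return {
        # Suggest a filename to the browser while displaying inline
        'Content-Disposition': f'inline; filename={filename}',
        # NGINX intercepts this header and serves the matching
        # `internal` location instead of the application's body
        'X-Accel-Redirect': protected_path,
    }


# Hypothetical values, for illustration only
headers = accel_redirect_headers(
    '/protected-s3/https%3A//storage.example.invalid/bucket/photo.jpg',
    'photo.jpg',
)
body = b''  # the application itself sends no file content
```

If no `location` in the active `server` block matches the path in `X-Accel-Redirect`, NGINX has nothing to serve, so a missing directive could show up as a failed download even when the Django side is correct.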
To check whether it's an issue with the NGINX header, you can comment out the lines in attachment.py linked above and always return what's under the TESTING condition.
Something like this:
    # If unit tests are running, pytest webserver does not support
    # `X-Accel-Redirect` header (or ignores it?). We need to pass
    # the content to the Response object
    # if settings.TESTING:
    # setting the content type to `None` here allows the renderer to
    # specify the content type for the response
    content_type = (
        attachment.mimetype
        if request.accepted_renderer.format != MP3ConversionRenderer.format
        else None
    )
    return Response(
        attachment.content,
        content_type=content_type,
    )
    # Otherwise, let NGINX determine the correct content type and serve
    # the file
    # headers = {
    #     'Content-Disposition': f'inline; filename={attachment.media_file_basename}',
    #     'X-Accel-Redirect': protected_path
    # }
    # response = Response(content_type='', headers=headers)
    # return response
One other thing: be sure to set KOBOCAT_DEFAULT_FILE_STORAGE and KOBOCAT_AWS_STORAGE_BUCKET_NAME.
But I guess they are already set, since your upload works correctly.
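A quick way to rule this out is to check for both names from inside the KPI container. The sketch below assumes the two settings are supplied as environment variables, which may not match every deployment; `missing_settings` is a hypothetical helper, not part of KPI.

```python
import os

# Names taken from the comment above; treating them as environment
# variables is an assumption about how the deployment is configured.
REQUIRED_SETTINGS = (
    'KOBOCAT_DEFAULT_FILE_STORAGE',
    'KOBOCAT_AWS_STORAGE_BUCKET_NAME',
)


def missing_settings(env=None):
    """Return the required names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


if __name__ == '__main__':
    missing = missing_settings()
    print('missing:', ', '.join(missing) if missing else 'none')
```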
Thanks @noliveleger. I tried this today but it doesn't work. I just get a 500 server error at this URL:
api/v2/assets/aMFkDi2h4QpkNUUBSekiSj/data/4/attachments/4/
It didn't seem like an nginx issue to me; it feels like something in the new KPI code is hardcoded to AWS and is overriding the settings provided. Or maybe it is looking for some other settings key.
Direct kobocat links are working fine as before.
@amks1, I'll have a look and let you know.
@noliveleger
Got the issue, it's here:
    if settings.TESTING or True:
        # setting the content type to `None` here allows the renderer to
        # specify the content type for the response
        try:
            content_type = (
                attachment.mimetype
                if request.accepted_renderer.format != MP3ConversionRenderer.format
                else None
            )
            # 'attachment.content' does not work since
            # ReadOnlyKobocatAttachment object does not contain 'content' field.
            # So it has been replaced with 'attachment.media_file'.
            return Response(
                attachment.media_file,
                content_type=content_type,
            )
        except Exception as e:
            raise serializers.ValidationError({
                'detail': str(e)
            }, 'unknown_error')
With this, KPI serves the files from Spaces without issue. However I'd still like to let nginx serve them.
After messing around in the nginx configurations, I found that the following code was only placed under the server block for Kobocat. After copy-pasting it under the KPI server block, it works.
    location ~ ^/protected-s3/(.*)$ {
        # Allow internal requests only, i.e. return a 404 to any client who
        # tries to access this location directly
        internal;
        # Name resolution won't work at all without specifying a resolver here.
        # Configuring a validity period is useful for overriding Amazon's very
        # short (5-second?) TTLs.
        resolver 8.8.8.8 8.8.4.4 valid=300s;
        resolver_timeout 10s;
        # Everything that S3 needs is in the URL; don't pass any headers or
        # body content that the client may have sent
        proxy_pass_request_body off;
        proxy_pass_request_headers off;
        # Stream the response to the client instead of trying to read it all at
        # once, which would potentially use disk space
        proxy_buffering off;
        # Don't leak S3 headers to the client. List retrieved from:
        # https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html
        proxy_hide_header x-amz-delete-marker;
        proxy_hide_header x-amz-id-2;
        proxy_hide_header x-amz-request-id;
        proxy_hide_header x-amz-version-id;
        # S3 will complain if `$1` contains non-encoded special characters.
        # KoBoCAT must encode twice to make sure `$1` is still encoded after
        # NGINX's automatic URL decoding.
        proxy_pass $1;
    }
KPI now works properly with DigitalOcean Spaces.
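The remark about encoding twice in the location block above can be illustrated with a small sketch. It assumes the storage URL is already percent-encoded once and that NGINX decodes the redirect path exactly once before `$1` is captured; `protected_s3_path` and the example URL are hypothetical and do not reproduce KoBoCAT's actual code.

```python
from urllib.parse import quote, unquote

PREFIX = '/protected-s3/'


def protected_s3_path(storage_url):
    """Encode an already-encoded storage URL a second time."""
    # After this call, '%20' becomes '%2520', ':' becomes '%3A', etc.
    return PREFIX + quote(storage_url)


# Hypothetical URL, already percent-encoded once by the storage backend
once_encoded = (
    'https://storage.example.invalid/bucket/my%20photo.jpg'
    '?Signature=abc%2Bdef'
)
path = protected_s3_path(once_encoded)

# Simulate the single decoding step assumed on the NGINX side
captured = unquote(path[len(PREFIX):])
```

Under these assumptions, `captured` is identical to `once_encoded`, so the URL handed to `proxy_pass` still carries its original encoding.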
@jnm One difference I noticed between the current KPI-served attachments and the earlier Kobocat-served attachments is that the 'large'/ 'medium'/ 'small' files don't get generated anymore. The original dimension file is the one that gets displayed in the submission view modal - higher res files don't fit in the table and break the symmetry. (This has been confirmed with the public kobotoolbox installation as well).
Thanks, this is on our list of things to fix: https://github.com/kobotoolbox/kpi/issues/3672
> @noliveleger
> Got the issue, it's here:
> With this, KPI serves the files from Spaces without issue. However I'd still like to let nginx serve them.
Well, that's what I told you ;-) but I have to admit that the or True is way simpler than commenting out several lines.
> After messing around in the nginx configurations, I found that the following code was only placed under the server block for Kobocat. After copy-pasting it under the KPI server block, it works. [...] KPI now works properly with DigitalOcean Spaces.
🤔 I think you are not using the latest version of kobo-docker then, because, AFAIK, it is included under KPI server block. https://github.com/kobotoolbox/kobo-docker/blob/887186980f4115b5ca4e1a526b8d348cdddb6055/nginx/kobo-docker-scripts/templates/nginx_site_default.conf.tmpl#L90
> Well, that's what I told you ;-) but I have to admit that the or True is way simpler than commenting out several lines.
I meant this part: the test code references attachment.content, but the correct attribute seems to be attachment.media_file:
    # 'attachment.content' does not work since
    # ReadOnlyKobocatAttachment object does not contain 'content' field.
    # So it has been replaced with 'attachment.media_file'.
    return Response(
        attachment.media_file,
        content_type=content_type,
    )
> 🤔 I think you are not using the latest version of kobo-docker then, because, AFAIK, it is included under KPI server block. https://github.com/kobotoolbox/kobo-docker/blob/887186980f4115b5ca4e1a526b8d348cdddb6055/nginx/kobo-docker-scripts/templates/nginx_site_default.conf.tmpl#L90
I had pulled the correct tag (v2.022.08) in kobo-install before starting, but yes, it's more than possible that I bungled up somewhere...
> I meant this part: the test code references attachment.content, but the correct attribute seems to be attachment.media_file:
Oops. I did not notice the difference. Thank you for pointing that out. The Mock class should expose the same properties.
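As a rough illustration of that last point, a test double could expose `media_file` in the same way the view reads it, so the test branch and the production branch use one attribute. The class below is a hypothetical sketch, not KPI's actual mock class, and its `content` property is only a convenience built on top of `media_file`.

```python
import io


class MockAttachment:
    """Hypothetical test double exposing `media_file` and `mimetype`."""

    def __init__(self, data, mimetype):
        # File-like object, standing in for the stored media file
        self.media_file = io.BytesIO(data)
        self.mimetype = mimetype

    @property
    def content(self):
        # Convenience accessor derived from `media_file`
        self.media_file.seek(0)
        return self.media_file.read()


attachment = MockAttachment(b'fake image bytes', 'image/jpeg')
payload = attachment.media_file.read()
```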
Preface
I know it's probably not a priority currently to get Kobotoolbox running with other S3-like providers like DigitalOcean, but I do feel that there could be many interested parties. This is because of much lower costs compared to AWS S3 including no charges per request.
Description
Over the past 2 days, I've been trying to make Kobotoolbox work with DigitalOcean Spaces. Here's what I've done so far:
Trial 1: Trying with Kpi and Kobocat installation v2.021.45
I added the following lines to settings of both Kpi and Kobocat:
Image submissions, image fetching and asset xls exports were working well, but data exports returned a weird error:
However, legacy exports were working well, so I figured it was an issue with Kpi. After a long time (long because of my inexperience with Django) I figured out that Kpi data exports use PrivateStorageDetailView, and adding the following line to settings solved the issue:
Now v2.021.45 was working well with DigitalOcean Spaces.
Trial 2: Trying with Kpi and Kobocat installation v2.022.08
On upgrading the earlier installation to the new version, I find that data exports and the media URLs therein work, but the Kpi frontend view does not. The URLs used to access media from KPI use the v2/assets/.../data/../attachment/... endpoint in this version, compared to the kc.url... in the previous version. It now returns a 404 for all media files (they get uploaded with no issues).
The endpoint uses Django's models.FileField to retrieve the media file, but I've not been able to get this working with my S3 custom URL.
And it looks like I've exhausted my Django capabilities. Would anyone be able to guide me forward?