@RomanHargrave Could you please take a look at this when you have a moment? Also happy to test any hotfixes you'd like to try.
To follow
Seeing this now. I'll try and see if I can take a look this weekend.
@jpds - I see that there is a leading slash before the object key in the logs from Garage. Would you kindly make sure that your bucket URL does not have a trailing slash? After some skimming of their router code, I suspect that it is interpreting the double slash as an empty object key.
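For illustration, a minimal sketch of how a trailing slash in the configured bucket URL produces the double slash (hypothetical values, not the module's actual code):

```erlang
%% Hypothetical values for illustration only.
BucketURL = <<"https://s3.example.org/s3-upload/">>,   %% note the trailing slash
#{path := BasePath} = uri_string:parse(BucketURL),     %% BasePath = <<"/s3-upload/">>
%% Joining the base path and object name with "/" now yields a double slash,
%% which a router may read as an empty path segment, i.e. an empty object key:
ObjectPath = <<BasePath/binary, "/IMG_test.jpg">>.     %% <<"/s3-upload//IMG_test.jpg">>
```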
@RomanHargrave Thanks, that indeed fixed that issue. Now I can successfully upload files, however the receiving clients do not appear to be then using a presigned URL. This is what I'm now seeing in the Garage logs:
```
2023-06-03T12:17:32.385948Z INFO garage_api::generic_server XXX.YYY.ZZZ.ABC (via [::1]:31267) PUT /s3-upload/GLKBDDZDP3WVXX0-IMG_test.jpg?X-Amz-Signature=REDACTED&Content-Length=3407271&Content-Type=image%2Fjpeg&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=REDACTED%2F20230603%2Fgarage%2Fs3%2Faws4_request&X-Amz-Date=20230603T121732Z&X-Amz-Expires=600&X-Amz-SignedHeaders=host
2023-06-03T12:17:34.581710Z INFO garage_api::generic_server AAA.BBB.CCC.ZYX (via [::1]:31267) HEAD /s3-upload/GLKBDDZDP3WVXX0-IMG_test.jpg
2023-06-03T12:17:34.581865Z INFO garage_api::generic_server Response: error 403 Forbidden, Forbidden: Garage does not support anonymous access yet
2023-06-03T12:17:35.039648Z INFO garage_api::generic_server AAA.BBB.CCC.ZYX (via [::1]:31267) GET /s3-upload/GLKBDDZDP3WVXX0-IMG_test.jpg
2023-06-03T12:17:35.039795Z INFO garage_api::generic_server Response: error 403 Forbidden, Forbidden: Garage does not support anonymous access yet
```
If I take that same uploaded file, and run:
```
$ aws --endpoint=$AWS_ENDPOINT s3 presign s3-upload/GLKBDDZDP3WVXX0-IMG_test.jpg
```
I'm given a URL that I can use to open the file in a browser - is there something I'm missing that gives this filename plus the `X-Amz` parameters to the receiving client?
The design intention was to use this such that the bucket is configured for public read (but not necessarily list). That is probably not ideal for people not using OMEMO, even though it would be quite costly to enumerate objects even with the shorter salt at the beginning (in the prototype implementation it was 128 chars or so), and it is evidently a problem for Garage. It should not be a huge stretch to sign read URLs, as the logic is identical; I may have the bandwidth to attend to that.
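For illustration, a rough sketch of what signing the read URL could look like, assuming the same `aws_util:signed_url/7` helper and `?AWS_SERVICE_S3` macro the module already uses to sign the PUT URL (a sketch only, not the actual change):

```erlang
%% Sketch only: sign the download (GET) URL with the same machinery as the
%% upload URL. Auth, ObjectURL and TTL are assumed to come from the module's
%% parameters and the already-generated object URL.
signed_get_url(Auth, ObjectURL, TTL) ->
    aws_util:signed_url(Auth, get, ?AWS_SERVICE_S3, ObjectURL, [],
                        calendar:universal_time(), TTL).
```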
Alternatively, I could expose the bucket as a public website. This would allow for public read, but not list. The only thing it would need is a way to tell the receiving client the FQDN assigned to the bucket and also to remove the S3 domain, so:
```
storage.xmpp-domain2.com.s3.domain1.com/GLKBDDZDP3WVXX0-IMG_test.jpg?X-Amz-....
s3.domain1.com/storage.xmpp-domain2.com/GLKBDDZDP3WVXX0-IMG_test.jpg?X-Amz-....
storage.xmpp-domain2.com/GLKBDDZDP3WVXX0-IMG_test.jpg
```
Wasabi and AWS also support this approach: https://monotai.com/howto-host-static-websites-on-wasabi.html
Happy to try either.
It's far more straightforward to sign the download URL than to support alternate hosts. Would you test the relevant changes and let me know if they solve the problem?
As an aside, it is a bit of a faux pas to create an immortal presigned URL - I wonder if AWS cares. I might check with Wasabi.
Thanks for that. Garage also seems to want a TTL for the presigned URLs; it returns:
```
Response: error 400 Bad Request, Bad request: X-Amz-Expires not found in query parameters
```
I replaced the `never` with `TTL` for `GetURL` and then got:
```
2023-06-05T08:26:53.669301Z INFO garage_api::generic_server: 2a00:... (via [::1]:46649) HEAD /s3-upload/GLMCE9Z3O2WVXX0-IMG_test2.jpg?X-Amz-Signature=e12a3f6d156a2869d918ea905708cc2cf712e491f757835c62a6cdd568a1f0a9&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ACCESSKEY%2F20230605%2Fgarage%2Fs3%2Faws4_request&X-Amz-Date=20230605T082651Z&X-Amz-Expires=432000&X-Amz-SignedHeaders=host
2023-06-05T08:26:53.671415Z INFO garage_api::generic_server: Response: error 403 Forbidden, Forbidden: Invalid signature
```
I also had a look at implementing the alternative host last night, and realized that it'd be simpler if `bucket_url` was split into `s3_endpoint` and `bucket` (with the slight disadvantage that it would not help those who don't use an FQDN for their bucket names).
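For illustration, a minimal sketch of how such a split could be recombined into a path-style bucket URL; the `s3_endpoint` and `bucket` names are the suggestion above, not existing configuration options:

```erlang
%% Sketch only: build a path-style bucket URL from a hypothetical s3_endpoint
%% and bucket pair, e.g. <<"https://s3.domain1.com">> and <<"storage.xmpp-domain2.com">>.
bucket_url(S3Endpoint, Bucket) ->
    #{path := BasePath} = UriMap = uri_string:parse(S3Endpoint),
    uri_string:recompose(UriMap#{path => <<BasePath/binary, "/", Bucket/binary>>}).
```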
I was concerned about that, and did not expect that it would work. The AWS signing spec also stipulates that the TTL should not exceed seven days. We can try signing it with a longer lifespan, and see what happens - I suppose I could see if Garage checks that.
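If the TTL cap does need to be enforced module-side, a minimal sketch of it could look like this (hypothetical helper, not existing code):

```erlang
%% SigV4 presigned URLs may not be valid for longer than seven days, so any
%% configured TTL would need to be clamped before signing.
-define(MAX_PRESIGN_TTL, 604800).  %% 7 days, in seconds

clamp_ttl(TTL) when is_integer(TTL), TTL > 0 -> min(TTL, ?MAX_PRESIGN_TTL);
clamp_ttl(_Other) -> ?MAX_PRESIGN_TTL.  %% e.g. a 'never' setting falls back to the cap
```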
I find it highly idiosyncratic that Garage "does not support anonymous access yet" but purports to support custom domains for buckets ("how to host a static website...").
I am not entirely opposed to the idea of adding support for a separate download host, but I am concerned that it might be effort spent to accommodate a knowingly non-compliant/incomplete S3 implementation, and that it might (though unlikely) encourage further requests such as "can you transform the resource path for the GET URL when I have a custom GET host so that I can offload uploads to S3 but serve from some custom web app?" At which point you start to heap additional functionality into the module and compromise its long-term maintainability.
> I find it highly idiosyncratic that Garage "does not support anonymous access yet" but purports to support custom domains for buckets ("how to host a static website...").
There are two different components here:
> I am not entirely opposed to the idea of adding support for a separate download host, but I am concerned that it might be effort spent to accommodate a knowingly non-compliant/incomplete S3 implementation
This isn't a limitation on Garage's part. In fact, AWS strongly recommends that users never enable public read on any of their S3 buckets (this has been the source of a myriad of companies getting hacked through misconfigured policies):
If you want to host static files from a bucket over in their ecosystem, all the documentation points you towards CloudFront:
This is why, instead of pointing XMPP clients at something like `s3.eu-central-1.wasabisys.com`, I think it would be better to use `media.my-xmpp-server.nl` and have that just transparently point at a CDN fronting an S3 bucket somewhere (which is what CloudFront/Garage Web is).
I also think that removing S3-specific logic from the XMPP client level is better long-term - for instance, if a colleague or another XMPP client of mine is offline for 2 weeks, an `X-Amz-Expires` of a week would mean that the data is still in the bucket but inaccessible to those clients.
I can see about working in some manner of solution over the weekend.
I do want to note, with respect to bucket access controls in this case, that S3's function here is to share a file with a potentially unlimited audience, and that `mod_s3_upload` by default prefers to set the ACL for each object to `public-read` rather than assuming that the bucket is publicly readable. When folks have been bitten by misconfigured ACLs, it has been because of storing things such as private keys or trade secrets in buckets with the expectation that they were private.
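For illustration, a minimal sketch of that per-object ACL, assuming a hypothetical helper that appends the `X-Amz-Acl=public-read` query parameter to the unsigned PUT URL; it would have to be added before the URL is signed, since SigV4 covers the query string (the parameter is also visible in the later Garage PUT log):

```erlang
%% Sketch only: append the public-read ACL to an unsigned PUT URL's query string.
with_public_read_acl(URL) when is_binary(URL) ->
    case uri_string:parse(URL) of
        #{query := Query} = UriMap ->
            uri_string:recompose(UriMap#{query => <<Query/binary, "&X-Amz-Acl=public-read">>});
        UriMap ->
            uri_string:recompose(UriMap#{query => <<"X-Amz-Acl=public-read">>})
    end.
```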
XEP-0363 users, on the other hand, are storing files with the expectation that - bar file-level security controls such as OMEMO - they are relinquishing control over that data to both the system operators and whomever receives the download link. They may not expect that some completely removed party could find the file, e.g. by brute force; however, this is a threat that many similar solutions must contend with, and it is usually solved by making it difficult to guess file names. The disambiguating value at the beginning of `mod_s3_upload` object names is primarily intended to prevent collisions, but it still has a decent keyspace and contains data (a hash of the BEAM node name) that is not easily knowable to a completely removed third party, and it could be improved for this purpose by the addition of a random value.

Using a CDN in this case may make it more difficult to enumerate objects in the bucket simply because it may apply additional rate limiting or be placed behind a basic WAF, but it is still theoretically enumerable as long as one can make unauthenticated requests to whatever is serving the files. Authenticated download links would be great, since they would effectively prevent enumeration, but alas that option is not practical.
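A minimal sketch of that suggested improvement, assuming a hypothetical replacement for `object_name/1` (not the module's current implementation):

```erlang
%% Sketch only: combine a monotonic unique integer (collision avoidance) with
%% 128 bits of randomness (guessing resistance) ahead of the original filename.
object_name(Filename) when is_binary(Filename) ->
    Unique = integer_to_binary(erlang:unique_integer([positive, monotonic])),
    Random = binary:encode_hex(crypto:strong_rand_bytes(16)),
    <<Unique/binary, "-", Random/binary, "-", Filename/binary>>.
```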
In cases such as a private XMPP service or MUC where sensitive content is being shared, I should certainly hope that OMEMO is being used at the very least - after all, I would consider OMEMO followed by data sovereignty to be the two most critically attractive features of XMPP. If it is desirable to reduce the likelihood of unintended access to the OMEMO-encrypted files, then the use of a storage service accessible only from a private network - possibly even requiring TLS peer authentication, if your client can do it - would be the next step.
There is possibly merit to some kind of stronger, user-aware authentication scheme; however, because of the way that XEP-0363 works, this would effectively require a new and more complex XEP to be drafted, submitted, cared for, and implemented.
This concludes my spiel about S3, file sharing, and security.
@jpds in that same branch, I've made changes that support a separate download host, if desired. Please feel free to take them for a spin.
@RomanHargrave The flow works great; however, it appears that the microsecond function is causing a two-character difference between the object name that was uploaded and the one the receiver then tries to get:
```
2023-06-15T10:39:35.351128Z INFO garage_api::generic_server XXX.YYY.ZZZ.ABC (via [::1]:36540) PUT /objects.xmpp-server.net/GLXGYYW15EWVXX0-IMG_test.jpg?X-Amz-Signature=...&Content-Length=2945594&Content-Type=image%2Fjpeg&X-Amz-Acl=public-read&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...%2F20230615%2Fgarage%2Fs3%2Faws4_request&X-Amz-Date=20230615T103935Z&X-Amz-Expires=432000&X-Amz-SignedHeaders=host
2023-06-15T10:39:37.598337Z INFO garage_web::web_server ABC.ABC.ABC.ZYX (via [::1]:50745) HEAD /GLXGYYW1UIWVXX0-IMG_test.jpg
2023-06-15T10:39:37.598958Z INFO garage_web::web_server ABC.ABC.ABC.ZYX (via [::1]:27770) HEAD /GLXGYYW1UIWVXX0-IMG_test.jpg
2023-06-15T10:39:37.605904Z INFO garage_web::web_server HEAD 404 Not Found /GLXGYYW1UIWVXX0-IMG_test.jpg API error: Key not found
2023-06-15T10:39:37.638382Z INFO garage_web::web_server ABC.ABC.ABC.ZYX (via [::1]:50745) GET /GLXGYYW1UIWVXX0-IMG_test.jpg
2023-06-15T10:39:37.681171Z INFO garage_web::web_server GET 404 Not Found /GLXGYYW1UIWVXX0-IMG_test.jpg API error: Key not found
```
Edit: Fixed with a simple patch:
```diff
diff --git a/mod_s3_upload/src/mod_s3_upload.erl b/mod_s3_upload/src/mod_s3_upload.erl
index 11a365c..4d0f48c 100644
--- a/mod_s3_upload/src/mod_s3_upload.erl
+++ b/mod_s3_upload/src/mod_s3_upload.erl
@@ -409,9 +409,10 @@ put_get_url(#params{bucket_url = BucketURL,
                     ttl = TTL} = Params,
             UploadRequest,
             Filename) ->
-    UnsignedPutURL = decorated_put_url(UploadRequest, Params, object_url(BucketURL, Filename)),
+    ObjectName = object_name(Filename),
+    UnsignedPutURL = decorated_put_url(UploadRequest, Params, object_url(BucketURL, ObjectName)),
     {aws_util:signed_url(Auth, put, ?AWS_SERVICE_S3, UnsignedPutURL, [], calendar:universal_time(), TTL),
-     object_url(DownloadURL, Filename)}.
+     object_url(DownloadURL, ObjectName)}.
 
 -spec url_service_parameters(
         Params :: #params{}
@@ -463,7 +464,6 @@ decorated_put_url(UploadRequest, ServiceParams, URL) ->
 % generate a unique random object URL for the given filename
-object_url(BucketURL, Filename) ->
+object_url(BucketURL, ObjectName) ->
     #{path := BasePath} = UriMap = uri_string:parse(BucketURL),
-    ObjectName = object_name(Filename),
     uri_string:recompose(UriMap#{path => <<BasePath/binary, "/", ObjectName/binary>>}).
 
 -spec object_name(
```
> it appears that the microsecond function

Yeah, it would. I'll attend to that.
This has been working perfectly - I've pushed up documentation here:
...which will appear here when it's published: https://garagehq.deuxfleurs.fr/documentation/connect/apps/ - so we can add this to the README.md for compatible services. :-)
```
ejabberdctl modules_update_specs
ejabberdctl module_upgrade mod_s3_upload
```
or in WebAdmin page -> Nodes -> your node -> Contrib Modules
I've just tried using `mod_s3_upload` with a Garage cluster I have and it returns in the logs:

There's no error from ejabberd after this, however the Garage logs show:
The bucket behind the `bucket_url` I've configured for the module already exists and the key has access to it. I do not see a reference to creating a bucket in the module code, so an underlying library must be doing the call.

The Garage code being called is here: https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/main/src/api/s3/bucket.rs#L128