processone / ejabberd-contrib

Growing and curated ejabberd contributions repository - PR or ask to join !
http://ejabberd.im

mod_s3_upload with Garage bucket returns 400 #319

Closed: jpds closed this issue 1 year ago

jpds commented 1 year ago

I've just tried using mod_s3_upload with a Garage cluster I have, and this shows up in the logs:

2023-05-13 17:27:50.353851+00:00 [info] <0.381.0>@mod_s3_upload:handle_iq/2:337 Generating S3 Object URL Pair for ...@... to upload file IMG_test.jpg (2483512 bytes)

There's no error from ejabberd after this, however the Garage logs show:

2023-05-13T17:27:50.657651Z INFO garage_api::generic_server XXX.YYY.ZZZ.ABC (via [::1]:50102) PUT //GKXEEFUH39WVXX0-IMG_test.jpg?X-Amz-Signature=64CHARACTERS&Content-Length=2483512&Content-Type=image%2Fjpeg&X-Amz-Acl=public-read&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ACCESS_KEY%2F20230513%2Fgarage%2Fs3%2Faws4_request&X-Amz-Date=20230513T172750Z&X-Amz-Expires=600&X-Amz-SignedHeaders=host
2023-05-13T17:27:55.477291Z INFO garage_api::generic_server Response: error 400 Bad Request, Bad request: Invalid create bucket XML query

The bucket at the bucket_url I've configured for the module already exists, and the key has access to it. I do not see a reference to creating a bucket anywhere in the module code, so an underlying library must be making that call.

The Garage code being called is here: https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/main/src/api/s3/bucket.rs#L128

jpds commented 1 year ago

@RomanHargrave Could you please take a look at this when you have a moment? Also happy to test any hotfixes you'd like to try.

Neustradamus commented 1 year ago

Following.

RomanHargrave commented 1 year ago

Seeing this now. I'll see if I can take a look this weekend.

RomanHargrave commented 1 year ago

@jpds - I see that there is a double slash at the start of the object path in the Garage logs. Would you kindly make sure that your bucket URL does not have a trailing slash? After some skimming, I suspect that their router is interpreting the double slash as an empty object key.
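The failure mode is easy to reproduce outside of Erlang. Here is a minimal sketch (illustrative Python, not the module's actual code; host and key names are placeholders) of what a trailing slash does to the object path:

```python
from urllib.parse import urlsplit

def object_url(bucket_url, key):
    # naive join, as an upload module might do: bucket_url + "/" + key
    return f"{bucket_url}/{key}"

good = object_url("https://garage.example.net", "GKX-IMG_test.jpg")
bad = object_url("https://garage.example.net/", "GKX-IMG_test.jpg")

# Splitting the path the way a vhost-style S3 router would:
print(urlsplit(good).path.split("/")[1:])  # ['GKX-IMG_test.jpg']
print(urlsplit(bad).path.split("/")[1:])   # ['', 'GKX-IMG_test.jpg']
```

With the trailing slash, the first path segment is empty instead of being the object key, which would explain the router treating the PUT as a bucket-level operation (hence the "Invalid create bucket XML query" error).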

jpds commented 1 year ago

@RomanHargrave Thanks, that indeed fixed the issue. Now I can successfully upload files; however, the receiving clients do not then appear to be using a presigned URL. This is what I'm now seeing in the Garage logs:

2023-06-03T12:17:32.385948Z INFO garage_api::generic_server XXX.YYY.ZZZ.ABC (via [::1]:31267) PUT /s3-upload/GLKBDDZDP3WVXX0-IMG_test.jpg?X-Amz-Signature=REDACTED&Content-Length=3407271&Content-Type=image%2Fjpeg&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=REDACTED%2F20230603%2Fgarage%2Fs3%2Faws4_request&X-Amz-Date=20230603T121732Z&X-Amz-Expires=600&X-Amz-SignedHeaders=host
2023-06-03T12:17:34.581710Z INFO garage_api::generic_server AAA.BBB.CCC.ZYX (via [::1]:31267) HEAD /s3-upload/GLKBDDZDP3WVXX0-IMG_test.jpg
2023-06-03T12:17:34.581865Z INFO garage_api::generic_server Response: error 403 Forbidden, Forbidden: Garage does not support anonymous access yet
2023-06-03T12:17:35.039648Z INFO garage_api::generic_server AAA.BBB.CCC.ZYX (via [::1]:31267) GET /s3-upload/GLKBDDZDP3WVXX0-IMG_test.jpg
2023-06-03T12:17:35.039795Z INFO garage_api::generic_server Response: error 403 Forbidden, Forbidden: Garage does not support anonymous access yet

If I take that same uploaded file, and run:

$ aws --endpoint=$AWS_ENDPOINT s3 presign s3-upload/GLKBDDZDP3WVXX0-IMG_test.jpg

I'm given a URL that I can use to open the file in a browser. Is there something I'm missing that passes this filename plus the X-Amz parameters to the receiving client?

RomanHargrave commented 1 year ago

The design intention was that the bucket is configured for public read (but not necessarily list). This is probably not ideal for people not using OMEMO, even though enumerating objects would be quite costly even with the shorter salt at the beginning of the object names (in the prototype implementation it was 128 characters or something), and it is evidently a problem for Garage. It should not be a huge stretch to sign read URLs, though, as the logic is identical to signing the upload URL. I may have the bandwidth to attend to that.

jpds commented 1 year ago

Alternatively, I could expose the bucket as a public website:

This would allow for the public read, but not list. The only thing it would need is a way to tell the receiving client the FQDN assigned to the bucket, replacing the S3 endpoint domain in the download URL.

Wasabi and AWS also support this approach: https://monotai.com/howto-host-static-websites-on-wasabi.html

Happy to try either.

RomanHargrave commented 1 year ago

It's far more straightforward to sign the download URL than to support alternate hosts. Would you test the relevant changes and let me know if they solve the problem?

As an aside, it is a bit of a faux pas to create an immortal presigned URL - I wonder if AWS cares. I might check with Wasabi.

jpds commented 1 year ago

Thanks for that. Garage also seems to want a TTL for the presigned URLs; it returns:

Response: error 400 Bad Request, Bad request: X-Amz-Expires not found in query parameters

I replaced the never with the TTL for the GET URL and then got:

2023-06-05T08:26:53.669301Z  INFO garage_api::generic_server: 2a00:... (via [::1]:46649) HEAD /s3-upload/GLMCE9Z3O2WVXX0-IMG_test2.jpg?X-Amz-Signature=e12a3f6d156a2869d918ea905708cc2cf712e491f757835c62a6cdd568a1f0a9&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ACCESSKEY%2F20230605%2Fgarage%2Fs3%2Faws4_request&X-Amz-Date=20230605T082651Z&X-Amz-Expires=432000&X-Amz-SignedHeaders=host
2023-06-05T08:26:53.671415Z  INFO garage_api::generic_server: Response: error 403 Forbidden, Forbidden: Invalid signature

I also had a look at implementing the alternative host last night, and realized that it'd be simpler if bucket_url were split into s3_endpoint and bucket (with the slight disadvantage that it would not help providers that don't use an FQDN for their bucket names).

RomanHargrave commented 1 year ago

I was concerned about that, and did not expect that it would work. The AWS signing spec also stipulates that the TTL should not exceed seven days. We can try signing it with a longer lifespan, and see what happens - I suppose I could see if Garage checks that.

jpds commented 1 year ago

Garage also stipulates 7 days: https://git.deuxfleurs.fr/Deuxfleurs/garage/src/commit/44548a9114766b1b58887a1888472ad95847b4f6/src/api/signature/payload.rs#L186
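For anyone wanting to sanity-check the signing side outside of Erlang, here is a minimal stand-alone sketch of SigV4 query presigning (UNSIGNED-PAYLOAD, host-only signed headers), with the requested TTL clamped to the seven-day ceiling that both AWS and Garage enforce. This is not the module's implementation; the endpoint, keys, and region are placeholders:

```python
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import quote

MAX_TTL = 7 * 24 * 3600  # 604800 s: the SigV4 ceiling for X-Amz-Expires

def _hmac(key, msg):
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()

def presign_get(host, path, access_key, secret_key, region, ttl):
    now = datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    scope = f"{now:%Y%m%d}/{region}/s3/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        # Clamp: Garage (and the AWS spec) reject anything longer than 7 days
        "X-Amz-Expires": str(min(ttl, MAX_TTL)),
        "X-Amz-SignedHeaders": "host",
    }
    qs = "&".join(f"{quote(k, safe='')}={quote(v, safe='')}"
                  for k, v in sorted(params.items()))
    canonical = "\n".join([
        "GET", quote(path), qs,
        f"host:{host}\n", "host", "UNSIGNED-PAYLOAD",
    ])
    to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical.encode()).hexdigest(),
    ])
    key = _hmac(_hmac(_hmac(_hmac(("AWS4" + secret_key).encode(),
                f"{now:%Y%m%d}"), region), "s3"), "aws4_request")
    sig = hmac.new(key, to_sign.encode(), hashlib.sha256).hexdigest()
    return f"https://{host}{path}?{qs}&X-Amz-Signature={sig}"

# Ask for 30 days; the URL is signed for 604800 seconds instead
url = presign_get("garage.example.net", "/s3-upload/IMG_test.jpg",
                  "GKACCESSKEY", "secret", "garage", ttl=30 * 24 * 3600)
```

A longer lifespan cannot simply be signed and hoped for: X-Amz-Expires is part of the signed query string, so a server that enforces the cap will reject the request outright rather than silently shorten it.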

RomanHargrave commented 1 year ago

I find it highly idiosyncratic that Garage "does not support anonymous access yet" but purports to support custom domains for buckets ("how to host a static website...").

I am not entirely opposed to the idea of adding support for a separate download host, but I am concerned that it might be effort spent to accommodate a knowingly non-compliant/incomplete S3 implementation, and that it might (though unlikely) encourage further requests such as "can you transform the resource path for the GET URL when I have a custom GET host so that I can offload uploads to S3 but serve from some custom web app?" At which point you start to heap additional functionality into the module and compromise its long-term maintainability.

jpds commented 1 year ago

> I find it highly idiosyncratic that Garage "does not support anonymous access yet" but purports to support custom domains for buckets ("how to host a static website...").

There are two different components here:

> I am not entirely opposed to the idea of adding support for a separate download host, but I am concerned that it might be effort spent to accommodate a knowingly non-compliant/incomplete S3 implementation

This isn't a limitation on Garage's part. In fact, AWS strongly recommends that users never enable public read on any of their S3 buckets, as misconfigured bucket policies have been the source of a myriad of company breaches:

If you want to host static files from a bucket over in their ecosystem, all the documentation points you towards CloudFront:

This is why, instead of pointing XMPP clients at something like s3.eu-central-1.wasabisys.com, I think it would be better to use media.my-xmpp-server.nl, which transparently points at a CDN fronting an S3 bucket somewhere (which is what CloudFront/Garage Web is).

I also think that removing S3-specific logic from the XMPP client level is better long-term. For instance, if a contact or another of my XMPP clients is offline for two weeks, an X-Amz-Expires of one week would mean the data is still in the bucket but inaccessible to those clients.

RomanHargrave commented 1 year ago

I can see about working in some manner of solution over the weekend.

Pedantry section

I do want to note, with respect to bucket access controls, that S3's function in this case is to share a file with a potentially unlimited audience, and that mod_s3_upload by default prefers to set the ACL for each object to public-read rather than assuming that the bucket is publicly readable. When folks have been bitten by misconfigured ACLs, it has been because they stored things such as private keys or trade secrets in buckets with the expectation that they were private.

XEP-0363 users, on the other hand, are storing files with the expectation that - bar file-level security controls such as OMEMO - they are relinquishing control over that data to both the system operators and whoever receives the download link. They may not expect that some completely removed party could find the file, e.g. by brute force; however, this is a threat that many similar solutions must contend with, and it is usually addressed by making file names difficult to guess.

The disambiguating value at the beginning of mod_s3_upload object names is primarily intended to prevent collisions, but it still has a decent keyspace and contains data (a hash of the BEAM node name) that is not easily knowable to a completely removed third party; it could be improved for this purpose by adding a random value. Using a CDN may make it harder to enumerate objects in the bucket, simply because the CDN may apply additional rate limiting or sit behind a basic WAF, but the bucket remains theoretically enumerable as long as one can make unauthenticated requests to whatever serves the files. Authenticated download links would be great, since they would effectively prevent enumeration, but alas that option is not practical.

In cases such as a private XMPP service or MUC where sensitive content is being shared, I should certainly hope that OMEMO is being used at the very least - after all, I would consider OMEMO followed by data sovereignty to be the two most critically attractive features of XMPP. If it is desirable to reduce the likelihood of unintended access to the OMEMO-encrypted files, then the use of a storage service accessible only from a private network - possibly even requiring TLS peer authentication, if your client can do it - would be the next step.

There is possibly merit to some kind of stronger user-aware authentication scheme; however, because of the way that XEP-0363 works, this would effectively require a new and more complex XEP to be drafted, submitted, cared for, and implemented.

This concludes my spiel about S3, file sharing, and security.

RomanHargrave commented 1 year ago

@jpds in that same branch, I've made changes that support a separate download host, if desired. Please feel free to take them for a spin.
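For the record, here is a guess at what the resulting configuration might look like. The download_url option name is inferred from the DownloadURL binding visible in the patch below and may not match the final changes; the host names are placeholders:

```yaml
modules:
  mod_s3_upload:
    # no trailing slash, per the earlier fix
    bucket_url: "https://garage.example.net/s3-upload"
    # assumed option: serve downloads from a CDN / Garage Web host
    # instead of the S3 endpoint
    download_url: "https://media.my-xmpp-server.nl"
```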

jpds commented 1 year ago

@RomanHargrave The flow works great; however, it appears that the microsecond function is causing a two-digit difference between the uploaded object name and the one the receiver then tries to fetch:

2023-06-15T10:39:35.351128Z INFO garage_api::generic_server XXX.YYY.ZZZ.ABC (via [::1]:36540) PUT /objects.xmpp-server.net/GLXGYYW15EWVXX0-IMG_test.jpg?X-Amz-Signature=...&Content-Length=2945594&Content-Type=image%2Fjpeg&X-Amz-Acl=public-read&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...%2F20230615%2Fgarage%2Fs3%2Faws4_request&X-Amz-Date=20230615T103935Z&X-Amz-Expires=432000&X-Amz-SignedHeaders=host
2023-06-15T10:39:37.598337Z INFO garage_web::web_server ABC.ABC.ABC.ZYX (via [::1]:50745) HEAD /GLXGYYW1UIWVXX0-IMG_test.jpg
2023-06-15T10:39:37.598958Z INFO garage_web::web_server ABC.ABC.ABC.ZYX (via [::1]:27770) HEAD /GLXGYYW1UIWVXX0-IMG_test.jpg
2023-06-15T10:39:37.605904Z INFO garage_web::web_server HEAD 404 Not Found /GLXGYYW1UIWVXX0-IMG_test.jpg API error: Key not found
2023-06-15T10:39:37.638382Z INFO garage_web::web_server ABC.ABC.ABC.ZYX (via [::1]:50745) GET /GLXGYYW1UIWVXX0-IMG_test.jpg
2023-06-15T10:39:37.681171Z INFO garage_web::web_server GET 404 Not Found /GLXGYYW1UIWVXX0-IMG_test.jpg API error: Key not found

Edit: Fixed with a simple patch:

diff --git a/mod_s3_upload/src/mod_s3_upload.erl b/mod_s3_upload/src/mod_s3_upload.erl
index 11a365c..4d0f48c 100644
--- a/mod_s3_upload/src/mod_s3_upload.erl
+++ b/mod_s3_upload/src/mod_s3_upload.erl
@@ -409,9 +409,10 @@ put_get_url(#params{bucket_url = BucketURL,
                     ttl = TTL} = Params,
             UploadRequest,
             Filename) ->
-    UnsignedPutURL = decorated_put_url(UploadRequest, Params, object_url(BucketURL, Filename)),
+    ObjectName = object_name(Filename),
+    UnsignedPutURL = decorated_put_url(UploadRequest, Params, object_url(BucketURL, ObjectName)),
     {aws_util:signed_url(Auth, put, ?AWS_SERVICE_S3, UnsignedPutURL, [], calendar:universal_time(), TTL),
-     object_url(DownloadURL, Filename)}.
+     object_url(DownloadURL, ObjectName)}.

 -spec url_service_parameters(
         Params :: #params{}
@@ -463,7 +464,6 @@ decorated_put_url(UploadRequest, ServiceParams, URL) ->
 % generate a unique random object URL for the given filename
 object_url(BucketURL, Filename) ->
     #{path := BasePath} = UriMap = uri_string:parse(BucketURL),
-    ObjectName = object_name(Filename),
     uri_string:recompose(UriMap#{path => <<BasePath/binary, "/", ObjectName/binary>>}).

 -spec object_name(
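The root cause can be illustrated with a small sketch (illustrative Python, not the module's actual Erlang code): deriving the object name from a time-based value at two different call sites yields two different names, so the receiver fetches a name that was never uploaded. The patch above computes the name once and reuses it for both URLs.

```python
import itertools

_counter = itertools.count()  # stands in for a microsecond clock: changes per call

def object_name(filename):
    # illustrative: the real module derives a unique prefix on each call
    return f"{next(_counter):04d}-{filename}"

# Buggy flow: the name is generated independently for the PUT and GET URLs
put_name = object_name("IMG_test.jpg")
get_name = object_name("IMG_test.jpg")
assert put_name != get_name  # receiver GETs a name that was never uploaded

# Fixed flow: generate once, reuse for both URLs
name = object_name("IMG_test.jpg")
put_url = f"https://bucket.example/{name}"
get_url = f"https://media.example/{name}"
assert put_url.rsplit("/", 1)[-1] == get_url.rsplit("/", 1)[-1]
```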

RomanHargrave commented 1 year ago

> it appears that the microsecond function

yeah, it would. i'll attend to that.

jpds commented 1 year ago

This has been working perfectly - I've pushed up documentation here:

...which will appear here when it's published: https://garagehq.deuxfleurs.fr/documentation/connect/apps/ - so we can add this to the README.md's list of compatible services. :-)

badlop commented 1 year ago

#320 is merged. The module can be updated in a running server with:

ejabberdctl modules_update_specs
ejabberdctl module_upgrade mod_s3_upload

or in the WebAdmin page: Nodes -> your node -> Contrib Modules.