transloadit / uppy

The next open source file uploader for web browsers :dog:
https://uppy.io
MIT License
29.18k stars 2.01k forks source link

@uppy/companion, remote sources: Encode the uploadId before putting it in the URL #4649

Closed arturi closed 5 months ago

arturi commented 1 year ago

Initial checklist

Problem

Ok, I believe the problem here is that Cloudflare R2 creates upload IDs with a slash in them (which s3 does not). Then because Uppy doesn't encode those ID's but instead just appends them to Companion's /s3/multipart/:uploadId URL, then Express.js will interpret the slash as a sub-path under /s3/multipart, e.g. /s3/multipart/12346/789, thus returning a 404.

So the only problem here is that uploadId is being sent unencoded as part of the URL, thus causing trouble. I think the correct solution initially would be to change this line to encode the uploadId before putting it in the URL:

https://github.com/transloadit/uppy/blob/327655ca7efd0fa10d6da0d2fc25ed911640cee8/packages/%40uppy/aws-s3-multipart/src/index.js#L419

And then in companion on this line we would have to decode the upload ID:

https://github.com/transloadit/uppy/blob/327655ca7efd0fa10d6da0d2fc25ed911640cee8/packages/%40uppy/companion/src/server/controllers/s3.js#L146

Solution

A problem is that I think this could be a breaking change, because old companion versions expect an un-encoded upload ID, and if Uppy suddenly starts to encode them, it would cause trouble.

Alternatives

Alternatively we could deprecate the URL param and either

  1. make a new endpoint where we correctly encode uploadId, for example /s3/multipart2/:uploadId
  2. use a POST endpoint, although this is not really RESTful

I think this problem is also the same for all of these, so we need to fix all (or else we will have the same problem):

https://github.com/transloadit/uppy/blob/327655ca7efd0fa10d6da0d2fc25ed911640cee8/packages/%40uppy/companion/src/server/controllers/s3.js#L366-L370

Not sure which encoding to use though. If we use encodeURIComponent, maybe it could conflict with how browsers and/or express.js also encode/decode the URL, depending on which characters it contains. base64 is also not safe to put in URLs.

arturi commented 1 year ago

@Acconut:

After reading through https://stackoverflow.com/questions/1957115/is-a-slash-equivalent-to-an-encoded-slash-2f-in-the-path-portion-of-a and doing some tests with curl and Firefox, it is technically possible to encode the upload ID. So the upload ID hello/world would then have the URL /uploads/hello%2Fworld. This does make sense and also prevents other potential issues where the upload ID contains a space or whatever character. The StackOverflow question mentions that some network components may not treat encoded parts properly and incorrectly decode them. But I would not see this a problem for Companion. You should make sure to document the requirement that proxies do not decode the URI when passing requests to companion. If so, I think encoding the multipart ID (or any other ID that you get from third-parties) makes sense.

rquinlivan commented 1 year ago

@arturi FYI the R2 team has updated the R2 API to use a URL-safe upload ID for multipart requests to avoid such issues.

arturi commented 1 year ago

@rquinlivan Yeah, much appreciated, thanks! But we were thinking we still need to safeguard against such cases internally.

Murderlon commented 10 months ago

Are there still specific issues we can still think of? Creating a breaking change because it might help something unknown is perhaps a bit much afterall?

mifi commented 10 months ago

I think it's definitely a bug. it could hit us (or someone else) if implementing a new S3-compatible service, so I wouldn't want this to get forgotten. maybe we can instead just leave a todo in the code?