liormizr / s3path

s3path is a pathlib extension for AWS S3 Service
Apache License 2.0
206 stars 39 forks source link

S3Path.from_uri does not respect urlencoded URIs #150

Closed schneidemar closed 7 months ago

schneidemar commented 9 months ago

When creating an S3Path object with a path, which contains characters not representable by URIs, the to_uri function apparently call urlencode to encode those characters. This is fine. But when using this URI to create a new instance of S3Path, it should call urldecode, to decode the URI to the correct path. This is currently not done and results in errors of S3 being not able to find the key (of course, the key is wrong.) Example

path = S3Path.from_uri("s3://bucket/test/2023-09-10T00:00:00.000Z.txt")
# S3Path('/bucket/test/2023-09-10T00:00:00.000Z.txt'), still correct

path.as_uri()
# s3://bucket/test/2023-09-10T00%3A00%3A00.000Z.txt  now urlencoded

S3Path.from_uri(path.as_uri())
# S3Path('/bucket/test/2023-09-10T00%3A00%3A00.000Z.txt')  still encoded, S3 does not find this file

Workaround is of course to manually call urldecode on the string first, but IMO S3Path.from_uri should handle this, as it also calls urlencode when creating the URI.

maresb commented 9 months ago

See #77 for context. I'm not sure if we converged on the correct solution or not.

liormizr commented 7 months ago

I merged a fix to master I'll update here after deploying new version

liormizr commented 7 months ago

New Version deployed 0.5.1 with this card fix

maresb commented 7 months ago

Seems to be working in the new version, thanks!!!