googleapis / nodejs-storage

Node.js client for Google Cloud Storage: unified object storage for developers and enterprises, from live data serving to data analytics/ML to data archiving.
https://cloud.google.com/storage/
Apache License 2.0
896 stars 370 forks source link

File.publicUrl() unexpectedly escapes forward slashes in > 7.6.0 #2419

Open Jeanno opened 7 months ago

Jeanno commented 7 months ago

Summary

Forward slashes are escaped to "%2F" in File.publicUrl() for files that are > 1 level deep.

Unexpected result: "https://storage.googleapis.com/bucketId/path%2Fto%2Ffile" Expected result: "https://storage.googleapis.com/bucketId/path/to/file"

The expected result is given before <= 6.5.4. So this issue looks like a regression.

Environment details

Steps to reproduce

  1. Run the example code. This can also be reproduced in npm's runkit. (https://npm.runkit.com/%40google-cloud%2Fstorage)
var Storage = require("@google-cloud/storage")

var storage = new Storage.Storage();
var bucket = storage.bucket('bucketId');
var file = bucket.file('path/to/file');
var url = file.publicUrl();

Unexpected result: "https://storage.googleapis.com/bucketId/path%2Fto%2Ffile" Expected result: "https://storage.googleapis.com/bucketId/path/to/file"

Additional info

There was an identical bug in the python library. https://github.com/googleapis/google-cloud-python/issues/3809

ddelgrosso1 commented 7 months ago

Thanks for reporting this issue @Jeanno. This code path hasn't changed in some time so I am wondering if perhaps something changed on the node side. Did you recently change versions? Can you provide me with the version you are currently using?

h-auzian commented 7 months ago

I can confirm that I have the same problem on my end using Node 20.

I noticed that the last version without the problem is 5.18.2. From 5.18.3 onwards, the forward slashes / are shown as %2F.

== 5.18.2: https://storage.googleapis.com/my-bucket/my-folder/my-file.csv >= 5.18.3: https://storage.googleapis.com/my-bucket/my-folder%2Fmy-file.csv

Jeanno commented 7 months ago

Thanks for reporting this issue @Jeanno. This code path hasn't changed in some time so I am wondering if perhaps something changed on the node side. Did you recently change versions? Can you provide me with the version you are currently using?

I didn't change my node version (for at least a few months). The issue broken my production which I had to roll back immediately.

My development node version: node 21 Production node version: node 20. Docker image is node:20-slim to be exact.

ddelgrosso1 commented 7 months ago

I can confirm that I have the same problem on my end using Node 20.

I noticed that the last version without the problem is 5.18.2. From 5.18.3 onwards, the forward slashes / are shown as %2F.

== 5.18.2: https://storage.googleapis.com/my-bucket/my-folder/my-file.csv >= 5.18.3: https://storage.googleapis.com/my-bucket/my-folder%2Fmy-file.csv

Excellent thank you @HAuzian this will help me track down what happened.

ddelgrosso1 commented 7 months ago

See my original note https://github.com/googleapis/nodejs-storage/issues/1824#issuecomment-1091921411. Is this encoding causing an issue or is it just a readability problem as noted in this issue?

h-auzian commented 7 months ago

@ddelgrosso1, thanks for linking the note providing context. The change makes perfect sense.

At least in my case, the result from File.publicUrl() was .split("/") for further processing, and that was the part that broke, but it seems to be an easy fix on my end :+1:

Jeanno commented 7 months ago

See my original note #1824 (comment). Is this encoding causing an issue or is it just a readability problem as noted in this issue?

It took down my services in production as URLs from GCS are split by "/". I wrote my workaround so it's fine for me for now.

I'm sure there will be a lot of similar cases out there so when they upgrade their dependencies it will again cause some outages.

ddelgrosso1 commented 6 months ago

Being as this appears to be working as intended. Going to close this issue out. If there are any other questions or concerns, please feel free to reopen / open a new issue.

Jeanno commented 6 months ago

Hi @ddelgrosso1 thanks for taking a look at this. But I still think the resolution contradicts with the object naming guideline that you referenced to in the other issue.

For example, if you create an object named folder1/file.txt in the bucket your-bucket, the path to the object is your-bucket/folder1/file.txt, and Cloud Storage has no folder named folder1 stored within it. From the perspective of Cloud Storage, the string folder1/ is part of the object's name.

And specifically,

The / character [...] is typically OK to use in object names for mimicking a directory structure

On the other hand, the public urls shown on Google Cloud Console are https://storage.googleapis.com/BUCKET_ID/path/to/file instead of having encoded slashes.

Given the above, I think the change does not make sense to me. Perhaps I'm missing something here but would appreciate if someone could point that out for me.

ddelgrosso1 commented 6 months ago

Let me take another look and see if I can improve on what was done.

ddelgrosso1 commented 6 months ago

Just some notes from poking around at this a bit, as noted the library currently encodes / which it should not be doing. However, / isn't a valid value in the name piece of a file (NOT the pathing to the file, i.e. you can't name a file sample/test.jpg). We currently leverage encodeURIComponent on the entire name which includes the folder / pathing information.

To fix this properly we may need to split the name based on pathing characters, encode the individual pieces (folders / paths can contain encodable characters), and rejoin the encoded pieces using the pathing character. I also need to check if we would want to hold off on this until the next major version update as it would likely be breaking (although one could argue it is a bug fix).

Will update accordingly.