umbraco / Umbraco.StorageProviders

MIT License

Filenames with Hebrew letters are not shown in Umbraco #45

Closed: Yuval-Amidror closed this issue 1 year ago

Yuval-Amidror commented 2 years ago

Hi,

When a filename contains Hebrew letters, we receive a 404 from the blob storage because the URL is double-encoded. For example, the original URL was /media/יובל.png, which the browser encodes as %2Fmedia%2F%D7%99%D7%95%D7%91%D7%9C.png, but the URL requested from the blob storage is %252Fmedia%252F%25D7%2599%25D7%2595%25D7%2591%25D7%259C.png.

Any help is appreciated.
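The double encoding described above can be reproduced with Python's standard library. This is a minimal sketch: it uses urllib.parse.quote with safe="" so that "/" is encoded too, which is an assumption about how the URLs in the report were produced, not a trace of what Umbraco or the Azure SDK actually run.

```python
from urllib.parse import quote, unquote

original = "/media/יובל.png"

# First encoding: matches the browser-encoded URL in the report.
# safe="" makes quote() percent-encode "/" as well; "." is unreserved and stays.
once = quote(original, safe="")

# Second encoding: encoding an already-encoded string turns every "%" into "%25".
twice = quote(once, safe="")

print(once)   # %2Fmedia%2F%D7%99%D7%95%D7%91%D7%9C.png
print(twice)  # %252Fmedia%252F%25D7%2599%25D7%2595%25D7%2591%25D7%259C.png

# Decoding once only removes one layer, so a lookup keyed on the real
# blob name still misses if the request carries the double-encoded form.
assert unquote(twice) == once
assert unquote(once) == original
```

Every "%" from the first pass becomes "%25" in the second, which is exactly the pattern in the URL the blob storage received.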

ronaldbarendse commented 2 years ago

Hi @Yuval-Amidror! I'll have to look into this, but it might be an Azure Blob Storage limitation, as it doesn't seem to support all characters, even when correctly encoded: https://docs.microsoft.com/en-us/rest/api/storageservices/naming-and-referencing-containers--blobs--and-metadata#resource-names.

Can you verify whether the file is correctly uploaded to your storage container by directly connecting to it (e.g. using Azure Storage Explorer or the Azure portal)? And does it work when you use any of the decoded URLs:

Yuval-Amidror commented 2 years ago

Hi @ronaldbarendse,

In Azure Blob Storage, the file was uploaded correctly, with the filename in Hebrew letters. The problem is only in retrieval: in AzureBlobFileProvider.cs, the GetFileInfo function (line 63) receives a subpath that is already URL-encoded.

Thank you in advance.

alin-rautoiu commented 2 years ago

I think I have a similar problem with diacritics. Umbraco doesn't serve files whose names contain characters such as ă, ț, ș, î, etc. The files do exist in the Azure blob container and I can access them directly (.blob.core.windows.net/media/media/), but not through Umbraco ([website]/media/*).

talhamalik4025 commented 2 years ago

Hi, we are also experiencing the same issue: filenames with non-English characters, and also filenames containing spaces, return a 404 even though the file exists in Azure Blob Storage.

JoseMarcenaro commented 2 years ago

Hi, same problem here for any file with accented characters in its name, e.g. nación.png. Note that with local storage the same file/URL is retrieved correctly; the 404 only happens with Azure storage.

/media/gvtocuxw/nación.png

The same 404 happens when using the encoded name, e.g. naci%C3%B3n.png.

sysiler commented 2 years ago

This is definitely not an Azure Blob Storage problem, as it works in our current Umbraco 8 environment. I copied the blob container for our migration to Umbraco 10, and all files with special characters (thankfully not too many) respond with a 404. For example, this file works: https://www.dansk.de/media/gnmelhte/venner-på-stranden-ved-ærøskøbing-ærø_29164-large.jpg?anchor=center&mode=crop&width=576&height=320&rnd=132579723304100000 but the same file does not work on Umbraco 10.0.1 (other files work as expected).

Let me know if I can be of any further assistance with this issue. I'd be happy to see it solved.

JoseMarcenaro commented 2 years ago

@sysiler, sorry to contradict you, but it looks to me like an issue in Azure.Storage.Blobs, the library used in .NET 5+.

That is an entirely different project from the legacy Microsoft Azure Storage library used in Umbraco v8.

I'm trying to narrow down the issue so I can report it in the Azure.Storage.Blobs GitHub project. I will post any new findings here as well.

JoseMarcenaro commented 2 years ago

Hi @ronaldbarendse, great work with this library!

I found what may be the issue with these international characters.

Maybe the code inside Azure.Storage.Blobs should be fixed, but the SDK repo is huge and not easy to contribute to.

I found a fix closer to this project: sending an unencoded path from your AzureBlobProvider.cs class fixed the issue for me. You might find a better way to achieve the same effect, but in the meantime I have submitted PR #51 with my workaround.

Thanks!

JoseMarcenaro commented 2 years ago

Additional info: to determine whether this is an Azure.Storage.Blobs issue or an Umbraco.StorageProviders issue, I built a small test app that saves and retrieves files using the Azure.Storage.Blobs.BlobClient class directly (no Umbraco involved), and it worked correctly for filenames with diacritics, for both saving and retrieval. The filename was sent without any URL encoding (e.g. "avión.jpg"), just as it appears in Microsoft Azure Storage Explorer.

So it looks like it is correct to remove (or avoid adding) the URL encoding before trying to access a file in the Umbraco.StorageProviders.AzureBlob provider.
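The workaround discussed above (decode the incoming path once, then let the storage client do the only encoding) can be sketched in Python. This is a hypothetical illustration, not the Umbraco.StorageProviders or Azure.Storage.Blobs code: sdk_style_url is an assumed stand-in for a client that percent-encodes the blob name it is given, resolve_blob_name is a hypothetical provider-side helper, and the example.blob.core.windows.net container URL is a placeholder.

```python
from urllib.parse import quote, unquote

def sdk_style_url(container_url: str, blob_name: str) -> str:
    """Hypothetical stand-in for a storage client that percent-encodes
    the blob name itself, keeping "/" as a path separator."""
    return f"{container_url}/{quote(blob_name, safe='/')}"

def resolve_blob_name(subpath: str) -> str:
    """Hypothetical provider-side fix: decode the already URL-encoded
    subpath once so the client receives the raw blob name."""
    return unquote(subpath).lstrip("/")

container = "https://example.blob.core.windows.net/media"  # placeholder URL
incoming = "/gvtocuxw/naci%C3%B3n.png"  # subpath as it arrives, already encoded

# Passing the encoded subpath straight through encodes it a second time.
broken = sdk_style_url(container, incoming.lstrip("/"))
# Decoding first means the name is encoded exactly once.
fixed = sdk_style_url(container, resolve_blob_name(incoming))

print(broken)  # https://example.blob.core.windows.net/media/gvtocuxw/naci%25C3%25B3n.png
print(fixed)   # https://example.blob.core.windows.net/media/gvtocuxw/naci%C3%B3n.png
```

Decoding once is only safe if the subpath is guaranteed to arrive encoded exactly once; a filename containing a literal "%" would otherwise be misread.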

ronaldbarendse commented 1 year ago

I'll close this issue, as it's fixed in the CMS: https://github.com/umbraco/Umbraco-CMS/pull/12132.

JoseMarcenaro commented 1 year ago

Thanks @ronaldbarendse, I can confirm it worked for me after switching to Umbraco v10.3.

@sysiler this should work for you too!