octobercms / october

Self-hosted CMS platform based on the Laravel PHP Framework.
https://octobercms.com/
Other
11.01k stars 2.22k forks source link

MediaManager Can not upload filename with Unicode character #3354

Closed buuug7 closed 4 years ago

buuug7 commented 6 years ago
Description

When upload filename with Unicode characters, MediaManager worked very bizarre. For example, we upload a file with 波比.jpg name, the MediaMange prompt upload success, but we could not found it in the MediaManage. actually, it upload 波比.jpg and rename it with .jpg name, only contain the the dot and file extension.

Screenshot

media1 media2 media3

suggestion

use the slug-generator library for generate the filename with URL Slug, the library will try to convert characters to ASCII, and then save to the filesystem. this will solve mutiple laguage named filename uploaded problems.

// example for use SlugGenerator
$generator = new SlugGenerator;

$generator->generate('Hello Wörld!');  // Output: hello-world
$generator->generate('Καλημέρα');      // Output: kalemera
$generator->generate('фильм');         // Output: film
$generator->generate('富士山');         // Output: fu-shi-shan
$generator->generate('國語');           // Output: guo-yu

// Different valid character set, a specified locale and a delimiter
$generator = new SlugGenerator((new SlugOptions)
    ->setValidChars('a-zA-Z0-9')
    ->setLocale('de')
    ->setDelimiter('_')
);
$generator->generate('Äpfel und Bäume');  // Aepfel_und_Baeume

my solution

Modified the Backend\Widgets\MediaManager checkUploadPostback() method as below:

media4

result

media5

LukeTowers commented 6 years ago

Duplicate of #2603 but this is a better issue report so any discussion will be here instead

LukeTowers commented 6 years ago

@buuug7 while your suggestion of the URL generator is appreciated, I want to add extra dependencies only as a last resort. How does Laravel handle unicode file name issues? What can we do with the dependencies that we already have available to us? What are the concerns with saving non-latin characters to the filesystem?

Eoler commented 6 years ago

Some of (not so recent) browsers on very popular OS still have trouble displaying images with spaces in URIs so it would be great to at least slugify filenames for uploaded Media Manager files - Laravel's built-in Str::slug works fine for that purpose with all Latin languages.

buuug7 commented 6 years ago

@Eoler agree

LukeTowers commented 6 years ago

@Eoler @buuug7 what about for non-latin characters? I'm fine with the slugifying, would prefer a solution that keeps the unicode characters present if possible without adding extra dependencies

Eoler commented 6 years ago

Don't know how are non-latin characters translated to slugs, maybe @buuug7 can elaborate. Also, not sure I really wan't my CMS users to upload media files with Unicode names...

buuug7 commented 6 years ago

@LukeTowers @Eoler , Unicode contains more characters than ASCII and Latin, it represent future. For example, Utf8 is the most widely used on the Internet a Unicode implementation approach, which is designed for transmission codes, and makes the code without borders, so you can show that the all cultural characters in the world.

buuug7 commented 6 years ago

@LukeTowers Laravel Str::slug only works with Latin laguage. For brevity, it does not provide additional Unicode support. if a developer needed Unicde support unless he find a solution. this keep laravel framework lighter weight.

buuug7 commented 6 years ago

@LukeTowers There is no need to implementation with october. there is no need reinvent the wheel. Besides, it takes extra effort to maintain this feature, and I think it's necessary to use the third-party packages.

LukeTowers commented 6 years ago

@buuug7 Str::ascii() exists in Laravel, can we use that instead?

buuug7 commented 6 years ago

@LukeTowers tested, seems it not work.

        Str::ascii('Hello Wörld!');  // "Hello World!"
        Str::ascii('Καλημέρα');  // "Kalhmera"
        Str::ascii('фильм');  // "film"
        Str::ascii('富士山');  // ""
        Str::ascii('富士山', 'zh-CN');  // ""
        Str::ascii('國語');  // ""
        Str::ascii('國語','zh-TW'); //  ""
LukeTowers commented 6 years ago

@buuug7 is the dependency you suggested going to introduce new system requirements for October? https://github.com/ausi/slug-generator/blob/master/composer.json? They require:

"ext-intl": "*",
"ext-mbstring": "*",
"ext-pcre": "*"

Also, would it make more sense to load this dependency into the October Library and then utilize it within October's Str::slug method?

buuug7 commented 6 years ago

@LukeTowers yes, load this denpendency into October Library and packaged with Str::slug method is very good. i try it.

SadiqUltra commented 5 years ago

In Backend\Widgets\MediaManager.php file I have change regex with desired Unicode range. And it does work.

protected function validateFileName($name)
    {
        // added our language's unicode in regex
        if (!preg_match('/^[\p{Bengali}0-9a-z@\.\s_\-]+$/i', $name)) {
            return false;
        }

        if (strpos($name, '..') !== false) {
            return false;
        }

        return true;
    }
LukeTowers commented 5 years ago

@SadiqUltra you shouldn't change core files, those changes will be removed next time you update.

SadiqUltra commented 5 years ago

@LukeTowers yes, but for now its only option that works, I guess.

Johnzero commented 5 years ago

@buuug7 did you fix it?

chocolata commented 5 years ago

Any update on this? Some other plugins rely on the paths being a-zA-Z0-9...

LukeTowers commented 5 years ago

@maartenmachiels nope. Feel free to propose a change to the regex if you wish, just keep in mind that it won't be accepted unless it solves the original problem (illegal characters allowing for XSS attacks or directory traversal) and works for all of October's user base.

github-actions[bot] commented 4 years ago

This issue will be closed and archived in 3 days, as there has been no activity in the last 30 days. If this issue is still relevant or you would like to see action on it, please respond and we will get the ball rolling.

mjauvin commented 4 years ago

I just tested uploading jpg files with UTF-8 encoded filenames and it works fine on my setup.

Here's the Media Library after the upload

Screenshot from 2019-11-12 12-22-20

mjauvin commented 4 years ago

I suspect the problem might be with the Characterset used by the user's Browser that may lead to the problem.

I used a different browser on my system and was able to replicate the issue... so browser configuration definitely has a role... more data to come

mjauvin commented 4 years ago

With Firefox, this setting makes it work: intl.fallbackCharsetList.ISO-8859-1 = windows-1252 With Chrome, this extension is needed: https://www.deep-watch.net/en/blog/bring-character-encoding-menu-back-to-chrome-with-extension Actual link for the extension: https://chrome.google.com/webstore/detail/set-character-encoding/bpojelgakakmcfmjfilgdlmhefphglae

LukeTowers commented 4 years ago

@mjauvin is there anything that can be done serverside to fix this? Detect the browser's encoding? Tell the browser what encoding to use? What sort of options do we have?

mjauvin commented 4 years ago

That's what I'm trying to find out, so far only client-side changes do the trick.

AFAIK, we already tell the browser which encoding to use in the form tag using: accept-charset="UTF-8" (ref. vendor/october/rain/src/Html/FormBuilder.php)

And also the <meta charset="utf-8"> in the meta tags (ref. modules/backend/layouts/_head.htm)

But there is something I am missing

mjauvin commented 4 years ago

What's really odd is that I cannot reproduce at will. I don't know if there is some caching involved or not, but even without changing client-side charset encoding, I SOMETIME get the empty file result.

github-actions[bot] commented 4 years ago

This issue will be closed and archived in 3 days, as there has been no activity in the last 30 days.

github-actions[bot] commented 4 years ago

This issue will be closed and archived in 3 days, as there has been no activity in the last 30 days.