Open gideonso opened 7 years ago
If this feature is added, it might be worth adding checks for Windows systems. On Windows, PHP versions < 7.1 do not support filename characters outside of the active locale. Starting with 7.1, everything should be fine.
+1 for this. Very important for international community))
Current behaviour has been very problematic for some (many) of our use cases. +1 from here too.
As a matter of fact I've got one project on my desk right now that needs uploaded files to remain as-is, since they are actually consumed by another tool that requires specific formatting. Currently I'm at a loss about how to implement this without "reinventing the wheel", i.e. creating my own file field.
I desperately need this. If someone can build it, could we raise some funds for it?
I have a working solution here or, let's say, a proof of concept. Changes to the core files are needed.
Oh, this is good news. Maybe you could share a git repository that we can download and try?
Gideon
I need to do some more testing. Unfortunately, I can't create a module since a lot of methods are not hookable.
@matjazpotocnik Seems to be a perfect case for a PR.
I won't make a PR, as it would not be accepted by Ryan, and I understand why. You are asking for trouble if you want to support non-ASCII in uploads. The problem is not getting non-ASCII filenames uploaded and displayed in the file list, but how the file would be stored and represented on the filesystem. I would rather leave the core files intact and make a module, but then again, some methods in the core would need to be hookable, and you can see how the number of feature requests is growing here... As BitPoet said, PHP 7.1 supports UTF-8 filenames on Windows regardless of the OEM code page; we'll see what that brings to the picture.
Maybe just make a PR to make needed methods hookable?
Hmm, I will have to sleep on it (again) and maybe go in another direction that wouldn't require so much hooking.
> The problem is not getting non-ASCII filenames uploaded and displayed in the file list, but how the file would be stored and represented on the filesystem.
Somewhat curious: what kind of problems would this cause? Personally I'd suggest making this possible at the core level, unless there's a very strong reason not to.
Make it a configurable option for all I care, but it's such a common need for non-English folks that IMHO it shouldn't be left out.
So far the only problems I can think of seem to be a) some potential for confusion regarding case (in)sensitive file handling by the OS, and b) the general idea that by filtering input extra carefully you can avoid potential issues on the output phase.
> Somewhat curious: what kind of problems would this cause?
How would you like to see the file "test_漢字汉字.txt" on the file system? As "test_漢字汉字.txt" or "test_漢ĺ—汉ĺ.txt"? The first version is created on Windows using wfio; the second is what Windows does by itself. If PHP 7.1 solves this, then we are on a good path.
> Personally I'd suggest making this possible at the core level, unless there's a very strong reason not to.
I agree, but you have to address Ryan for this and that's what we are doing here :-)
> The first version is created on Windows using wfio; the second is what Windows does by itself.
I wouldn't see weird crap like that, because I don't use Windows ;)
Jokes aside, I was admittedly a bit worried that this might be a Windows-specific issue, and turns out it is. One option would be making this a configurable option and disabled by default, with proper warnings about Windows being a major jerk in this regard. That's what I'd do, anyway.
> I agree, but you have to address Ryan for this and that's what we are doing here :-)
Definitely. Was just commenting what you said above, i.e. "I would rather leave the core files intact and make a module". I wouldn't :)
"...configurable option and disabled by default, with proper warnings about" any sort of incompatibility issues that might emerge. I support this :) Core support with some "shortcomings" regarding server compatibility issues is a lot better than having nothing. If ProcessWire can support most web servers out there, then that is a pretty solid start.
> One option would be making this a configurable option and disabled by default, with proper warnings about
Would this configurable option be part of the input field or a generic option in config.php?
> about Windows being a major jerk in this regard.
I'm not a Linux/Mac user, so I can't comment on this, but from my very limited testing, Linux is no better in this regard. I suppose it has to be configured somehow to support filenames in UTF-8 (locale?).
> I'm not a Linux/Mac user, so I can't comment on this, but from my very limited testing, Linux is no better in this regard. I suppose it has to be configured somehow to support filenames in UTF-8 (locale?).
UTF-8 filenames are supported by all standard file systems on *nix OSes. Problems usually only arise when the shell (command line) is configured to use a non-utf8 locale or old non-utf8 applications are invoked. All halfway current versions of Apache (and, importantly, also mod_rewrite) support utf8.
A check for the combination of Windows + PHP < 7.1 together with a big red warning should IMHO be a sensible approach.
Though, to keep things simple at first, just making the necessary methods hookable through a PR and putting things into a module might still be the quicker way and let Ryan sleep easier. Once there has been some successful production testing, things could be moved into the core.
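The Windows + PHP < 7.1 check suggested above could look roughly like this. It is only a sketch in plain PHP: the function name `unicodeFilenamesAreRisky` is hypothetical and not part of ProcessWire, and it assumes that `PHP_OS` starting with "WIN" is a good enough test for Windows.

```php
<?php
// Hypothetical helper, not part of ProcessWire.
// Returns true for the combination discussed above: Windows with PHP older than 7.1.
// No scalar type hints, so the check itself also runs on PHP 5.x.
function unicodeFilenamesAreRisky($osName, $phpVersion)
{
    // PHP_OS reports values such as "WINNT" or "WIN32" on Windows.
    $isWindows = strtoupper(substr($osName, 0, 3)) === 'WIN';

    return $isWindows && version_compare($phpVersion, '7.1.0', '<');
}

if (unicodeFilenamesAreRisky(PHP_OS, PHP_VERSION)) {
    echo "Warning: non-ASCII filenames may be stored incorrectly on this server.\n";
}
```

Passing the OS name and version as arguments keeps the function easy to try with other values, e.g. before moving a site to a different server.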
> UTF-8 filenames are supported by all standard file systems on *nix OSes. Problems usually only arise when the shell (command line) is configured to use a non-utf8 locale or old non-utf8 applications are invoked. All halfway current versions of Apache (and, importantly, also mod_rewrite) support utf8.
I know UTF-8 filenames are supported on *nix systems, but in the testing I conducted (thx tpr), filenames were not stored in UTF-8. My guess is that you have to convince Apache + PHP that you would like to work with UTF-8 encoding. How do you do that if you are on shared hosting and don't have shell access to set up the locale (if that is what you are talking about)? My simple test was with
file_put_contents ("Árvíztűrő tükörfúrógép.txt", "data");
And I see the file:
ĂrvĂztűrĹ‘ tĂĽkörfĂşrĂłgĂ©p.txt
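The garbled name above looks like what you get when the UTF-8 bytes of a filename are read in a single-byte code page such as Windows-1250. That is an assumption about this particular test, not something confirmed in the thread. A minimal sketch with PHP's `iconv()` reproduces the effect for one fragment; the helper name is hypothetical.

```php
<?php
// Hypothetical helper for illustration only.
// Re-reads UTF-8 bytes as if they were Windows-1250 text and returns the
// result as UTF-8, which is how the "mojibake" ends up on screen.
function misreadUtf8AsCp1250($utf8Bytes)
{
    return iconv('CP1250', 'UTF-8', $utf8Bytes);
}

// "gép" given as explicit UTF-8 bytes, so the encoding of this source file does not matter.
echo misreadUtf8AsCp1250("g\xC3\xA9p"), "\n"; // prints "gĂ©p"
```

The output matches the tail of the garbled filename ("...gĂ©p.txt"), which suggests the bytes on disk may be fine and only their interpretation differs.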
The characters in Apache-generated file listings should be shown correctly if you set
AddDefaultCharset UTF-8
in httpd.conf or, in the .htaccess in the directory with the files,
IndexOptions +Charset=UTF-8
as documented in the mod_autoindex docs.
> A check for the combination of Windows + PHP < 7.1 together with a big red warning should IMHO be a sensible approach.
Technically yes, but unexpected things can still happen if the site is moved to another server etc. This would be fine as an addition, as long as there's a clear warning that's always visible :)
> My simple test was with ... And I see the file ...
So far I've been unable to reproduce this, seems to work just fine on a pretty out-of-the-box Ubuntu installation at least. Are you seeing strange characters in the actual filename on the disk (via shell) or in a file listing, i.e. in a browser? If it's in a file listing, could you check the file name on the disk just to make sure? :)
I finally got my hands on a Linux box with root shell access: Ubuntu 16.10 with Apache 2 and PHP 7.0. I had to tweak the Apache + PHP configuration to make it work, but it looks like it's working.
In my previous tests on Linux, the server was not properly configured for UTF-8. One server was creating files in ISO-8859-1 encoding, the other in ISO-8859-2. While Windows stores filenames in UTF-16 internally, it performs a conversion to the configured locale, in my case Windows-1250. Uploads are working on Windows too (on IIS 8.5); PHP 7.1 is mandatory! Attached are two recordings as proof of concept.
Changes to the core files are minimal, so I think there is no need for a module. I didn't make a PR as I think Ryan might go his own route (if at all), so I have instead created a zip file with the changed files (PW 3.0.42). If anyone wants to try this, just replace the core files; there is a readme.txt with instructions and a list of what has changed.
https://www.dropbox.com/s/h1by4bm8j49jo7o/Upload%20demo%20windows.gif https://www.dropbox.com/s/cuu1tg7ie83li26/Upload%20demo%20linux.gif https://www.dropbox.com/s/dduaqkd6r68m8gn/Upload%20demo.zip
Looks good. I will test it and report back.
Hi @matjazpotocnik ,
Works nicely. How about making a PR to see if @ryancramerdesign would like to add it to the core?
Gideon
There are a lot of PRs already in the queue and just making another one won't help. Ryan will decide how, when and if this will find a way to the core.
Loosely related topic on the support forum: https://processwire.com/talk/topic/18354-no-lowercase-unzipped-files/. I'm still hoping that we can one day instruct ProcessWire to just keep filenames as-is. There are legitimate use cases for that.
Ping @ryancramerdesign.
Alpha proof-of-concept module for anyone interested in exploring the idea: https://github.com/Toutouwai/FieldtypeFileUnrenamed/
@Toutouwai I installed the module and it doesn't seem to have any effect. Did I miss anything?
@gideonso, maybe you didn't create a new "Files Unrenamed" field? If that's not it then sorry, I don't know. That module is just a proof-of-concept demonstration - it's not a released module that I'm providing support for I'm afraid.
@Toutouwai, that's OK. I just wrote to see if you had any idea. Let's wait for the official solution, if it comes one day.
Almost 7 years since I opened this request, and I'm still waiting for a proper solution. @ryancramerdesign, any interest in adding this to the core?
@gideonso I don't understand the problem (as I've never had it myself in 9 years). The description is a little short.
Is the problem that such a file can't be uploaded? Or is the problem that the file gets renamed and has a different name after upload?
If it's the former I can understand that this is annoying. If it's the latter it would be nice to give an explanation WHY this is such a big problem for you.
@BernhardBaumrock Yes, files with non-ASCII characters can still be uploaded, but all the non-ASCII characters are replaced. Not all users are comfortable with English. Some of them need to use Chinese characters in filenames. When they upload a file to ProcessWire, the filename becomes unreadable. It is very unfriendly.
> Optional: Screenshots/Links that demonstrate the enhancement Your screenshots/links go here.
Could you please add screenshots so that people who never work with non-ASCII filenames can better understand what that looks like and why the problem is so prominent?
Personally I've never ever worried about PW changing filenames but it sounds like what you suggest would be helpful to others as well, so in your situation I'd try to make it as clear and obvious for others (like me or ryan) to understand (see) the problem. That would likely increase the chance of being heard.
Waiting for a solution for years is maybe not the best option you have ;) Ryan is doing all the work for free and there are often requests that simply fade away and nobody asks for them any more. It would be more than inefficient to solve old problems that are not an issue any more for whatever reason.
If your problem still persists sometimes it also helps to rephrase it. For example I've created a request once for tabulator.info and simply got rejected. A year later I made the same request but used just slightly different words and boom - the author jumped on the train, saw the benefit and bumped a completely new version with a totally new event concept (https://tabulator.info/docs/5.4/events-internal)
PS: Also I'd try to explain what you did to try to make Robins module work for you and describe exactly what is not working and why that is not a solution (as it sounds to me like it's exactly doing what you requested).
> If it's the former I can understand that this is annoying. If it's the latter it would be nice to give an explanation WHY this is such a big problem for you.
Another use case is that sometimes you have files that, for one reason or another, should retain their original name. For an example back in the days I was working on a tool that bundled content into "HTML banners" that would then be uploaded to the site by the client.
In theory super easy and would've worked perfectly fine with file field — it was a nice bonus that the files were bundled in a ZIP file, which ProcessWire automatically unzipped — except that the software that our client used created files with non-ASCII characters in them, so uploading them to ProcessWire would break references here and there.
Of course we can build custom file fields / upload tools to handle anything that requires this, but it would've been pretty neat not to have to do that.
Just for the record: I'm not involved with aforementioned project anymore, and haven't had to deal with this issue in years, apart from a few requests from clients. As such I don't have a strong interest in this issue myself. Just wanted to point out that yes, this is still an active issue — and yes, it can still make things impossible or complicated for some use cases :)
I added some screenshots.
The first and second ones show an attempt to upload a PDF file with Chinese and English characters in its name. After the upload to the backend, the Chinese characters are removed.
The third and fourth ones show an attempt to upload a PDF file with only Chinese characters in its name. After the upload to the backend, the Chinese characters are removed and the file is renamed to page_resources_files.pdf.
Thx for the screenshots and clarification @gideonso
I've had a look into InputfieldFile and unless I've missed something it seems it's not easy/possible to do with hooks, but the original filename is obviously there at some point (in processInput).
I'm not sure if it would be possible (or a good idea) to support non-ASCII filenames, but maybe @ryancramerdesign can save the original filename of the uploaded file to a new property of PageFile, like uploadName or originalName or such. Would that be a proper solution for you?
"...new property of PageFile, like uploadName or such". Oh, good idea, I never thought of that! That might work in some situations, but I guess there are other cases where the uploaded filename has to stay as is?
@gideonso I've just liked the issue, maybe that helps to draw attention to it. You could also ask people in the forum to like the issue as well if they think it is a good idea or they had problems with it themselves.
While from a safety standpoint I think we have to limit what is allowed in the filename, I like the idea @BernhardBaumrock mentioned about storing the original filename, so that it is at least available if you need it. I will add this so that you can access it from a $pagefile->uploadName() method.
@BernhardBaumrock Thanks for joining in to promote this request. @ryancramerdesign What about when we need to link to the file in CKEditor or TinyMCE? I think it will show the modified name rather than the soon-to-be-added uploadName. Making links in a Textarea is the real pain point for us here.
I've added this so that it now stores the original filename with the file, and it can be accessed from $pagefile->uploadName(). Note that it is unsanitized, so it could potentially contain dangerous stuff, but at least it's there for those who might need it for one thing or another. I also updated InputfieldFile and InputfieldImage to display it in a tooltip.
@ryancramerdesign we had a user in the forum who reported that the uploadName property is not available (or is sanitized) when adding files via the API:
$path = "/var/www/dvmrebuild/storage/";
$fileName = "My_Test_File3$$$.pdf";
$p = $pages->get(1214);
$p->of(false);
$p->venue_files->add($path . $fileName);
$p->save('venue_files');
foreach ($p->venue_files AS $vfile) {
echo $vfile->name . ' => ' . html_entity_decode($vfile->uploadName) . '<br>';
}
# Outputs:
# my_test_file3.pdf => my_test_file3.pdf
This looks like a bug to me, no?
Here is the forum thread for reference and further details about the use case: https://processwire.com/talk/topic/28957-using-pagefile-uploadname-to-offer-downloads-of-files-with-their-original-filename/#comment-235449
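For the download use case in that forum thread, one way to hand the original name back to the browser is the `filename*` parameter of the `Content-Disposition` header (RFC 6266). The sketch below is plain PHP and not ProcessWire's own code; the function name is hypothetical, and it assumes the stored name and the upload name are passed in as strings.

```php
<?php
// Hypothetical helper, not part of ProcessWire.
// Builds a Content-Disposition header that offers the original (possibly
// non-ASCII) name, with the stored ASCII name as a fallback for old clients.
function contentDispositionHeader($storedName, $originalName)
{
    // Keep the quoted fallback strictly ASCII and free of quotes or control characters.
    $fallback = preg_replace('/[^A-Za-z0-9._-]/', '_', $storedName);

    // rawurlencode() percent-encodes the UTF-8 bytes, as filename* expects.
    return 'Content-Disposition: attachment; filename="' . $fallback . '"'
        . "; filename*=UTF-8''" . rawurlencode($originalName);
}

echo contentDispositionHeader('my_test_file3.pdf', '漢字.pdf'), "\n";
```

In a template, the two arguments might come from `$pagefile->name` and `$pagefile->uploadName()`, and the returned string would be passed to `header()` before the file is sent.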
Happy 7th anniversary for this issue. Still in great need for this feature. Still hoping there will be a proper solution.
@gideonso - I haven't been following it too closely, but could the new uploadName() method be used in a hook (perhaps InputfieldFile::fileAdded) to rename the file? Maybe it will result in issues accessing the file or interacting with it in the PW admin - not sure, but thought I'd throw it out there as an idea.
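If a hook along those lines did rename files based on the upload name, the name would still need filtering first, since it is unsanitized. A minimal sketch in plain PHP of a filter that keeps Unicode letters and digits but drops path parts and other characters is shown below. The function name is hypothetical, this is not ProcessWire's sanitizer, and the allowed character set is an assumption.

```php
<?php
// Hypothetical helper, not part of ProcessWire.
// Keeps Unicode letters, combining marks and digits plus ".", "_" and "-";
// everything else (spaces, quotes, angle brackets, control characters) becomes "_".
function sanitizeUnicodeFilename($name)
{
    // Drop any directory part, for both "/" and "\" separators.
    $name = preg_replace('~^.*[\\\\/]~u', '', $name);
    if ($name === null) {
        return 'file'; // input was not valid UTF-8
    }

    $name = preg_replace('/[^\p{L}\p{M}\p{N}._-]+/u', '_', $name);

    // No leading dots, so neither ".." nor hidden files can result.
    $name = ltrim($name, '.');

    return $name === '' ? 'file' : $name;
}

echo sanitizeUnicodeFilename('../etc/漢字 汉字.pdf'), "\n"; // prints "漢字_汉字.pdf"
```

Whether the resulting names behave well on a given server still depends on the filesystem and locale issues discussed earlier in this thread.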
@adrianbj - Hey. Maybe worth a try. Will test it and let you know the result.
@gideonso, I've posted a tutorial to the forum that has a couple of tips for transliterating non-latin characters and for showing the original filename when linking in CKEditor or TinyMCE. Might be something helpful there? https://processwire.com/talk/topic/29273-more-tips-for-pagefile-uploadname/
@Toutouwai Wow! This is indeed helpful. At least we can see the original name. Wonderful.
@matjazpotocnik I finally made the changes you suggested a few years ago and it still works well. Thanks for your effort. It really helps.
Short description of the enhancement
Make file names with non-ASCII characters possible.
Optional: Steps that explain the enhancement
Current vs. suggested behavior
Current: All non-ASCII characters are stripped. Suggested: Preserve all non-ASCII characters.
Why would the enhancement be useful to users?
Asian users use non-ASCII characters in file names. It would be good not to have to rename files before uploading.
Optional: Screenshots/Links that demonstrate the enhancement