yiisoft / yii2

Yii 2: The Fast, Secure and Professional PHP Framework
http://www.yiiframework.com
BSD 3-Clause "New" or "Revised" License
14.24k stars 6.91k forks source link

Composite file extension validation is not supported #18094

Closed saidbakr closed 4 years ago

saidbakr commented 4 years ago

Feature request.

What steps will reproduce the problem?

Try to validate composite file extension such as tar.xz

What is the expected result?

File name file.bar.xz should not be uploaded only file.tar.xz should be allowed.

What do you get instead?

Any file ended with .xz is uploaded.

Additional info

Checkout this question on Stack Overflow Q A
Yii version 2.0.?
PHP version 7
Operating system
saidbakr commented 4 years ago

Very good news. However, any solution should take care about Linux hidden files that their names starts with dot, such as .htaccess, .conkyrc, etc

rob006 commented 4 years ago

It may be tricky, because extension of file.tar.xz file is xz, not tar.xz: https://3v4l.org/rHEhZ

This would require changing how extension is detected - we could compare extensions list with the end of the file name, but I'm not sure how it would react with MIME types check.

alex-code commented 4 years ago

UploadedFile uses PHP's pathinfo function which only returns the part after the last dot.

To allow multi part extensions FileValidator would need to loop over the allowed extensions and do something like a StringHelper::endsWith check on the full filename.

saidbakr commented 4 years ago

UploadedFile uses PHP's pathinfo function which only returns the part after the last dot.

To allow multi part extensions FileValidator would need to loop over the allowed extensions and do something like a StringHelper::endsWith check on the full filename.

pathinfo returns basename in the associative array and the basename with the regex pattern like: \.([a-z0-9\.]*) It should return the the extension according to this demo

@rob006 We may use pathinfo without parameters too.

rob006 commented 4 years ago

@rob006 We may use pathinfo without parameters too.

How would it help? Your regex is incorrect, since it will return com.html as extension for yiiframework.com.html file.

saidbakr commented 4 years ago

@rob006 We may use pathinfo without parameters too.

How would it help? Your regex is incorrect, since it will return com.html as extension for yiiframework.com.html file.

What is the problem? a file named loremLibsum.com.html to be com.html! Just you are able to extract the extension portion of the file base name. Basically, it is not complete implementation, it is only a partial suggestion.

A filename extension is an identifier specified as a suffix to the name of a computer file. The extension indicates a characteristic of the file contents or its intended use. A filename extension is typically delimited from the filename with a full stop (period), but in some systems it is separated with spaces. Filename extension From Wikipedia

Additionally, according to this PHP official example for pathinfo, it has been linked to the previous comment too, with no extra parameter, i.e. just the path, it will return an associative array, one of its keys is named basename. The most funny thing in that official example, that it uses composite or multi part file extension!

rob006 commented 4 years ago

What is the problem? a file named loremLibsum.com.html to be com.html!

I'm not sure how it helps. com.html is definitely not a valid extensions of this file, so this regex cannot be used for detecting extensions. And it does not really help if we need to use StringHelper::endsWith() to compare list of extensions with file name, since we can call it on file path directly. basename does not help here either (BTW: there is separate basename() method for this).

saidbakr commented 4 years ago

What is the problem? a file named loremLibsum.com.html to be com.html!

I'm not sure how it helps. com.html is definitely not a valid extensions of this file, so this regex cannot be used for detecting extensions. And it does not really help if we need to use StringHelper::endsWith() to compare list of extensions with file name, since we can call it on file path directly. basename does not help here either (BTW: there is separate basename() method for this).

Don't fight the definition! As regarded in the Wikipedia, the file extension is the portion that is separated by period. It is so simple. Any thing after the first dot is regarded as an extension.

From the API, [StringHelper::endsWith method](https://www.yiiframework.com/doc/api/2.0/yii-helpers-basestringhelper#endsWith()-detail) takes two required string parameters and the third is optional, so we could use it like the following:

$path = getThePathByAnyWay();
$allowedExtensionsList = getAnArrayOfAllowedExtensionsByAnyWay()
$fileBaseName = pathinfo($path)['basename'];
$matches = [];
$fileExtension = preg_match('/\.([a-z0-9\.]*)/',$fileBaseName,$matches)
//return StringHelper::endsWith($fileBaseName,$fileExtension[1])  
//OR
// return in_array($fileExtension[1], $allowedExtensionsList)

Again this is closest to be pseudo code!

rob006 commented 4 years ago

Don't fight the definition! As regarded in the Wikipedia, the file extension is the portion that is separated by period. It is so simple. Any thing after the first dot is regarded as an extension.

You should really read this Wikipedia article before you refer it:

example: txt is the extension of the filename readme.txt, and html the extension of mysite.index.html https://en.wikipedia.org/wiki/Filename_extension

It has also paragraph about tar.gz and I wouldn't say that this is "simple".

samdark commented 4 years ago

So we can take all the extensions and compare them reducing the string i.e. for .tar.gz it will be:

For .tar.xz:

saidbakr commented 4 years ago

@samdark I don't, exactly, understand what you have suggested. However, I could understand that it should allow the developer to examine whatever extension he want. In other words, any file name extension that the server's operating system allow as a valid file name.

public function rules()
    {
        return [
            [['imageFile'], 'file', 'skipOnEmpty' => false, 'extensions' => 'png, jpg, lib.inc, fox.box.lux'],
        ];
    }

@rob006 You have mixed between file type or content type and file extension. In the regarded Wikipedia at the third sentences of the first paragraph, the abstract definition of the file extension. What you have regarded has no any conflict with what I have said or with this definition.

A filename extension is typically delimited from the filename with a full stop (period), but in some systems it is separated with spaces.

I think that pathinfo deals with the file name as string data. In other words, if a plain text file has been created and the name has been changed from my.txt to my.png, there is no way in the PHP to determine the real content type of that file.

rob006 commented 4 years ago

What you have regarded has no any conflict with what I have said or with this definition.

My interpretation has no conflict either (mysite.index is file name and html is extension - they're separated by dot), and it match further clarification for file names with multiple dots. You're just focusing on single ambiguous sentence (it does not say which dot, you just blindly assumed that it is first) and ignoring rest of the article.

I think that pathinfo deals with the file name as string data. In other words, if a plain text file has been created and the name has been changed from my.txt to my.png, there is no way in the PHP to determine the real content type of that file.

Well, there is - you can check MIME type and FileValidator is already doing it. But there is no MIME type for tar.xz or tar.gz (there are MIME types for xz and gz), and certainly there is no MIME type for com.html or index.html. Introducing some fancy non-standard system for detecting extensions may conflict with current MIME type checks.

samdark commented 4 years ago

It will conflict with mime checks for sure because tar.gz, according to mime type, is .gz.

saidbakr commented 4 years ago

@rob006

"(it does not say which dot, you just blindly assumed that it is first)"

Simply the computer may assume it as well. The first occurrence of match is the default case of matching any pattern. For example, any regex without g flag will match the first occurrence only.

@samdark In this case, a new option may be added to determine if the validation should apply mime check or not or even to check the textual extension and/or its supposed mime. In some cases, the developer need to make his own files or need to deal with with weird extensions files. For example, Checkout this download page, you will find packages to be downloaded have the extension ayt. These files are nothing else zip files. i.e just rename any of them from .ayt to .zip and any compression application will open them well.

samdark commented 4 years ago

I've re-checked everything.

  1. In this case we want to validate extension and mime type separately so we need to set 'checkExtensionByMimeType' => false.
  2. Mime validation would work as expected with 'mimeTypes' => ['application/x-xz'] if you'll add corresponding mime type to config.
  3. As for extension, it won't work because it's not clear what to consider an extension. According PHP's pathinfo it is what comes after last . in the full name.

In order to solve point 3, we can introduce another option to validate file name according to something like regular expression. Then you'd be able to do 'mask' => '~\.tar\.xz$~i'.

rob006 commented 4 years ago

@saidbakr Simply the computer may assume it as well.

But it's not. For simple implementations it is better to check extension after the last dot instead of first one, and I can't imagine any app which would decide that com.index is valid extension of example.com.html file and does not treat it as a bug.

@samdark Or we could use StringHelper::endsWith() for comparing list of extensions with file path to check if file name match any of extensions specified in FileValidator::$extensions - then user can use anything here (even com.html) and it still will be simpler and foolproof than regex. For MIME validation we may use "real" extension, because this is the only sane thing to do. So for tar.gz we should check if file name ends with .tar.gz and for MIME validation we would check if application/gzip type match gz extension.

saidbakr commented 4 years ago

I've re-checked everything.

1. In this case we want to validate extension and mime type separately so we need to set `'checkExtensionByMimeType' => false`.

2. Mime validation would work as expected with `'mimeTypes' => ['application/x-xz']` if you'll add corresponding mime type to config.

3. As for extension, it won't work because it's not clear what to consider an extension. According PHP's `pathinfo` it is what comes after last `.` in the full name.

In order to solve point 3, we can introduce another option to validate file name according to something like regular expression. Then you'd be able to do 'mask' => '~\.tar\.xz$~i'.

Ok, I agree with what ever any solution that allows the developer to regard what ever he/she considers as an extension.

samdark commented 4 years ago

@rob006 sounds way better than regex.