Closed saidbakr closed 4 years ago
Very good news. However, any solution should take care about Linux hidden files that their names starts with dot, such as .htaccess
, .conkyrc
, etc
It may be tricky, because extension of file.tar.xz
file is xz
, not tar.xz
: https://3v4l.org/rHEhZ
This would require changing how extension is detected - we could compare extensions list with the end of the file name, but I'm not sure how it would react with MIME types check.
UploadedFile
uses PHP's pathinfo
function which only returns the part after the last dot.
To allow multi part extensions FileValidator
would need to loop over the allowed extensions and do something like a StringHelper::endsWith
check on the full filename.
UploadedFile
uses PHP'spathinfo
function which only returns the part after the last dot.To allow multi part extensions
FileValidator
would need to loop over the allowed extensions and do something like aStringHelper::endsWith
check on the full filename.
pathinfo returns basename
in the associative array and the basename with the regex pattern like:
\.([a-z0-9\.]*)
It should return the the extension according to this demo
@rob006 We may use pathinfo without parameters too.
@rob006 We may use pathinfo without parameters too.
How would it help?
Your regex is incorrect, since it will return com.html
as extension for yiiframework.com.html
file.
@rob006 We may use pathinfo without parameters too.
How would it help? Your regex is incorrect, since it will return
com.html
as extension foryiiframework.com.html
file.
What is the problem? a file named loremLibsum.com.html
to be com.html
! Just you are able to extract the extension portion of the file base name. Basically, it is not complete implementation, it is only a partial suggestion.
A filename extension is an identifier specified as a suffix to the name of a computer file. The extension indicates a characteristic of the file contents or its intended use. A filename extension is typically delimited from the filename with a full stop (period), but in some systems it is separated with spaces. Filename extension From Wikipedia
Additionally, according to this PHP official example for pathinfo
, it has been linked to the previous comment too, with no extra parameter, i.e. just the path, it will return an associative array, one of its keys is named basename
. The most funny thing in that official example, that it uses composite or multi part file extension!
What is the problem? a file named
loremLibsum.com.html
to becom.html
!
I'm not sure how it helps. com.html
is definitely not a valid extensions of this file, so this regex cannot be used for detecting extensions. And it does not really help if we need to use StringHelper::endsWith()
to compare list of extensions with file name, since we can call it on file path directly. basename
does not help here either (BTW: there is separate basename()
method for this).
What is the problem? a file named
loremLibsum.com.html
to becom.html
!I'm not sure how it helps.
com.html
is definitely not a valid extensions of this file, so this regex cannot be used for detecting extensions. And it does not really help if we need to useStringHelper::endsWith()
to compare list of extensions with file name, since we can call it on file path directly.basename
does not help here either (BTW: there is separatebasename()
method for this).
Don't fight the definition! As regarded in the Wikipedia, the file extension is the portion that is separated by period. It is so simple. Any thing after the first dot is regarded as an extension.
From the API, [StringHelper::endsWith method](https://www.yiiframework.com/doc/api/2.0/yii-helpers-basestringhelper#endsWith()-detail) takes two required string parameters and the third is optional, so we could use it like the following:
$path = getThePathByAnyWay();
$allowedExtensionsList = getAnArrayOfAllowedExtensionsByAnyWay()
$fileBaseName = pathinfo($path)['basename'];
$matches = [];
$fileExtension = preg_match('/\.([a-z0-9\.]*)/',$fileBaseName,$matches)
//return StringHelper::endsWith($fileBaseName,$fileExtension[1])
//OR
// return in_array($fileExtension[1], $allowedExtensionsList)
Again this is closest to be pseudo code!
Don't fight the definition! As regarded in the Wikipedia, the file extension is the portion that is separated by period. It is so simple. Any thing after the first dot is regarded as an extension.
You should really read this Wikipedia article before you refer it:
example: txt
is the extension of the filename readme.txt
, and html
the extension of mysite.index.html
https://en.wikipedia.org/wiki/Filename_extension
It has also paragraph about tar.gz
and I wouldn't say that this is "simple".
So we can take all the extensions and compare them reducing the string i.e. for .tar.gz
it will be:
.tar.gz
.gz
For .tar.xz
:
.tar.xz
.xz
@samdark I don't, exactly, understand what you have suggested. However, I could understand that it should allow the developer to examine whatever extension he want. In other words, any file name extension that the server's operating system allow as a valid file name.
public function rules()
{
return [
[['imageFile'], 'file', 'skipOnEmpty' => false, 'extensions' => 'png, jpg, lib.inc, fox.box.lux'],
];
}
@rob006 You have mixed between file type or content type and file extension. In the regarded Wikipedia at the third sentences of the first paragraph, the abstract definition of the file extension. What you have regarded has no any conflict with what I have said or with this definition.
A filename extension is typically delimited from the filename with a full stop (period), but in some systems it is separated with spaces.
I think that pathinfo
deals with the file name as string data. In other words, if a plain text file has been created and the name has been changed from my.txt
to my.png
, there is no way in the PHP to determine the real content type of that file.
What you have regarded has no any conflict with what I have said or with this definition.
My interpretation has no conflict either (mysite.index
is file name and html
is extension - they're separated by dot), and it match further clarification for file names with multiple dots. You're just focusing on single ambiguous sentence (it does not say which dot, you just blindly assumed that it is first) and ignoring rest of the article.
I think that
pathinfo
deals with the file name as string data. In other words, if a plain text file has been created and the name has been changed frommy.txt
tomy.png
, there is no way in the PHP to determine the real content type of that file.
Well, there is - you can check MIME type and FileValidator
is already doing it. But there is no MIME type for tar.xz
or tar.gz
(there are MIME types for xz
and gz
), and certainly there is no MIME type for com.html
or index.html
. Introducing some fancy non-standard system for detecting extensions may conflict with current MIME type checks.
It will conflict with mime checks for sure because tar.gz
, according to mime type, is .gz
.
@rob006
"(it does not say which dot, you just blindly assumed that it is first)"
Simply the computer may assume it as well. The first occurrence of match is the default case of matching any pattern. For example, any regex without g
flag will match the first occurrence only.
@samdark In this case, a new option may be added to determine if the validation should apply mime check or not or even to check the textual extension and/or its supposed mime. In some cases, the developer need to make his own files or need to deal with with weird extensions files. For example, Checkout this download page, you will find packages to be downloaded have the extension ayt
. These files are nothing else zip files. i.e just rename any of them from .ayt to .zip and any compression application will open them well.
I've re-checked everything.
'checkExtensionByMimeType' => false
.'mimeTypes' => ['application/x-xz']
if you'll add corresponding mime type to config.pathinfo
it is what comes after last .
in the full name.In order to solve point 3, we can introduce another option to validate file name according to something like regular expression. Then you'd be able to do 'mask' => '~\.tar\.xz$~i'
.
@saidbakr Simply the computer may assume it as well.
But it's not. For simple implementations it is better to check extension after the last dot instead of first one, and I can't imagine any app which would decide that com.index
is valid extension of example.com.html
file and does not treat it as a bug.
@samdark Or we could use StringHelper::endsWith()
for comparing list of extensions with file path to check if file name match any of extensions specified in FileValidator::$extensions
- then user can use anything here (even com.html
) and it still will be simpler and foolproof than regex. For MIME validation we may use "real" extension, because this is the only sane thing to do. So for tar.gz
we should check if file name ends with .tar.gz
and for MIME validation we would check if application/gzip
type match gz
extension.
I've re-checked everything.
1. In this case we want to validate extension and mime type separately so we need to set `'checkExtensionByMimeType' => false`. 2. Mime validation would work as expected with `'mimeTypes' => ['application/x-xz']` if you'll add corresponding mime type to config. 3. As for extension, it won't work because it's not clear what to consider an extension. According PHP's `pathinfo` it is what comes after last `.` in the full name.
In order to solve point 3, we can introduce another option to validate file name according to something like regular expression. Then you'd be able to do
'mask' => '~\.tar\.xz$~i'
.
Ok, I agree with what ever any solution that allows the developer to regard what ever he/she considers as an extension.
@rob006 sounds way better than regex.
Feature request.
What steps will reproduce the problem?
Try to validate composite file extension such as
tar.xz
What is the expected result?
File name
file.bar.xz
should not be uploaded onlyfile.tar.xz
should be allowed.What do you get instead?
Any file ended with
.xz
is uploaded.Additional info