Closed mindplay-dk closed 5 years ago
Uhm, well.
The Local
adapter apparently doesn't do MIME-type detection, at all?
As far as I can tell, it always falls back on naive extension-based MIME-type inference, which means we can't safely use the Local
adapter for user-uploaded files in the first place?
I've tried setting a breakpoints, and actual MIME-type detection is never performed.
For now, I'm avoiding File::getMimeType()
entirely and writing my own MIME-type detection...
@mindplay-dk can you show some code maybe? and how you've setup this test? It should check the mime-type using finfo
if available. Also, this is an open source project, please keep a friendly tone of voice.
Sorry about the tone - it's been a long and frustrating day.
The project is closed-source, but at least one thing is clear to me now, regarding my second post: the Local
adapter does try to do MIME-type detection after all, it just has it's own implementation in Local::getMimeType()
and doesn't use the MimeType
or Util
helper functions except as a fallback, which is why those breakpoints told me nothing.
Both in this implementation, and in the MimeType
and Util
functions, it is falling back to extension-based detection though - if, for example, the fileinfo module fails to detect the type - which isn't safe for uploaded files.
And it is defaulting to text/plain
rather than application/octet-stream
, which I think mostly just isn't correct? In this project, I use File::getMimeType()
to get the value for the Content-Type
header, and if the file-type detection fails, it will incorrectly tell the client to open any file-type in a text-editor.
Taking a deeper look at implementations of ReadInterface::getMimeType()
, it appears the root of the problem is that implementations don't agree on the semantics - all of the implementations return a MIME-type of course, but how this MIME-type is obtained can vary across implementations, which makes them (in terms of security and otherwise) unreliable: you'll get some MIME-type, but you can't be exactly sure how it was obtained (content vs file-extension) or whether it might be just a default, etc.
For example:
MemoryAdapter
uses content first, falls back on extensionLocalAdapter
uses content first, falls back on extension, but under different conditions from MemoryAdapter
FtpAdapter
uses extensions onlyAll default to text/plain
.
I find it confusing, but I think, ultimately, it makes sense for me in this project to implement my own MIME-type detection under any circumstances. The project is a media management system, and we store the files behind a FlySystem abstraction, so as to enable projects to select local/FTP/Amazon/etc storage as needed - but we store meta-data in a database (including the MIME-type) and file-type detection needs to happen prior to adding uploaded files to the media file-system anyhow.
It was convenient to let FlySystem do the MIME-type detection, but, since it wouldn't work at all with an FTP adapter, that doesn't make sense to begin with.
I guess, ultimately, the whole feature feels like a bit of a tease, to be honest:
false
,text/plain
which may or may not be the actual MIME-typeWith so many conditions, the utility of this method is highly questionable to me. To what use-case does it apply? To begin with, due to potential security concerns, content and filename-based MIME-types have completely different use-cases, e.g. trusted vs untrusted files/users.
I'm not trying to be poignant, but I have a hard time thinking of a use-case for "get me some MIME-type maybe" - where you need a MIME-type, but it's really completely unimportant how it's obtained, whether it's correct, or whether it gets you any result at all. Is there a real use-case for that?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Looking at the
MimeType
utility class, I was rather surprised to find that the MIME type detection has a hard-coded default oftext/plain
. (lines 53 and 65)When implementing any kind of controller or middleware that returns a file, this causes any unknown binary file-type to open up in a text-editor!
I believe the typical default is
application/octet-stream
, meaning e.g. "stream of bytes" - this is typically what servers and controllers/middleware specify as theContent-Type
when the file-type isn't really known, and the default behavior in browsers and clients is to simply download and save these files, making no attempt to open or display them.I realize this is a breaking change, but there is no real work-around for client code, since we do have users uploading actual text-files, and there is no way to distinguish a default return-value of
text/plain
from an correctly detected return-value oftext/plain
for actual text-files.Thoughts?