Strange default for MIME type detection

mindplay-dk commented 6 years ago

Looking at the MimeType utility class, I was rather surprised to find that the MIME type detection has a hard-coded default of text/plain. (lines 53 and 65)

When implementing any kind of controller or middleware that returns a file, this causes any unknown binary file-type to open up in a text-editor!

I believe the typical default is application/octet-stream, meaning e.g. "stream of bytes" - this is typically what servers and controllers/middleware specify as the Content-Type when the file-type isn't really known, and the default behavior in browsers and clients is to simply download and save these files, making no attempt to open or display them.

I realize this is a breaking change, but there is no real work-around for client code, since we do have users uploading actual text-files, and there is no way to distinguish a default return-value of text/plain from an correctly detected return-value of text/plain for actual text-files.

Thoughts?

mindplay-dk commented 6 years ago

Uhm, well.

The Local adapter apparently doesn't do MIME-type detection, at all?

As far as I can tell, it always falls back on naive extension-based MIME-type inference, which means we can't safely use the Local adapter for user-uploaded files in the first place?

I've tried setting a breakpoints, and actual MIME-type detection is never performed.

For now, I'm avoiding File::getMimeType() entirely and writing my own MIME-type detection...

frankdejonge commented 6 years ago

@mindplay-dk can you show some code maybe? and how you've setup this test? It should check the mime-type using finfo if available. Also, this is an open source project, please keep a friendly tone of voice.

mindplay-dk commented 6 years ago

Sorry about the tone - it's been a long and frustrating day.

The project is closed-source, but at least one thing is clear to me now, regarding my second post: the Local adapter does try to do MIME-type detection after all, it just has it's own implementation in Local::getMimeType() and doesn't use the MimeType or Util helper functions except as a fallback, which is why those breakpoints told me nothing.

Both in this implementation, and in the MimeType and Util functions, it is falling back to extension-based detection though - if, for example, the fileinfo module fails to detect the type - which isn't safe for uploaded files.

And it is defaulting to text/plain rather than application/octet-stream, which I think mostly just isn't correct? In this project, I use File::getMimeType() to get the value for the Content-Type header, and if the file-type detection fails, it will incorrectly tell the client to open any file-type in a text-editor.

mindplay-dk commented 6 years ago

Taking a deeper look at implementations of ReadInterface::getMimeType(), it appears the root of the problem is that implementations don't agree on the semantics - all of the implementations return a MIME-type of course, but how this MIME-type is obtained can vary across implementations, which makes them (in terms of security and otherwise) unreliable: you'll get some MIME-type, but you can't be exactly sure how it was obtained (content vs file-extension) or whether it might be just a default, etc.

For example:

MemoryAdapter uses content first, falls back on extension
LocalAdapter uses content first, falls back on extension, but under different conditions from MemoryAdapter
FtpAdapter uses extensions only

All default to text/plain.

I find it confusing, but I think, ultimately, it makes sense for me in this project to implement my own MIME-type detection under any circumstances. The project is a media management system, and we store the files behind a FlySystem abstraction, so as to enable projects to select local/FTP/Amazon/etc storage as needed - but we store meta-data in a database (including the MIME-type) and file-type detection needs to happen prior to adding uploaded files to the media file-system anyhow.

It was convenient to let FlySystem do the MIME-type detection, but, since it wouldn't work at all with an FTP adapter, that doesn't make sense to begin with.

I guess, ultimately, the whole feature feels like a bit of a tease, to be honest:

it may be able to tell you a MIME-type,
it may return false,
it may act as though it checked the MIME-type, when it's actually just checking the file-extension, or
it may just default to text/plain which may or may not be the actual MIME-type

With so many conditions, the utility of this method is highly questionable to me. To what use-case does it apply? To begin with, due to potential security concerns, content and filename-based MIME-types have completely different use-cases, e.g. trusted vs untrusted files/users.

I'm not trying to be poignant, but I have a hard time thinking of a use-case for "get me some MIME-type maybe" - where you need a MIME-type, but it's really completely unimportant how it's obtained, whether it's correct, or whether it gets you any result at all. Is there a real use-case for that?

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

thephpleague / flysystem

Strange default for MIME type detection #828