node-formidable / formidable

The most used, flexible, fast and streaming parser for multipart form data. Supports uploading to serverless environments, AWS S3, Azure, GCP or the filesystem. Used in production.
MIT License
7.05k stars 682 forks source link

Formidable detects MIME-type according to the file extension and not by the real content #749

Open pubmikeb opened 3 years ago

pubmikeb commented 3 years ago

Support plan

Context

BTW, multer detects file's MIME not by the extension only.

What are you trying to achieve or the steps to reproduce?

  1. Given JPG file tst.jpg
  2. Rename it to tst.pdf
  3. Set a break point inside of uploader.parse(req, async (err, fields, files) => {…}
  4. Try to upload tst.pdf
uploader.parse(req, async (err, fields, files) => {
    if (err) {
        reject(err);
    } else {…}
});

What was the result you got?

mimetype = application/pdf

11_010801

What result did you expect?

mimetype = image/jpeg Since this file is actually JPG file but with a from extension.

GrosSacASac commented 3 years ago

That is good point, and the readme could definitely have a small guide on how to check this.

I don't think we will ever bake in content-based MIME detection since there are thousands of file types (and more created each year) and each one needs special detection. For example to know if it is a jpg is very different algorithm than zip, pdf etc.

pubmikeb commented 3 years ago

since there are thousands of file types (and more created each year) and each one needs special detection

Sure, it's difficult to support all possible types. But a deep check for the very common types (e.g. office, PDF, media) could be a great feature with ability to extend it by custom plugins.

GrosSacASac commented 3 years ago

multer detects file's MIME not by the extension only.

How does multer do it ?

GrosSacASac commented 3 years ago

Sure, it's difficult to support all possible types. But a deep check for the very common types (e.g. office, PDF, media) could be a great feature with ability to extend it by custom plugins.

Common types is very subjective.

And we are trying to go to the opposite direction, make the lib smaller not bigger. For example https://github.com/node-formidable/formidable/issues/718#issuecomment-873079196

Personally I think we should embrace the npm philosophy of having a lot of modules that do 1 thing and do it well. 1 lib that validates pdf and jpg etc.

Then everyone can decide to import or not, if he needs alongside formidable.

pubmikeb commented 3 years ago

Personally I think we should embrace the npm philosophy of having a lot of modules that do 1 thing and do it well. 1 lib that validates pdf and jpg etc.

Then everyone can decide to import or not, if he needs alongside formidable.

Sure, ideally would be to make formidable-core for the minimalistic and generic as much as possible functionality and a set of plugins e.g. formidable-pdf, formidable-msoffice, etc. So every project could select the required set of supported types and every developer could enrich a set of supported file types by writing a plugin. Something similar to ESLint's set of rules.

pubmikeb commented 3 years ago

And we are trying to go to the opposite direction, make the lib smaller not bigger. For example #718 (comment)

Great approach, zero dependencies and 85 KB package are much better then alternative solutions with 2-5 MB of dependencies.

pubmikeb commented 3 years ago

How does multer do it ?

It looks like the file type is detected by checking the magic number of the buffer.

It worth using the file-type package:

import FileType from "file-type";

(async () => {
    console.log(await FileType.fromFile("Unicorn.png"));
    console.log(await FileType.fromBuffer(fileBuffer));
    console.log(await FileType.fromStream(fileStream));

    //Output: {ext: "png", mime: "image/png'"}
})();
TheThing commented 3 years ago

How does multer do it ?

It looks like the file type is detected by checking the magic number of the buffer.

It worth using the file-type package:

import FileType from "file-type";

(async () => {
  console.log(await FileType.fromFile("Unicorn.png"));
  console.log(await FileType.fromBuffer(fileBuffer));
  console.log(await FileType.fromStream(fileStream));

  //Output: {ext: "png", mime: "image/png'"}
})();

Except file-type adds not a single dependency but twelve. Some of which are redundant at this point:

`-- file-type@16.5.1
  +-- readable-web-to-node-stream@3.0.2
  | `-- readable-stream@3.6.0
  |   +-- inherits@2.0.4
  |   +-- string_decoder@1.3.0
  |   | `-- safe-buffer@5.2.1
  |   `-- util-deprecate@1.0.2
  +-- strtok3@6.1.3
  | +-- @tokenizer/token@0.1.1
  | `-- peek-readable@3.1.4
  `-- token-types@2.1.1
    +-- @tokenizer/token@0.1.1 deduped
    `-- ieee754@1.2.1

And formidable v3 already has plugin support so if you really want file type detection magic, just make a plugin for that. I just usually don't even bother. If people wanna upload jpeg as pdf, that's their fault.

pubmikeb commented 3 years ago

If people wanna upload jpeg as pdf, that's their fault.

OK, since such functionality is not must have and it takes to implement about 10 minutes, this issue can be closed.

tunnckoCore commented 3 years ago

Leaning more and more towards a separate package like utils or plugins. where we can include such helpers for common things, just like we started with separate examples in the examples dir.

There's never ending discussion between small and big packages. And I'm still bouncing between the two. Okay, small, almost no dependencies package.. but end users anyway will include a lot of deps to do what their needs are.. so.. that's why i don't see big problem of including things like some of the two qs.. or other basic things like correct basic mime handling, here in the core package. Convenience and usability, over impracticality. Small packages / unix philosophy starts to break when you realize that you really need few things to have some meaningful and functional final product, with common features. I'm not saying I'm okey with the mentioned problem how much v2 started to grow.

I'm for staying thin and this to remain in userland with some guides, but it also might be a good thing to include some basic detection for most common types, and not something huge and very generic like file-type.

pubmikeb commented 3 years ago

I think there is no right or wrong answer here. I would take a decision in a more sophisticated way, based not solely on amount of dependencies, but on a combination of the following parameters:

@tunnckoCore, should I reopen this issue?

tunnckoCore commented 3 years ago

Agree.

should I reopen this issue?

I don't know. I'm not very active last half year or more. Maybe we can open it if we decide to implement some basic detection.

GrosSacASac commented 3 years ago

If nobody wants to work on this we might as well close the issue

tunnckoCore commented 2 years ago

I'll work on it in some time.

Made some research:

Can make @formidable/helper-mimetypes or @formidable/mimetype or formidable-mimetype, plugin or helper that uses magic-bytes or magic-bytes.js

GrosSacASac commented 2 years ago

https://github.com/vader-sama/typective similar to magic-bytes but works with streams

Also remember that a valid number does not prove anything at the end, I remember it was possible to make valid jpeg files that were also valid php files, which was one way to hack a server that looked like it did everything correct.

tunnckoCore commented 2 years ago

I remember it was possible to make valid jpeg files that were also valid php files, which was one way to hack a server that looked like it did everything correct.

Oh yea.. :laughing: I remember that too.

typective

yea, seems good.

tunnckoCore commented 2 years ago

Another one: https://github.com/lukeed/mrmime

Bessonov commented 1 year ago

I'm happy user of mmmagic. It was chosen ~6 years ago for execution inside aws lambda because of speed. But for formidable I would suggest to make it plugin-able. I mean not a specific plugin, but just a parameter like:

const form = formidable({
  getMimeType(file: Buffer): string {
    // custom logic
  }
})

Or plugin which allows custom logic.