kogosoftwarellc / open-api

A Monorepo of various packages to power OpenAPI in node
MIT License
894 stars 235 forks source link

fails to handle 'multipart/form-data' in openapi v3 (Unsupported Content-Type multipart/form-data; boundary=...) #279

Closed mbsimonovic closed 5 years ago

mbsimonovic commented 5 years ago

i'm getting an error trying to handle file uploads with v3.6.0. The spec is:

openapi: "3.0.0"
servers:
    - url: http://localhost:3000
      description: local server

paths:
    /upload/:
        post:
            summary: Create NEW files
            operationId: upload
            requestBody:
                required: true
                content:
                    multipart/form-data:
                        schema:
                            type: object
                            properties:
                                docs:
                                    type: array
                                    items:
                                        type: string
                                        format: binary
# tried this as well but also doesnt work:
                        # schema:
                        #   type: object
                        #   properties:
                        #     upload:
                        #       type: string
                        #       format: binary

express-openapi is initialised as suggested:

 'multipart/form-data'(req, res, next) {
            multer().any()(req, res, (err) => {
                if (err) return next(err);
                req.files.forEach(f => req.body[f.fieldname] = f);
                return next(); // this executes correctly
            });
        }

Hitting the app with curl or swagger-ui: curl "http://localhost:3000/upload/" -H "accept: */*" -H "Content-Type: multipart/form-data" -F docs=@package.json fails with the following error (the endpoint never gets called):

Unsupported Content-Type multipart/form-data; boundary=----WebKitFormBoundaryWyK9kAU7d35AKf26

The same happens with superagent/supertest and postman. Any hints?

mbsimonovic commented 5 years ago

function getSchemaForMediaType(contentType:'multipart/form-data; boundary=--------------------------480295105673604652952056', requestBodySpec='content: multipart/form-data') { seems to be the problem:

screen shot 2018-12-31 at 09 46 54
jsdevel commented 5 years ago

@mbsimonovic please submit a PR to resolve this issue.

mbsimonovic commented 5 years ago

@jsdevel not sure now to setup and run tests, but the simplest fix could look like this (line 313):

if (contentType.startsWith('multipart/form-data') 
    && mediaTypeKey.startsWith('multipart/form-data')) {
    return mediaTypeKey;
}
mbsimonovic commented 5 years ago

another option is to use a module, https://github.com/jshttp/content-type for example:

const contentType = require('content-type');

contentType.parse('multipart/form-data; boundary=----WebKitFormBoundaryWyK9kAU7d35AKf26').type == 'multipart/form-data';

contentType.parse('multipart/form-data').type == 'multipart/form-data';

contentType.parse('image/svg').type == 'image/svg';

contentType.parse('text/html; charset=utf-8; foo=bar
').type == 'text/html';
jsdevel commented 5 years ago

@mbsimonovic see the README for development help

mbsimonovic commented 5 years ago

btw shouldn't this work in typescript when importing an untyped js module:

import contentTypeParser from 'content-type'; //doesn't work
// const contentTypeParser = require('content-type'); // works fine 
jsdevel commented 5 years ago

Depends on the module you want to import. Does it export default es6 syntax?

mendelk commented 5 years ago

I seem to be running into issues with OpenApi3 with multipart/form-data. I created an express-openapi test (that current fails) here: https://github.com/mendelk/open-api/tree/openapi3-multipart-form-consumes-middleware.

Before I create an issue, can someone confirm that this is actually a bug, or maybe I'm doing something wrong?

@jsdevel @mbsimonovic

Thank you!

jsdevel commented 5 years ago

i'll defer to @mbsimonovic

mbsimonovic commented 5 years ago

Where and why exactly does it fail? The test just says bad request. We need logging!! :)

mbsimonovic commented 5 years ago

the error says:

{
message: "cannot POST /v3/consumes (400)",
type: "text/html",
text: `<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Error</title>
</head>
<body>
<pre>[object Object]</pre>
</body>
</html>
`

ideally i would now just turn on the debug level for openapi and see what's going on.

mendelk commented 5 years ago

@mbsimonovic I'll paste the exact error once I get to my desk, but a 400 error is a validation error. If you log the error (say, using middleware) you'll see the error is due to the spec expecting a string, and instead an object (representing the file) is passed. The bug is that a string of format binary should validate against the multer object, correct?

Thanks @mbsimonovic for your help with this!

mendelk commented 5 years ago

(As an aside, the <pre>[object Object]</pre> also gets logged to stdout and seems to be a bug somewhere...)

mbsimonovic commented 5 years ago

i somehow got multipart working for me, so perhaps i could strip it down and give it to you so you have a working version?

mendelk commented 5 years ago

That would be great! (My example is pretty stripped down, but a working starting point would be awesome.)

mbsimonovic commented 5 years ago

here you go. so npm install first, then npm run test.

openapi-multipart-demo.zip

mendelk commented 5 years ago

Thanks @mbsimonovic .

In your example, changing the spec like so:

multipart/form-data:
    schema:
        required:
            - files
        type: object
        properties:
            files:
                type: array
                items:
                    type: string
                    format: binary

(Note the required property.)

This causes the following error:

errors: [
    {
      "path": "files",
      "errorCode": "required.openapi.validation",
      "message": "should have required property 'files'",
      "location": "body"
    }
  ]

This, I believe, is the crux of our issue!

Edit, to explain further: The reason your spec is validating is because the files property is in fact not being validated, because it's an optional property. From the validator's point of view, that property doesn't exist to validate. Making it a required field changes all that. Now you'll need to put files on the req.body before x-express-openapi-additional-middleware, i.e. via args.consumesMiddleware. Once that's done, you'll get the same error I've been getting all along:

  errors: [
    {
      "path": "files[0]",
      "errorCode": "type.openapi.validation",
      "message": "should be string",
      "location": "body"
    }
  ]
mendelk commented 5 years ago

After spending some time on this, this would require quite a few changes to get working, and I'm not sure it's in scope of this library. Other libraries handle this issue by skipping validation on format: binary, but not sure that's any better. Ultimately, the problem can be solved by setting body[files][0] to an arbitrary (or empty) string, and getting the file via req.files. Is it the prettiest solution? I don't think so. But it works...

jsdevel commented 5 years ago

@mendelk can you open a PR to update the applicable README?

mendelk commented 5 years ago

@jsdevel Sure! But as I said, not sure my solution is Right Way™ to do this. If you still want me to, I'll be happy make it.

jsdevel commented 5 years ago

@mendelk i feel a clear example of how to get files working would be beneficial. it doesn't have to be branded as the right/only way to handle it, but A Way ™

mbsimonovic commented 5 years ago

i've got 2 tests failing on the master at the moment:

  1) openapi-request-validator
       should fail bad media type for request body:
     TypeError: invalid media type
      at Object.parse (node_modules/content-type/index.js:126:11)
      at getSchemaForMediaType (index.ts:2:79)
      at OpenAPIRequestValidator.validate (index.ts:1:28)
      at Context.<anonymous> (test/data-driven.ts:22:31)

  2) openapi-request-validator
       should handle missing content type as missing body:
     TypeError: argument string is required
      at Object.parse (node_modules/content-type/index.js:108:11)
      at getSchemaForMediaType (index.ts:2:79)
      at OpenAPIRequestValidator.validate (index.ts:1:28)
      at Context.<anonymous> (test/data-driven.ts:22:31)
mbsimonovic commented 5 years ago

If i understand this correctly, after multer parses the request, it attaches files on req which we should put on the body before validation runs:

req.body.files = [{
        "fieldname": "files",
        "originalname": "stock.jpeg",
        "encoding": "7bit",
        "mimetype": "image/jpeg",
        "destination": "/tmp/uploads",
        "filename": "8456802f71781abe34a1da8cca4f8244",
        "path": "/tmp/uploads/8456802f71781abe34a1da8cca4f8244",
        "size": 100367
      }];

The apidoc says that files array should have 2 props called type and format :

              files: {
                type: 'array',
                items: { type: 'string', format: 'binary' }
              }

but there're no such properties, so that's why it fails?

Btw the proposed solution:

    multer.any()(req, res, (err) => {
        if (err) {
            return next(err);
        }
        // Handle both single and multiple files
        const filesMap = req.files.reduce(
            (acc, f) =>
                Object.assign(acc, {
                    [f.fieldname]: (acc[f.fieldname] || []).concat(f)
                }),
            {}
        );
        Object.keys(filesMap).forEach((fieldname) => {
            const files = filesMap[fieldname];
            req.body[fieldname] = files.length > 1 ? files.map(() => '') : ''; // BUG
        });
        return next();
    });

Doesn't validate when schema defines files as an array, but only one file is uploaded. files.length is 1 then so req.body.files == '', but the schema says it should be an array.

mbsimonovic commented 5 years ago

nah, seems like this isn't the case, files[0] should be a string makes no sense. i've opened a PR that nicely reproduces this in request validator https://github.com/kogosoftwarellc/open-api/pull/329

mbsimonovic commented 5 years ago

here's how the generated validator looks like:

function validateBody(data, dataPath, parentData, parentDataProperty, rootData) {
  'use strict';
  var vErrors = null;
  var errors = 0;
  if ((data && typeof data === 'object' && !Array.isArray(data))) {
    var errs__0 = errors;
    var valid1 = true;
    var data1 = data.body;
    if (data1 !== undefined) {
      var errs_1 = errors;
      if ((data1 && typeof data1 === 'object' && !Array.isArray(data1))) {
        var errs__1 = errors;
        var valid2 = true;
        var data2 = data1.files;
        if (data2 === undefined) {
          valid2 = false;
          var err = {
            keyword: 'required',
            dataPath: (dataPath || '') + '.body',
            schemaPath: '#/properties/body/required',
            params: { missingProperty: 'files' },
            message: 'should have required property \'files\''
          };
          if (vErrors === null) vErrors = [err]; else vErrors.push(err);
          errors++;
        } else {
          var errs_2 = errors;
          if (Array.isArray(data2)) {
            var errs__2 = errors;
            var valid2;
            for (var i2 = 0; i2 < data2.length; i2++) {
              var errs_3 = errors;
              if (!(typeof data2[i2] === 'string')) {  // FAILS HERE: data2[i2] (body.files) is an array
                var err = {
                  keyword: 'type',
                  dataPath: (dataPath || '') + '.body.files[' + i2 + ']',
                  schemaPath: '#/properties/body/properties/files/items/type',
                  params: { type: 'string' },
                  message: 'should be string'
                };
                if (vErrors === null) vErrors = [err]; else vErrors.push(err);
                errors++;
              }
              var valid3 = errors === errs_3;
            }
          } else {
            var err = {
              keyword: 'type',
              dataPath: (dataPath || '') + '.body.files',
              schemaPath: '#/properties/body/properties/files/type',
              params: { type: 'array' },
              message: 'should be array'
            };
            if (vErrors === null) vErrors = [err]; else vErrors.push(err);
            errors++;
          }
          var valid2 = errors === errs_2;
        }
      } else {
        var err = {
          keyword: 'type',
          dataPath: (dataPath || '') + '.body',
          schemaPath: '#/properties/body/type',
          params: { type: 'object' },
          message: 'should be object'
        };
        if (vErrors === null) vErrors = [err]; else vErrors.push(err);
        errors++;
      }
      var valid1 = errors === errs_1;
    }
  }
  validate.errors = vErrors;
  return errors === 0;
}

const body = {
  'files': [
    {
      'fieldname': 'files',
      'originalname': 'stock.jpeg',
      'encoding': '7bit',
      'mimetype': 'image/jpeg',
      'destination': '/tmp/uploads',
      'filename': '8456802f71781abe34a1da8cca4f8244',
      'path': '/tmp/uploads/8456802f71781abe34a1da8cca4f8244',
      'size': 100367
    }
  ]
};
validateBody({body}); // fails with 'body.files[0] (type) should be a string
mbsimonovic commented 5 years ago

we should create a new issue for this

mbsimonovic commented 5 years ago

hm, could this be a case of a badly defined apidoc https://swagger.io/docs/specification/data-models/data-types/#array ? type and items are reserved words, so seems like type='string' is being interpreted as "files array should have objects of type string (in binary format)`? Kind of makes sense, files should be an array of files represented as strings (in binary format), not some objects.

mbsimonovic commented 5 years ago

when multipart schema is defined like this then it passes validation (w/ required):

        'multipart/form-data': {
          schema: {
            type: 'object',
            required: ['files'],
            properties: {
              files: {
                type: 'array',
                items: {
                  type: 'object',
                  properties: {
                    fieldname: { type: 'string' },
                    encoding: { type: 'string' },
                    mimetype: { type: 'string' },
                    size: { type: 'integer' },
                    destination: { type: 'string' },
                    filename: { type: 'string' },
                    path: { type: 'string' }
                  }
                }
              }
            }
          }
        }

this makes more sense to me, to properly define the schema according to what the backend expects, using multer in this case, then to do some hacks with custom middleware to actually disable validation. so either don't use required and disable validation, or properly define request body. What do you guys say?

mendelk commented 5 years ago

@mbsimonovic

The apidoc says that files array should have 2 props called type and format

Not quite accurate. The items in the files array have no props at all, because they're strings. The format has no effect at all (unless you define a customFormat).

Doesn't validate when schema defines files as an array, but only one file is uploaded. files.length is 1 then so req.body.files == '' , but the schema says it should be an array

See my extended example linked in README for code that handles both single files and arrays.

this makes more sense to me, to properly define the schema according to what the backend expects, using multer in this case

The problem is that that the spec expects type: string, format: binary for strings (as do all the tools that tie in, such as Swagger UI etc). The fact that Multer is being used is not connected to the spec at all, so I'm not sure it makes sense to change the validator to expect a Multer object when the spec calls for a type: string, format: binary.

so either don't use required and disable validation

I don't think it makes sense to change your spec to satisfy your validator. (In fact , you don't always have that option, when the spec isn't defined by you.)

mbsimonovic commented 5 years ago

Let's step back. How am I supposed to write the spec for multiple files upload? I thought this is correct, no:

    requestBody: {
      description: 'a multipart',
      content: {
        'multipart/form-data': {
          schema: {
            type: 'object',
            // required: ['files'], // fails #279
            properties: {
              files: {
                type: 'array',
                items: { type: 'string', format: 'binary' }
              }
            }
          }
        }
      }
    }

That expects req.body.files to be an array of strings (in binary). In this case it's perfectly valid to upload a single file, the array files will have one element. In this case your example fails because req.body.files will be an empty string, instead of an array having one empty string.

mendelk commented 5 years ago

Correct. My example was for a single file.

'multipart/form-data': {
  schema: {
    type: 'object',
    required: ['file'],
    properties: {
      file: {
        type: 'string',
        format: 'binary'
      }
    }
  }
}

To be 100% accurate, you can't do this without inspecting the apiDoc for this route because you don't know if it should be an array or a single file. In my example, I make the assumption that if there are multiple files with the same fieldname, we have an array. But that can fail in the real world, if for example your spec specifies an array but a single file happened to be passed...

mbsimonovic commented 5 years ago

just to double check, with consumesMiddleware defined, does it execute on all requests, not just some custom paths that should handle uploads? This is a security problem:

WARNING: Make sure that you always handle the files that a user uploads. Never add multer as a global middleware since a malicious user could upload files to a route that you didn't anticipate.
Only use this function on routes where you are handling the uploaded files.

mendelk commented 5 years ago

I think it does execute on all routes, but not quite sure what the security problem is. Can you explain?

mbsimonovic commented 5 years ago

i'm not sure either, but sounds like something like this:

  1. there're 2 paths, /ctrlpanel and /upload, both accept POST requests but only /upload should handle files
  2. a malicious user sends a multipart POST to /ctrlpanel
  3. multer runs and handles file upload because consumesMiddleware runs on all requests, not just for /upload (but only if /ctrlpanel accepts multipart!)

can't think of an exploit from the top of my head, but the fact that it runs where it shouldn't is a concern :)

mendelk commented 5 years ago

Yeah I'm not seeing the problem but then again I'm no security expert 😀.

I think a strong argument can be made that Multer should be an official dependency of express-openapi and that the library would handle all multipart file uploads, only on the specified routes based on the spec etc.

What are your thoughts @jsdevel?

mbsimonovic commented 5 years ago

@mendelk there's pros and cons: Pros:

i'm so happy i don't have to make this decision :)

mendelk commented 5 years ago

I agree with your list, however, considering that Multer is the official way to handle file uploads (according to express docs), maybe the con isn't too bad!