node-formidable / formidable

The most used, flexible, fast and streaming parser for multipart form data. Supports uploading to serverless environments, AWS S3, Azure, GCP or the filesystem. Used in production.
MIT License
7k stars 680 forks source link

Notes on/for v3 / v4 #830

Open tunnckoCore opened 2 years ago

tunnckoCore commented 2 years ago

workspaces/

misc

GrosSacASac commented 2 years ago

go async/await and promises; possibly drop callback APIs entirely

Maybe do a mix in v3 and in v4 drop callback

tunnckoCore commented 2 years ago

New FormidableFile, based on fetch-blob and Web API File. Plus, we'll always make temporary files on tmp dir, for the time the process is running. When the process ends, they are cleared (all that is handled by fetch-blob). The user will still be able to to save on disk or to do that file whatever it wants. Something like opts.save on Formidable Options, will mean that we will save the data of that temporary created file to the disk on the user-chosen (or default) upload dir.

It handles:

/**
 * @typedef {import('filenamify').Options} FilenamifyOptions
 * @typedef {import('node:path').ParsedPath} ParsedPath
 *
 * @typedef FormidableFileOptions
 * @property {string} [type] - mime type from header
 * @property {number} [lastModified] - file mtimeMs, default `Date.now()`
 * @property {number} [size] - file size in bytes
 * @property {boolean} [randomName] - generate random filename, default `true` using `cuid`
 * @property {FilenamifyOptions} [filenamify] - options passed to `filenamify`
 * @property {RenamerFn} [rename] - filename renaming function
 * @property {{[key: string]: unknown}} [data] - any additional data or way of storing things
 *
 * @typedef {(filepath: string, contents: unknown, options: FormidableFileOptions) => string} RenamerFn
 * must be synchronous function
 */

import path from 'node:path';
import cuid from 'cuid';
import { filenamifyPath } from 'filenamify';

/**
 * @typedef {{type?: string, name: string, lastModified: number}} File
 * standard Web API File
 */
// @ts-ignore
import { File } from 'fetch-blob/file.js';

export class FormidableFile extends File {
  /**
   *
   * @param {string} filepath - filename from header
   * @param {unknown} contents - any file content value
   * @param {FormidableFileOptions} [options]
   */
  constructor(filepath, contents, options) {
    const { file, opts, fp } = normalize(filepath, contents, options);

    super(contents, file.base, {
      // @ts-ignore
      type: opts.mimetype || opts.mime || opts.type,
      lastModified: opts.lastModified || Date.now(),
    });

    /**
     * @type {string} - full filepath
     */
    // we assign this here, because to not mess with ParsedPath `file`
    this.path = fp;

    // eslint-disable-next-line no-underscore-dangle
    this._updateProps(file, contents, opts);
  }

  /**
   *
   * @param {ParsedPath} file - custom file
   * @param {unknown} contents - any file content value
   * @param {FormidableFileOptions} [options]
   */
  // eslint-disable-next-line no-underscore-dangle
  _updateProps(file, contents, options) {
    // it's already normalized
    // const opts = normalizeOptions(options);

    this.basename = file.base;
    this.dirname = file.dir;
    this.extname = file.ext;

    this.stem = file.name; // this.stem == this.name == file.name == basename without extname
    this.mime = options.type;

    if (options.size) {
      // @ts-ignore
      this.size = options.size;
    }

    Reflect.defineProperty(this, 'contents', { value: contents });
    Reflect.defineProperty(this, 'options', { value: options });
    Reflect.defineProperty(this, 'base', { value: file.base });
    Reflect.defineProperty(this, 'dir', { value: file.dir });
    Reflect.defineProperty(this, 'ext', { value: file.ext });
    Reflect.defineProperty(this, 'data', { value: options.data });
  }
}

/**
 *
 * @param {string} filepath - filename from header
 * @param {unknown} contents - any file content value
 * @param {FormidableFileOptions} [options]
 *
 */
function normalize(filepath, contents, options) {
  const opts = normalizeOptions(options);

  /**
   * @type {string} fp
   */
  const fp = filenamifyPath(
    // @ts-ignore
    opts.rename(filepath, contents, opts),
    opts.filenamify,
  );

  /**
   *
   * @type {ParsedPath} file
   */
  const file = path.parse(fp);

  return { file, opts, fp };
}

/**
 * @param {{[key: string]: unknown} & FormidableFileOptions} [options]
 * @returns {FormidableFileOptions}
 */
function normalizeOptions(options = {}) {
  const opts = {
    filenamify: { replacement: '-', ...options.filenamify },
    randomName: true,
    data: { ...options.data },
    ...options,
  };

  /**
   * @property {RenamerFn}
   */
  opts.rename = (f, c, o) => {
    if (typeof opts.rename === 'function') {
      return opts.rename(f, c, o);
    }
    return opts.randomName ? cuid() : f;
  };

  return opts;
}

/**
 * examples
 */

const virtualFile = new FormidableFile(
  './foo/bar/qu_sas<mail@at.com>zab.gz<script onchange="alert(1)"></script>sax.png',
  await fs.readFile('./examples/10k.json'),
  {
    type: 'application/json',
    randomName: true,
    data: { sasa: 1 },
  },
);

console.log(virtualFile);

Ideally it will be called from part.on('data') in the future, but for now we can add it at _newFile, then we can use createTemporaryBlob to make that temp file and pass the return to the user.

Something like that:

part.on('data', async () => {
  let file = new FormidableFile(nameFromHeader, dataFromParser, {
    type: contentTypeFromParser,
    rename: formidableOptions.rename,
  });
  const blob = await createTemporaryBlob(
    file.contents,
    file.type,
    file.options.signal,
  );

  file.blob = blob;
  this.emit('file', file, blob);

  if (formidableOptions.save) {
    await fs.writeFile(file.path, file.contents);
  }
});

We are using createTemporaryBlob instead of createTemporaryFile because we already made a file instance of FormidableFile which extends File.

The reason of having our own File implementation on top, is that it's for useful properties that we need to pass to the users. Plus I always seeing it that way since I got in here in v1. The properties of it is based on enough battle-tested vfile (remark, rehype, unified) and vinyl (gulp), plus of course the minimalism of File.

Currently text(), stream() and those methods, are missing, but we probably should implement them too.

The whole thing can be introduced in v3 (and require node 14) and backport to v2 (without the temp file creation thing and without fetch-blob).

The rest of the interesting stuff like iterators and new APIs goes to v4.

tunnckoCore commented 2 years ago

Okay, so the @jimmywarting's cleaned Formidable parser, which is on the node-fetch is almost 2x faster than the current one.

Based on the busboys benchmark bench-multipart-files-100mb-big the old one is avg 3.7s, now it's 1.7s

jimmywarting commented 2 years ago

wow, didn't know that mine was 2x faster. did you come to any conclusion of why that is?

tunnckoCore commented 2 years ago

Not yet. I'm not in parsers and that parser looks like magic to me LOL.

Btw, to make a note: it's faster when use just the class, but not the toFormData. And when you start putting things on top of all that it gets slower. Busboy v1 really does a great job at the very low level.

I'll commit updated benchmarks soon.

jimmywarting commented 1 year ago

fetch-blob@3.2.0 is publised, you can now use createTemporaryBlob or createTemporaryFile if you like

jimmywarting commented 1 year ago

btw, formdata is built into NodeJS core now... maybe you can do a conditional import like use some formdata package like formdata-polyfill (used by node-fetch) or formdata-node (both are spec compatible) if it isn't avalible on global namespace

so something like const FormData = globalThis.FormData || await import('formdata-polyfill/esm.min.js') just so happen to know that NodeJS (undici's FormData imp accepts 3th party blobs like fetch-blob as well - not just only instances of globalThis.Blob)

fetch-blob is still going to stick around until https://github.com/nodejs/node/issues/37340 has been resolved by nodejs itself

jimmywarting commented 1 year ago

Was a super long time ago since i even used gulp and i have never heard of vinyl or filev before... honestly just wished they followed Web File IDE structure instead...

vinyl should also impl text, arrayBuffer and stream() 👍

tunnckoCore commented 1 year ago

they followed Web File IDE structure instead

Well, yea. But they predated everything i think, that's why.

I built some formidable-mini on top of your fetch-blob and formdata-polyfill, you can check it here, it's pretty clean, small and a beauty, haha. There FormidableFile is just a Web File, or anyone can just pass the return stadnard FormData to their desired place :P

ah.. what a times..

I'm open for ideas and feedback. Not sure how to use these createTemporary* yet..

GrosSacASac commented 1 year ago

So my plan is to add a few tests to https://github.com/node-formidable/formidable/pull/855 then merge and publish. This will probably be the last version 3.

And then probably mark 3 as latest. So we can avoid touching v2 like https://github.com/node-formidable/formidable/issues/908 and void having issues that were already fixed in version 3

Then finish https://github.com/node-formidable/formidable/pull/899 which transitions to typescript, so this will be version 4.

tunnckoCore commented 1 year ago

@GrosSacASac Agree.

Seems like people are always just using latest and don't even read or care about anything else. I'm watching the downloads on all versions, and the majority still are on the v2, after the v2.1 publish they started moving to it, but at a very very slow pace.

Note: v3 also need to have CJS. We can use bundt which is a very small and simple regex based converter of import-exports to CJS (does only that), so that we can have both ESM and CJS.

jimmywarting commented 1 year ago

Note: v3 also need to have CJS. We can use bundt which is a very small and simple regex based converter of import-exports to CJS (does only that), so that we can have both ESM and CJS.

it could be possible to also just create a simple but small cjs wrapper around ESM since the nature of the parser is async anyways...

// index.cjs
module.exports = function (opts) {
  return {
    // import the ESM module lazily when needed, and pass the arguments along 
    parse: (...args) => import('./index.js').then(mod => mod.dafault(opts).parse(...args))
  }
}

Then the code could still be written as ESM and you wouldn't have to duplicate all of your code as a dual cjs/esm package hazard that increases the pkg size?

tunnckoCore commented 1 year ago

@jimmywarting uh, yea. I actually do it similarly here and there. Good point.

tommyhtran commented 1 year ago

Why not merge the TypeScript PR into version 3?

jimmywarting commented 1 year ago

Don't think i have much saying in this... but i don't like TS syntax, it dose not run in browsers and never will. the ESM support and extension-less paths in TS is a hot garbage atm. I rather much prefer jsdoc + checkjs / allow js combo. I much more prefer build-free solutions that just works without wasting lots of time compiling stuff that takes time.

you can have equally good type support with plain vanilla javascript. most of the times you don't even need to annotate either javascript or typescript. let me give you an example. how would you go about using Audio to calculate the duration of a source? you would need a promise to listen on onloadedmetadata you may write something like this:

function getDruation() {
  return new Promise<number>((rs) => {
    const audio = new Audio()
    audio.preload = 'metadata'
    audio.src = url
    audio.load()
    audio.onloadedmetadata = () => {
      rs(audio.duration)
    }
  })
}

getDruation returns a number but if we rewrite it a little bit with async/await

async function getDruation() {
  const audio = new Audio()
  audio.preload = 'metadata'
  audio.src = url
  audio.load()
  await new Promise(rs => audio.onloadedmetadata = rs)
  return audio.duration
}

then you suddenly don't need to annotate anything. there are other more clever ways to have typing without annotating your code. if you use default parameters in functions then it can also figure out what the types are.

tunnckoCore commented 1 year ago

@jimmywarting agree. Not fan of TypeScript too. Love the explosion and the attention, and how the ecosystem looks, but gosh I can't stand it to write a whole codebase in TS. I think if it's so needed we can expose just a separate d.ts, and when the codebase is drastically simplified it won't be that needed too.

As I said at #899: "I'll review it at a later point, but I don't think a review is much needed. Maybe the only thing that I could ask is to try with separate file approach - e.g. just a typings file that we will distribute with the package. Just to see how it looks like."

tommyhtran commented 1 year ago

I can understand a few of the frustrations with TypeScript, but maybe it would help to view it as a necessary evil.

As it is, I already found a few coding mistakes in the process of converting the codebase to TS. It sounds like the drawback here is having a compilation step and being forced to write code a little differently. IMO, compilation isn’t a big deal. Taking away the ability to write code more freely is what we need consensus on.

The alternative to the PR is probably going to be the status quo, type-less development in JS and a .d.ts file that may fall behind on any given commit.

tunnckoCore commented 1 year ago

@tommyhtran agree.

But I'm coming from that that the parser could be hard to be typed, and really there's not much need for that - it's an internal. The rest is just a big bloat around the parser which can be simplified dramatically just like I did with formidable-mini (i'm playing around), which is around the same parser, but the rest on top of it is just a single simple class which won't be that hard to be written in TypeScript - not like the current one.

The thing is that we obviously (based on downloads stats of older versions) should focus on legacy and backporting things.. which drives out the motivation to work at all. In 2023 I'll figure out how to make users to migrate or upgrade, and why not we can get paid for helping and consulting. I'm very excited about v3, v4 and formidable-mini, but..

GrosSacASac commented 1 year ago

Ok lets have a vote (react with one)

Continue with current situation (types in a separate project) ❤️

Add type defintion side by side to original source files (.js files + .d.ts files) 🚀

Convert project to typescript and have type definition in the source files, use a compiler to have js files on publish 😄

tunnckoCore commented 3 months ago

@GrosSacASac i'll move formidable-mini as a separate repo in the org.

Currently it is on https://github.com/tunnckoCore/opensource/tree/master/modules/formidable-mini.

We can start moving tests there, and maybe tag it as v4? It's already API compatible kind of, without the file size options and stuff, but could be used for migrating users. And we can type it too.