mscdex / busboy

A streaming parser for HTML form data for node.js
MIT License
2.84k stars 213 forks source link

Emit buffer for fields #324

Open Sethlans opened 1 year ago

Sethlans commented 1 year ago

Feature request #323

When dealing with a third-party tool I noticed that some of the part of the multipart were with a particular charset, and others were with utf-8.

Since busboy assumes that all fields are from the same charset, this leads to an issue when parsing. I added a raw buffer to the data emitted for the fields to be able to directly work with the Buffer and parse it with the correct charset.

bb.on('field', (name, val, buffer, info) => {
      console.log(`Field [${name}]: value: %j`, val);
      console.log(`Field [${name}] raw buffer`, buffer)
    });
mscdex commented 1 year ago

I think a better and more backwards compatible solution would be to add a 'buffer' charset that returns the Buffer instead of converting to string.

Sethlans commented 1 year ago

I think a better and more backwards compatible solution would be to add a 'buffer' charset that returns the Buffer instead of converting to string.

Uhm, I do not follow you. This was the fast solution I implemented in our scenario Maybe a config to switch from converted string to Buffer in the val parameter?

mscdex commented 1 year ago

Uhm, I do not follow you.

You can specify the default character set when you create the busboy instance. Typically you'd specify something like 'utf8' if you know all of your fields will contain UTF-8 characters. What I was proposing was adding support for a special character set called 'buffer' that gives you the raw Buffer objects instead of creating any JavaScript strings.

mscdex commented 1 year ago

Additionally regarding the backwards compatibility, while this solution may work for your use case, breaking everyone else's code by adding another parameter to event handlers is not a good idea. If this PR's changes were added as-is, everyone would suddenly be getting a Buffer instead of the info object containing the various pieces of information about the part.

Sethlans commented 1 year ago

Additionally regarding the backwards compatibility, while this solution may work for your use case, breaking everyone else's code by adding another parameter to event handlers is not a good idea. If this PR's changes were added as-is, everyone would suddenly be getting a Buffer instead of the info object containing the various pieces of information about the part.

Yeah, you are absolutely right, I will rewrite this to ensure that the code will not break for everyone. Thanks for the suggestion

Sethlans commented 1 year ago

Owever, there should be a config to force the buffer charset. The defCharset config is overwritten if in the headers we have the charset specified. something like forceFieldBuffer or similar in order to always output the buffer in the val parameter instead of the parsed text

Sethlans commented 1 year ago

@mscdex in the end I added a new config. If I would like to specify a charset this would be however overwritten if another charset is found inside the multipart body. By adding a config field I will overwrite this behaviour and emit field as buffer only in that situation. By default the field is false to avoid breaking changes

Sethlans commented 11 months ago

Hey @mscdex, I was just wondering if you had the time to review the latest changes I applied