mscdex / busboy

A streaming parser for HTML form data for node.js
MIT License

Incorrect parsing due to using latin1Slice in multipart.js #319

Closed djridoo closed 2 years ago

djridoo commented 2 years ago

Hi there! I've caught an incorrect original file name after uploading a file named something like 031汉堡包漢堡包, 汉堡漢堡.docx. What I see instead is 031æ±‰å ¡åŒ…æ¼¢å ¡åŒ…, æ±‰å ¡æ¼¢å ¡.docx. I noticed that you've completely changed the implementation here https://github.com/mscdex/busboy/commit/54a86838c15bba1fc78eebdfa3c6a986a5e57dd9 and that chunk.latin1Slice() is now used here https://github.com/mscdex/busboy/blame/master/lib/types/multipart.js#L112 So now I have to use Buffer.from(file.originalname, 'latin1').toString() to get the correct result. Why don't we use slice instead of latin1Slice?
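The behavior and the workaround described above can be reproduced without busboy. This is a minimal sketch that uses only Node's Buffer API. It assumes the client sent the file name as UTF-8 bytes, and it uses toString('latin1') as a stand-in for the latin1Slice() call mentioned in the report. The variable names are illustrative.

```javascript
// File name as the user typed it (taken from the report above).
const original = '汉堡漢堡.docx';

// Assumption: the client put the UTF-8 bytes of the name on the wire.
const wireBytes = Buffer.from(original, 'utf8');

// Decoding those bytes as latin1 maps every byte to one character,
// which produces the garbled name seen in the report.
const garbled = wireBytes.toString('latin1');

// Workaround from the report: turn the latin1 string back into the
// same bytes, then decode them as UTF-8 (the default for toString()).
const recovered = Buffer.from(garbled, 'latin1').toString();

console.log(garbled === original);   // false
console.log(recovered === original); // true
```

The round trip works because latin1 decoding never drops or merges bytes, so the original byte sequence can always be rebuilt from the garbled string.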

mscdex commented 2 years ago

Set defParamCharset in the busboy configuration to whatever character set you want to use when the client doesn't explicitly state the encoding in the field.
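As a rough sketch of that suggestion, the option can be passed alongside the request headers when creating the parser. The helper name busboyOptions is hypothetical, and 'utf8' is only an assumption about what the clients actually send. The usage in the comment follows the busboy 1.x README and needs the busboy package installed.

```javascript
// Hypothetical helper: builds the options object passed to busboy().
// defParamCharset is the charset assumed for header parameters such as
// the file name when the client does not state one explicitly.
function busboyOptions(req) {
  return {
    headers: req.headers,
    defParamCharset: 'utf8', // assumption: clients send UTF-8 file names
  };
}

// Usage inside an HTTP request handler (requires `npm install busboy`):
//   const busboy = require('busboy');
//   const bb = busboy(busboyOptions(req));
//   bb.on('file', (name, stream, info) => {
//     console.log(info.filename); // decoded with defParamCharset
//     stream.resume();            // discard the file contents in this sketch
//   });
//   req.pipe(bb);
```

With this setting the Buffer.from(..., 'latin1') workaround should no longer be needed, as long as the file names really are UTF-8.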

latin1 is used because it is (or was) the encoding traditionally used by clients and by the relevant RFCs. It is there for compatibility, and because it keeps the bytes intact. If we always assumed UTF-8 and the field was not actually encoded that way, there would be no way to recover the original value other than keeping a separate copy of the original bytes, which isn't a very clean solution.
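The "keeps bytes intact" point can be illustrated with plain Buffer calls. This is a small sketch, assuming a client that sent the file name é.txt encoded as latin1. The byte values and variable names are made up for the example.

```javascript
// "é.txt" encoded as latin1: 0xE9 on its own is not valid UTF-8.
const raw = Buffer.from([0xe9, 0x2e, 0x74, 0x78, 0x74]);

// latin1 decoding is a one-to-one byte-to-character mapping,
// so re-encoding gives back exactly the original bytes.
const asLatin1 = raw.toString('latin1');
const backFromLatin1 = Buffer.from(asLatin1, 'latin1');

// UTF-8 decoding replaces the invalid byte with U+FFFD,
// so the original byte 0xE9 cannot be recovered from the string.
const asUtf8 = raw.toString('utf8');
const backFromUtf8 = Buffer.from(asUtf8, 'utf8');

console.log(backFromLatin1.equals(raw)); // true
console.log(backFromUtf8.equals(raw));   // false
```

In other words, a wrong latin1 guess can be undone by the caller, while a wrong UTF-8 guess cannot.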