mscdex / busboy

A streaming parser for HTML form data for node.js
MIT License

multipart/form-data windows-1251 #334

Closed: UncleVic closed this issue 1 year ago

UncleVic commented 1 year ago

If a multipart/form-data request declares charset=windows-1251, the data isn't decoded correctly: after busboy parses the request, the resulting string is corrupted.

mscdex commented 1 year ago

You can modify the default character sets by passing in a configuration object. However, you will be limited to what node has built-in support for (meaning Buffer encodings).

You will probably want to use an intermediate encoding like 'latin1' and then re-decode the resulting bytes as windows-1251 using whatever you like, whether that is util.TextDecoder (if your node binary was built with full, or possibly system, ICU) or a 3rd-party module like iconv-lite.
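A minimal sketch of the 'latin1' intermediate step, assuming a Node.js build with full ICU so that the global TextDecoder accepts 'windows-1251'. The helper name `redecodeLatin1` is hypothetical; the input is the <ErrorText> value from the payload later in this thread as it appears when the windows-1251 bytes are read as latin1.

```javascript
// Hypothetical helper: recover text that was decoded with the 'latin1' encoding.
// Assumes full ICU, so TextDecoder (a global in Node.js) knows 'windows-1251'.
function redecodeLatin1(str, targetEncoding = 'windows-1251') {
  // 'latin1' maps each byte 0x00-0xFF to exactly one code point U+0000-U+00FF,
  // so converting the string back to a Buffer restores the original bytes.
  const bytes = Buffer.from(str, 'latin1');
  // Decode those bytes with the charset the sender actually used.
  return new TextDecoder(targetEncoding).decode(bytes);
}

// The <ErrorText> value from the payload in this thread, seen through latin1:
const fromBusboy = 'Îïåðàöèÿ îòìåíåíà';
console.log(redecodeLatin1(fromBusboy)); // Операция отменена
```

The trick works only because 'latin1' is lossless for every byte value; iconv-lite's `iconv.decode(buffer, 'win1251')` can replace TextDecoder in the last step when ICU is not available.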

mscdex commented 1 year ago

Actually, I believe if you want to rely on ICU availability, you should already be able to specify an encoding that TextDecoder accepts (e.g. 'windows-1251') and busboy will use that to decode strings. This will only work for defParamCharset though.
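Before relying on ICU as described above, a process can check whether its TextDecoder accepts a given label. This is a sketch; `textDecoderSupports` is a hypothetical name, and it relies on Node.js throwing a RangeError (ERR_ENCODING_NOT_SUPPORTED) for labels it cannot decode.

```javascript
// Hypothetical helper: does this Node.js build's TextDecoder accept `label`?
// Builds with small ICU (or without Intl) may reject legacy encodings.
function textDecoderSupports(label) {
  try {
    new TextDecoder(label); // throws RangeError for unknown/unsupported labels
    return true;
  } catch (err) {
    if (err instanceof RangeError) return false;
    throw err; // anything else is unexpected, so surface it
  }
}

console.log(textDecoderSupports('windows-1251'));
```

If this prints `true`, 'windows-1251' is a candidate value for `defParamCharset`, per the comment above; if it prints `false`, the 'latin1' plus iconv-lite route is the fallback.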

UncleVic commented 1 year ago

I hoped the library could get the charset from the headers... Unfortunately, I don't use busboy directly. There is a long chain: NestJS -> multer -> busboy. As far as I can see, multer doesn't pass any default charsets. I understand it's probably a multer problem. But why not take the charset from the headers?
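For illustration, reading the charset from the request's top-level Content-Type header could look like the sketch below. `getCharsetFromContentType` is a hypothetical helper, not part of busboy or multer, and its parameter parsing is simplified (no handling of quoted values containing ';').

```javascript
// Hypothetical helper (not busboy or multer API): extract the charset
// parameter from a Content-Type header value, or undefined if absent.
function getCharsetFromContentType(contentType) {
  if (typeof contentType !== 'string') return undefined;
  // Skip the media type itself; the rest are "name=value" parameters.
  const params = contentType.split(';').slice(1);
  for (const param of params) {
    const eq = param.indexOf('=');
    if (eq === -1) continue;
    const name = param.slice(0, eq).trim().toLowerCase();
    if (name !== 'charset') continue;
    // Strip optional surrounding double quotes and normalize case.
    return param.slice(eq + 1).trim().replace(/^"(.*)"$/, '$1').toLowerCase();
  }
  return undefined;
}

// The Content-Type header from the payload in this thread:
const header =
  'multipart/form-data; charset=windows-1251; boundary=BS_20230321154848';
console.log(getCharsetFromContentType(header)); // windows-1251
```

The returned label could then be checked and used with TextDecoder; whether a parser should honor a top-level charset on multipart/form-data at all is the open question in this thread.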

Here is my payload, for example. The <ErrorText> element contains windows-1251 text:

POST /api/v1/events/providers/ipay HTTP/1.1
X-Forwarded-Proto: https
Connection: close
Content-Length: 673
Content-Type: multipart/form-data; charset=windows-1251; boundary=BS_20230321154848
ServiceProvider-Signature: SALT+MD5: 2A6DDD2DF5147B132D07455A3AF1243B
User-Agent: BS_SOU_749

--BS_20230321154848
Content-Disposition: form-data; name="XML"

<?xml version="1.0" encoding="windows-1251" ?>
<ServiceProvider_Request>
  <DateTime>20230321154848</DateTime>
  <Version>1</Version>
  <RequestType>TransactionResult</RequestType>
  <ServiceNo>1</ServiceNo>
  <PersonalAccount>2gzD17a7MHFpnQGZmGag75</PersonalAccount>
  <Currency>933</Currency>
  <RequestId>6888</RequestId>
  <TransactionResult>
    <TransactionId>356624</TransactionId>
    <ServiceProvider_TrxId>2gzD17a7MHFpnQGZmGag75</ServiceProvider_TrxId>
    <CardTerminal>888888</CardTerminal>
    <ErrorText>Îïåðàöèÿ îòìåíåíà</ErrorText>
  </TransactionResult>
</ServiceProvider_Request>
--BS_20230321154848--
UncleVic commented 1 year ago

I figured it out. My problem is in Multer. You are right: if I pass defParamCharset=latin1, I can decode my string with iconv.decode(Buffer.from(stringFromBusboy, 'latin1'), 'win1251')

But Multer calls the Busboy constructor as Busboy({headers: req.headers, limits: limits, preservePath: preservePath}), and afterwards I end up with a broken UTF-8 string.
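Why a broken UTF-8 string cannot be repaired afterwards, while a 'latin1' one can: a small sketch, assuming the field text ends up decoded as UTF-8 (as the "broken UTF8 string" suggests). The bytes are "Операция" encoded in windows-1251.

```javascript
// windows-1251 bytes for "Операция"; Cyrillic letters live in 0xC0-0xFF,
// and these byte sequences are not valid UTF-8.
const original = Buffer.from([0xce, 0xef, 0xe5, 0xf0, 0xe0, 0xf6, 0xe8, 0xff]);

// Decoding as UTF-8 replaces invalid sequences with U+FFFD: the bytes are lost.
const asUtf8 = original.toString('utf8');
const backFromUtf8 = Buffer.from(asUtf8, 'utf8');

// Decoding as latin1 maps every byte to one code point, so it round-trips.
const asLatin1 = original.toString('latin1');
const backFromLatin1 = Buffer.from(asLatin1, 'latin1');

console.log(asUtf8.includes('\uFFFD'));       // true
console.log(backFromUtf8.equals(original));   // false
console.log(backFromLatin1.equals(original)); // true
```

This is why the charset has to be set where busboy is constructed: once the text has been decoded as UTF-8, no later conversion (iconv-lite or TextDecoder) can get the original bytes back.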