UncleVic closed this 1 year ago
You can modify the default character sets by passing in a configuration object. However, you will be limited to what Node has built-in support for (meaning Buffer encodings).
You will probably want to use an intermediate encoding like 'latin1' and then convert that to windows-1251 using whatever you like, whether that is util.TextEncoder/util.TextDecoder (if your node binary was built with full, or possibly system, ICU) or a 3rd party module (like iconv-lite).
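As a minimal sketch of the intermediate-encoding approach described above: the string is turned back into its raw bytes via 'latin1' and then decoded as windows-1251 with TextDecoder. The helper name latin1ToWindows1251 is hypothetical, and this assumes the node binary ships ICU data for 'windows-1251' (otherwise the TextDecoder constructor throws a RangeError).

```javascript
const { TextDecoder } = require('util');

// Hypothetical helper: re-interpret a string that was decoded as latin1
// as windows-1251 text. Assumes ICU data for 'windows-1251' is available.
function latin1ToWindows1251(latin1String) {
  // latin1 maps every byte to exactly one code unit, so this step
  // recovers the original bytes without loss.
  const rawBytes = Buffer.from(latin1String, 'latin1');
  return new TextDecoder('windows-1251').decode(rawBytes);
}

console.log(latin1ToWindows1251('Îïåðàöèÿ îòìåíåíà'));
```

With iconv-lite instead of TextDecoder, the second step would be the iconv.decode call on the same bytes.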
Actually, I believe if you want to rely on ICU availability, you should already be able to specify an encoding that TextDecoder accepts (e.g. 'windows-1251') and busboy will use that to decode strings. This will only work for defParamCharset though.
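A rough sketch of what that configuration might look like. The function createParser and its busboyFactory parameter are hypothetical; busboyFactory stands in for whatever require('busboy') returns, so the snippet runs without the module installed. Whether 'windows-1251' is accepted depends on the ICU data in your node binary.

```javascript
// Hypothetical wrapper: `busboyFactory` is assumed to be the function
// exported by the busboy module, `req` an incoming HTTP request.
function createParser(busboyFactory, req) {
  return busboyFactory({
    headers: req.headers,
    // Per the comment above, this option is the only one expected to
    // accept a TextDecoder-only encoding such as 'windows-1251'.
    defParamCharset: 'windows-1251',
  });
}

// Usage (assumption, not verified against a specific busboy version):
// const parser = createParser(require('busboy'), req);
// req.pipe(parser);
```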
I hoped the library could get the charset from the headers... Unfortunately, I don't use busboy directly. There is a long chain: NestJS -> multer -> busboy. And as far as I can see, multer doesn't pass any default charsets. I understand that this is probably a multer problem. But why not take the charset from the headers?
Here is my payload, for example. The text inside the <ErrorText> tag is encoded in windows-1251:
POST /api/v1/events/providers/ipay HTTP/1.1
X-Forwarded-Proto: https
Connection: close
Content-Length: 673
Content-Type: multipart/form-data; charset=windows-1251; boundary=BS_20230321154848
ServiceProvider-Signature: SALT+MD5: 2A6DDD2DF5147B132D07455A3AF1243B
User-Agent: BS_SOU_749

--BS_20230321154848
Content-Disposition: form-data; name="XML"

<?xml version="1.0" encoding="windows-1251" ?>
<ServiceProvider_Request>
<DateTime>20230321154848</DateTime>
<Version>1</Version>
<RequestType>TransactionResult</RequestType>
<ServiceNo>1</ServiceNo>
<PersonalAccount>2gzD17a7MHFpnQGZmGag75</PersonalAccount>
<Currency>933</Currency>
<RequestId>6888</RequestId>
<TransactionResult>
<TransactionId>356624</TransactionId>
<ServiceProvider_TrxId>2gzD17a7MHFpnQGZmGag75</ServiceProvider_TrxId>
<CardTerminal>888888</CardTerminal>
<ErrorText>Îïåðàöèÿ îòìåíåíà</ErrorText>
</TransactionResult>
</ServiceProvider_Request>
--BS_20230321154848--
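For illustration only, the charset parameter of a Content-Type header like the one in this request could be extracted along these lines. This is not something busboy or multer is known to do here; charsetFromContentType is a hypothetical helper, and the regular expression is a simplification that does not cover every quoting rule for header parameters.

```javascript
// Hypothetical helper: pull the charset parameter out of a Content-Type
// header value. Returns the lowercased charset, or undefined if absent.
function charsetFromContentType(contentType) {
  if (typeof contentType !== 'string') return undefined;
  // Matches `; charset=value` or `; charset="value"` (case-insensitive).
  const match = /;\s*charset\s*=\s*("?)([^";\s]+)\1/i.exec(contentType);
  return match ? match[2].toLowerCase() : undefined;
}

const header =
  'multipart/form-data; charset=windows-1251; boundary=BS_20230321154848';
console.log(charsetFromContentType(header)); // windows-1251
```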
I figured it out. My problem is in Multer.
You are right, if I pass defParamCharset=latin1 I can decode my string with iconv.decode(Buffer.from(stringFromBusboy, 'latin1'), 'win1251').
But Multer calls the Busboy constructor as Busboy({headers: req.headers, limits: limits, preservePath: preservePath}), and after that I end up with a broken UTF-8 string.
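A small sketch of why the UTF-8 result cannot be repaired afterwards while the latin1 one can. It uses only Buffer and assumes the field bytes really are windows-1251, as in the payload above; the variable names are illustrative.

```javascript
// Bytes of the word "Операция" encoded as windows-1251.
const win1251Bytes = Buffer.from([0xce, 0xef, 0xe5, 0xf0, 0xe0, 0xf6, 0xe8, 0xff]);

// Decoding as UTF-8 replaces the invalid sequences with U+FFFD,
// so the original bytes are lost.
const asUtf8 = win1251Bytes.toString('utf8');
// Decoding as latin1 maps each byte to one code unit and is reversible.
const asLatin1 = win1251Bytes.toString('latin1');

const utf8RoundTrip = Buffer.from(asUtf8, 'utf8').equals(win1251Bytes);
const latin1RoundTrip = Buffer.from(asLatin1, 'latin1').equals(win1251Bytes);

console.log(asUtf8.includes('\ufffd')); // true
console.log(utf8RoundTrip);             // false
console.log(latin1RoundTrip);           // true
```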
If a form contains charset=windows-1251, the data can't be decoded correctly. After the request is parsed, the string is corrupted.