Closed wesleimarinho closed 11 months ago
Have you captured a copy of the raw binary request data when it happens and checked whether the difference in character bytes exists there (or is that what you've already done)? If it does, there's nothing busboy can do about that, as it can only work with what it's given.
@mscdex What is the recommended way to do that? Intercept the request in Node.js, or capture it in the sending tool?
Either should work, but on the node side you could just do req.pipe(fs.createWriteStream('/tmp/foo')); instead of req.pipe(busboy);
Then just use a hex editor (e.g. xxd on Linux) to look at the contents of the raw request and see what the bytes are for the filename in question.
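For anyone who would rather check the bytes from Node than from a hex editor, here is a minimal sketch. It assumes the raw request was already dumped to a file as suggested above; `filenameHex` is a hypothetical helper, not part of busboy, and it naively looks for filename="..." without handling escaped quotes or the filename* form.

```javascript
// Hypothetical helper: find every filename="..." parameter in a raw
// multipart body and return the bytes between the quotes as hex strings.
function filenameHex(raw) {
  const results = [];
  const marker = Buffer.from('filename="');
  let start = raw.indexOf(marker);
  while (start !== -1) {
    const from = start + marker.length;
    const end = raw.indexOf(0x22, from); // 0x22 is the closing double quote
    if (end === -1) break;
    results.push(raw.subarray(from, end).toString('hex'));
    start = raw.indexOf(marker, end);
  }
  return results;
}

// Two sample part headers that render identically but differ in bytes.
const header = 'Content-Disposition: form-data; name="file"; filename="';
const composed = Buffer.from(header + '\u00e7.pdf"\r\n', 'utf8');    // single code point
const decomposed = Buffer.from(header + 'c\u0327.pdf"\r\n', 'utf8'); // c + combining cedilla

console.log(filenameHex(composed));   // filename bytes start with c3 a7
console.log(filenameHex(decomposed)); // filename bytes start with 63 cc a7

// With a real capture: filenameHex(require('fs').readFileSync('/tmp/foo'))
```

If the decomposed byte sequence already shows up in the captured request, the difference was introduced before the data reached the server.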
@mscdex I finally found out what happened. Filenames on macOS have this peculiarity: https://stackoverflow.com/questions/6153345/different-utf8-encoding-in-filenames-os-x. Is there any way to configure busboy to always return filenames in NFC normalized form instead of fully decomposed form?
For the record, I've solved it on my end with:
const filenameBuffer = Buffer.from(file.name, 'utf-8');
const normalizedFilename = filenameBuffer.toString().normalize('NFC');
const encodedFilenameUTF8 = Buffer.from(normalizedFilename).toString('utf-8');
Therefore, this issue can be closed.
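For later readers: since the value is already a JavaScript string, the Buffer round trips above should not be needed, and calling normalize('NFC') directly gives the same result. A minimal sketch, where `normalizeFilename` is a hypothetical helper name and the busboy usage in the trailing comment assumes the 1.x 'file' event signature (name, stream, info):

```javascript
// Hypothetical helper: return the filename in NFC (composed) form.
// String.prototype.normalize is built into JavaScript; no Buffer needed.
function normalizeFilename(filename) {
  return filename.normalize('NFC');
}

// Decomposed input, as a macOS client may send it: c + U+0327, a + U+0303.
const received = 'Declarac\u0327a\u0303o.pdf';
const normalized = normalizeFilename(received);

console.log(received.length);   // 16 UTF-16 code units
console.log(normalized.length); // 14 UTF-16 code units
console.log(normalized === 'Declara\u00e7\u00e3o.pdf'); // true

// Assumed usage with busboy 1.x (not executed here):
// bb.on('file', (name, stream, info) => {
//   const filename = normalizeFilename(info.filename);
//   ...
// });
```

Names that are already in NFC pass through unchanged, so this can be applied to every upload regardless of the client OS.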
I have set defParamCharset to utf8, but I'm getting different results for some users.
When sending, for example, a file named Declaração.pdf, the ç in the original filename is a single character (Hex: E7).
Sometimes the server receives the same filename; however, sometimes ç (Hex 63 327) is received instead. They look the same, but are different characters.
The same thing happens with ã (Hex E3), which is sometimes received as ã (Hex 61 303).
Both users used the same browser (Chrome) and version (113), on the same operating system (Windows 11 Portuguese), sending the same file.
The hex codes were obtained by using https://www.rapidtables.com/convert/number/ascii-to-hex.html.
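The same hex codes can also be reproduced locally in Node. The sketch below is only an illustration of the composed versus decomposed forms described above; `codePointsHex` is a hypothetical helper name.

```javascript
// Hypothetical helper: list the Unicode code points of a string as hex.
// Array.from iterates by code point, so combining marks appear as
// separate entries after their base letter.
function codePointsHex(str) {
  return Array.from(str, (ch) => ch.codePointAt(0).toString(16).toUpperCase());
}

const composed = '\u00e7';    // one code point
const decomposed = 'c\u0327'; // c followed by a combining cedilla

console.log(codePointsHex(composed));   // [ 'E7' ]
console.log(codePointsHex(decomposed)); // [ '63', '327' ]

// They render the same but do not compare equal until normalized.
console.log(composed === decomposed);                  // false
console.log(composed === decomposed.normalize('NFC')); // true
```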
The server uses Busboy 1.6.0 and Node.js 16 on CentOS 7; the system locale is en_US.UTF-8.
Does anyone have any idea what may be causing the issue?