jshttp / content-disposition

Create and parse HTTP Content-Disposition header
MIT License

ISO-8859-1 (`ByteString`) is confused with ASCII string #43

Closed AlttiRi closed 2 years ago

AlttiRi commented 2 years ago

The readme file mentions "ISO-8859-1" 10 times!

However, judging by how it works, the library seems to confuse "ISO-8859-1" (aka "Latin1", aka "ByteString") with an "ASCII string", which contains only bytes 0-127, whereas a ByteString contains bytes 0-255.
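To make the distinction concrete, here is a minimal sketch of the two ranges being discussed. The helper names `isAscii` and `isByteString` are hypothetical and are not part of the library.

```javascript
// Hypothetical helpers, for illustration only.

// True when every UTF-16 code unit is in the ASCII range 0x00-0x7F (0-127).
function isAscii(str) {
    return /^[\u0000-\u007f]*$/.test(str);
}

// True when every code unit is in the range 0x00-0xFF (0-255),
// i.e. the string is a valid ByteString / Latin1 string.
function isByteString(str) {
    return /^[\u0000-\u00ff]*$/.test(str);
}

console.log(isAscii("image.png"));       // true
console.log(isAscii("caf\u00e9.png"));      // false ("é" is 0xE9, above 0x7F)
console.log(isByteString("caf\u00e9.png")); // true  (0xE9 fits in one byte)
console.log(isByteString("圖片.png"));    // false (code units above 0xFF)
```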

For example, it can't produce headers the way most forums do, like this one: https://xenforo.com/community/attachments/_圖片_🖼_image_-png.266690/?hash=b66fd2461d70a0c017941f3bcf7b5e4a

For the filename _圖片_🖼_image_.png it produces a String with: inline; filename="_??_??_image_.png"; filename*=UTF-8''_%E5%9C%96%E7%89%87_%F0%9F%96%BC_image_.png

while it should be a ByteString with: inline; filename="_εœ–η‰‡_πŸ–Ό_image_.png"; filename*=UTF-8''_%E5%9C%96%E7%89%87_%F0%9F%96%BC_image_.png

In the console it is displayed like this: [image]

Yes, it's correct, since it's ByteString. Then the code that parses the headers should convert this ByteString to String.
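A minimal sketch of that conversion on the receiving side, assuming the header value arrives as a ByteString whose char codes are UTF-8 bytes. The function name `byteStringToString` is hypothetical.

```javascript
// Hypothetical helper: treat each char code of the ByteString as one byte
// and decode the resulting byte sequence as UTF-8.
function byteStringToString(byteString) {
    const bytes = Uint8Array.from(byteString, ch => ch.charCodeAt(0));
    return new TextDecoder("utf-8").decode(bytes);
}

// The UTF-8 bytes of "圖片" (E5 9C 96 E7 89 87), written as char codes 0-255.
const rawHeaderValue = "_\u00e5\u009c\u0096\u00e7\u0089\u0087_image_.png";
console.log(byteStringToString(rawHeaderValue)); // "_圖片_image_.png"
```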

https://developer.mozilla.org/en-US/docs/Web/API/DOMString/Binary
https://webidl.spec.whatwg.org/#idl-ByteString
https://web.archive.org/web/20210608032047/https://developer.mozilla.org/en-US/docs/Web/API/ByteString
https://web.archive.org/web/20210731105134/https://developer.mozilla.org/en-US/docs/Web/API/Headers/get


AlttiRi commented 2 years ago

The server to check it locally:

import http from "http";
import contentDisposition from "content-disposition";

const host = "localhost";
const port = 8000;
const server = http.createServer(requestListener);
server.listen(port, host, () => {
    console.log(`Server is running on http://${host}:${port}`);
});

const name1 = `rock&roll🎡🎢.png`;
const name2 = `rock'n'roll🎡🎢.png`;
const name3 = `image β€” copy (1).png`;
const name4 = `_圖片_🖼_image_.png`;
const name5 = `100 % loading&perf.png`;
const names = [name1, name2, name3, name4, name5];

const CD1 = str2BStr(`inline; filename=${name1}`);
const CD2 = str2BStr(`inline; filename="${name2}"`);
const CD3 = str2BStr(`inline; filename="${name3}"; filename*=UTF-8''${encodeURIComponent(name3)}`);

// How it should be
const CD4  = str2BStr(`inline; filename="${name4}"; filename*=UTF-8''${encodeURIComponent(name4)}`);
// Replace non-ASCII with "?"
const CD4R = str2BStr(`inline; filename="${replaceNonASCII(name4, "?")}"; filename*=UTF-8''${encodeURIComponent(name4)}`);
// "content-disposition" library does the same:
const CD4X = contentDisposition(name4, {type: "inline"});
// What if I put ByteString to the lib? The result is broken filename.
const CD4RX = contentDisposition(str2BStr(name4), {type: "inline"});

const CD5 = str2BStr(`inline; filename="${name5}"; filename*=UTF-8''${encodeURIComponent(name5)}`);

function requestListener(req, res) {
    res.setHeader("Content-Type", "text/html; charset=utf-8");
    res.setHeader("Content-Disposition-1", CD1);
    res.setHeader("Content-Disposition-2", CD2);
    res.setHeader("Content-Disposition-3", CD3);
    res.setHeader("Content-Disposition-4", CD4);
    res.setHeader("Content-Disposition-5", CD5);
    res.setHeader("Content-Disposition-4-R", CD4R);
    res.setHeader("Content-Disposition-4-X", CD4X);
    res.setHeader("Content-Disposition-4-RX", CD4RX);
    res.end(names.slice(1).map(name => `<li>${name}</li>`).join(""));
}

// --- Util ---
function replaceNonASCII(str, replacer) { // Don't use with ByteString // a quick draft, do a better implementation
    return str.replaceAll(/[^\u0000-\u007F]/g, replacer); // ASCII is 0x00-0x7F (0-127)
}
// String -> ByteString: each char code of the result is one UTF-8 byte of the input
function str2BStr(string) {
    return arrayBufferToBinaryString(new TextEncoder().encode(string));
}
// ByteString -> String: read the char codes as UTF-8 bytes and decode them
function bSrt2Str(bString) {
    return new TextDecoder().decode(binaryStringToArrayBuffer(bString));
}
function arrayBufferToBinaryString(arrayBuffer) {
    return arrayBuffer.reduce((accumulator, byte) => accumulator + String.fromCharCode(byte), "");
}
function binaryStringToArrayBuffer(binaryString) {
    const u8Array = new Uint8Array(binaryString.length);
    for (let i = 0; i < binaryString.length; i++) {
        u8Array[i] = binaryString.charCodeAt(i);
    }
    return u8Array;
}

The console code to list the headers:

[...(await fetch("http://localhost:8000/", {method: "head"})).headers.entries()]
    .filter(([k, v]) => k.startsWith("content-disposition"))
    .forEach(([k, v]) => console.log(`"${k.padEnd(27)}":`, `"${v}"`))
dougwilson commented 2 years ago

The characters 圖片 are not part of ISO-8859-1. You can find the list of characters in ISO-8859-1 on the Wikipedia page: https://en.m.wikipedia.org/wiki/ISO/IEC_8859-1

AlttiRi commented 2 years ago

> The characters 圖片 are not part of ISO-8859-1.

Absolutely (if you mean their char codes).

That's why I first take an ArrayBuffer (the UTF-8 bytes) from the input string, and only then convert that ArrayBuffer to a ByteString. The ByteString is a String whose char codes are the UTF-8 bytes of the input header.
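As a rough illustration of that two-step conversion (String to UTF-8 bytes, then bytes to ByteString), here is a small sketch. The name `stringToByteString` is hypothetical; it does the same job as `str2BStr` in the server code above.

```javascript
// Hypothetical helper: encode the string as UTF-8, then map each byte
// to a character with the same char code (0-255).
function stringToByteString(str) {
    const bytes = new TextEncoder().encode(str);
    let byteString = "";
    for (const byte of bytes) {
        byteString += String.fromCharCode(byte);
    }
    return byteString;
}

const byteString = stringToByteString("圖");
console.log(byteString.length); // 3 (one character per UTF-8 byte)
console.log([...byteString].map(ch => ch.charCodeAt(0).toString(16))); // [ 'e5', '9c', '96' ]
```

Every char code in the result is at most 0xFF, even though the input character itself is far outside ISO-8859-1.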

AlttiRi commented 2 years ago

> 2.13.18. ByteString
> The ByteString type corresponds to the set of all possible sequences of bytes. Such sequences might be interpreted as UTF-8 encoded strings [RFC3629] or strings in some other 8-bit-per-code-unit encoding, although this is not required.

An HTTP header is a binary string of UTF-8 bytes.

dougwilson commented 2 years ago

This module is only designed to follow RFC 6266, which pertains to how this particular header is specified.


> The parameters "filename" and "filename*" differ only in that "filename*" uses the encoding defined in [RFC5987], allowing the use of characters not present in the ISO-8859-1 character set ([ISO-8859-1]).
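The split between the two parameters can be sketched roughly as follows. This is not the module's actual implementation: the function names are hypothetical, the fallback simply replaces everything outside printable ASCII with "?", and the extended value is built with `encodeURIComponent` plus extra escaping of the characters it leaves untouched but RFC 5987 does not allow unescaped (`'`, `(`, `)`, `*`).

```javascript
// Hypothetical helper: percent-encode a value for the RFC 5987 "filename*" form.
function encodeRfc5987Value(str) {
    return encodeURIComponent(str)
        .replace(/['()*]/g, ch => "%" + ch.charCodeAt(0).toString(16).toUpperCase());
}

// Hypothetical helper: ASCII-only fallback for the plain "filename" parameter.
// Works per UTF-16 code unit, so one astral character becomes "??".
function asciiFallback(str) {
    return str.replace(/[^\u0020-\u007e]/g, "?");
}

// Hypothetical helper: emit both parameters, escaping quotes and backslashes
// inside the quoted string.
function buildContentDisposition(type, filename) {
    const quoted = asciiFallback(filename).replace(/[\\"]/g, "\\$&");
    return `${type}; filename="${quoted}"; filename*=UTF-8''${encodeRfc5987Value(filename)}`;
}

console.log(buildContentDisposition("inline", "_圖片_🖼_image_.png"));
// inline; filename="_??_??_image_.png"; filename*=UTF-8''_%E5%9C%96%E7%89%87_%F0%9F%96%BC_image_.png
```

Clients that understand `filename*` can recover the full name from the percent-encoded UTF-8 value, while older ones fall back to the ASCII-only `filename`.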