usnistgov / jsfive

A pure javascript HDF5 reader

Node.js built with icu_small returns error upon loading ArrayBuffer #40

Open jelzo opened 1 year ago

jelzo commented 1 year ago

The following snippet:

const arrayBuffer = (await axios(temporaryDownloadUrl, {
    responseType: 'arraybuffer'
})).data.buffer;
const data = new hdf5.File(arrayBuffer);

Returns:

RangeError: The "ascii" encoding is not supported
RangeError [ERR_ENCODING_NOT_SUPPORTED]: The "ascii" encoding is not supported
    at new NodeError (node:internal/errors:393:5)
    at new TextDecoder (node:internal/encoding:403:15)
    at DataObjects._decode_link_msg (/home/deb4693/nodevenv/domains/xxx.xx/node_dev/18/lib/node_modules/jsfive/dist/cjs/index.js:5552:16)
    at DataObjects._iter_links_btree_v2 (/home/deb4693/nodevenv/domains/xxx.xx/node_dev/18/lib/node_modules/jsfive/dist/cjs/index.js:5586:40)
    at _iter_links_btree_v2.next (<anonymous>)
    at DataObjects._iter_link_from_link_info_msg (/home/deb4693/nodevenv/domains/xxx.xx/node_dev/18/lib/node_modules/jsfive/dist/cjs/index.js:5571:19)
    at _iter_link_from_link_info_msg.next (<anonymous>)
    at DataObjects.iter_links (/home/deb4693/nodevenv/domains/xxx.xx/node_dev/18/lib/node_modules/jsfive/dist/cjs/index.js:5499:21)
    at iter_links.next (<anonymous>)
    at Function.fromEntries (<anonymous>)

Environment: Node.js 18.9.1

Cause: Many production Node.js hosting environments are built with small-icu rather than the full ICU data. On those builds, TextDecoder(encoding) only accepts utf-8, utf-16 (utf-16le), and utf-16be, so the "ascii" label that jsfive passes is rejected.
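
To check whether a given host is affected, a small probe like the one below may help. It is only a sketch: `isEncodingSupported` is a hypothetical helper name, not part of jsfive or Node.js, and it relies on the `TextDecoder` constructor throwing a `RangeError` for labels the runtime cannot handle.

```javascript
// Hypothetical helper (not part of jsfive or Node.js): reports whether the
// current runtime's TextDecoder accepts a given encoding label.
function isEncodingSupported(label) {
  try {
    new TextDecoder(label);
    return true;
  } catch (err) {
    // Unsupported or unknown labels are rejected with a RangeError.
    if (err instanceof RangeError) {
      return false;
    }
    throw err; // anything else is unexpected, so surface it
  }
}

// Print the result for the labels discussed in this issue.
for (const label of ['utf-8', 'utf-16le', 'utf-16be', 'ascii']) {
  console.log(label, isEncodingSupported(label));
}
```

On a build affected by this issue, the `ascii` line should print `false`, while the same script on a full-ICU build should print `true` for every label.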

bmaranville commented 1 year ago

I'm not an expert on text encodings, but it seems like ASCII is a strict subset of UTF-8 (at least in this context), so we could always use a UTF-8 decoder on strings in the jsfive code, since ASCII and UTF-8 are the only encodings supported internally for names in HDF5. It would actually simplify the code a bit: where there's a switch to choose a decoder based on a flag in the metadata, we could just ignore that flag and always use TextDecoder('utf-8').
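
A minimal sketch of that idea, assuming the name bytes arrive as a `Uint8Array`. `decodeName` is a hypothetical stand-in and does not reproduce jsfive's actual `_decode_link_msg` code; it just shows that one UTF-8 decoder handles both ASCII-flagged and UTF-8-flagged names.

```javascript
// Hypothetical stand-in for the decoding step; not jsfive's real code.
// A single shared UTF-8 decoder is used regardless of the charset flag.
const utf8Decoder = new TextDecoder('utf-8');

function decodeName(bytes) {
  return utf8Decoder.decode(bytes);
}

// Bytes 0x00-0x7F mean the same thing in ASCII and UTF-8,
// so an ASCII-flagged name decodes unchanged.
const asciiName = Uint8Array.from([0x64, 0x61, 0x74, 0x61]); // "data"
console.log(decodeName(asciiName)); // prints "data"

// A UTF-8-flagged name with a non-ASCII character also round-trips.
const utf8Name = new TextEncoder().encode('température');
console.log(decodeName(utf8Name) === 'température'); // prints true
```

One caveat: the two decoders only agree for bytes below 0x80. If a file flagged a name as ASCII but stored bytes of 0x80 or above, a UTF-8 decoder would read them differently (invalid sequences become U+FFFD by default), whereas the `ascii` label in `TextDecoder` is treated as windows-1252.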