panzerdp / voca

The ultimate JavaScript string library
https://vocajs.pages.dev
MIT License
3.6k stars 137 forks

Any plans to handle encodings? #60

Open Rudxain opened 2 years ago

Rudxain commented 2 years ago

I think this library should include functions to convert strings to TypedArrays and vice versa, with the option to interpret them as UTF-8 or UTF-16 depending on the use case. If a string is to be interpreted as UTF-8, the corresponding TypedArray should be a Uint8Array; otherwise UTF-16 is used and a Uint16Array of code units is returned. If an arbitrary TypedArray is provided, it'll be read as a sequence of octets, and those octets will be converted to a string according to the chosen encoding. UTF-16 has endianness and BOMs, so a function that handles both implicit and explicit endianness would be slightly more complicated.
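To make the proposal concrete, here is a minimal sketch of what such conversions could look like. The names `toUTF8`, `fromUTF8`, `toUTF16`, `fromUTF16` and `utf16BytesToString` are hypothetical and not part of Voca's API. The sketch assumes a runtime with the standard `TextEncoder` and `TextDecoder` globals, and it assumes big-endian as the fallback when no BOM is present.

```javascript
// Hypothetical helpers for illustration only; none of these exist in Voca.

// UTF-8: string <-> Uint8Array, using the standard Encoding API
const toUTF8 = s => new TextEncoder().encode(s)
const fromUTF8 = bytes => new TextDecoder('utf-8').decode(bytes)

// UTF-16: string <-> Uint16Array of code units (no byte order involved yet)
const toUTF16 = s => {
    const units = new Uint16Array(s.length)
    for (let i = 0; i < s.length; i++) units[i] = s.charCodeAt(i)
    return units
}
const fromUTF16 = units => {
    // Decode in chunks so very long inputs do not exceed the argument limit
    const CHUNK = 0x8000
    let s = ''
    for (let i = 0; i < units.length; i += CHUNK)
        s += String.fromCharCode(...units.subarray(i, i + CHUNK))
    return s
}

// Octets -> string, read as UTF-16.
// Explicit endianness: pass true or false; the bytes are decoded as-is.
// Implicit endianness: omit the argument; a leading BOM decides the order
// and is stripped. Assumption: big-endian when there is no BOM.
const utf16BytesToString = (bytes, littleEndian) => {
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength)
    let offset = 0
    if (littleEndian === undefined) {
        littleEndian = false
        if (view.byteLength >= 2) {
            const bom = view.getUint16(0, false)
            if (bom === 0xFEFF) offset = 2
            else if (bom === 0xFFFE) { littleEndian = true; offset = 2 }
        }
    }
    // A trailing odd byte, if any, is ignored
    const units = new Uint16Array((view.byteLength - offset) >> 1)
    for (let i = 0; i < units.length; i++)
        units[i] = view.getUint16(offset + 2 * i, littleEndian)
    return fromUTF16(units)
}
```

For example, `toUTF8('é')` yields the two octets `0xC3 0xA9`, while `toUTF16('é')` yields the single code unit `0x00E9`.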

Another thing related to encodings is binary-to-text encodings like hexadecimal, base64, base85, etc. JS already has base64 support with atob and btoa, but hex and base85 are missing and could be provided by this library.
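As an illustration of the hex case, a pair of helpers could look roughly like this. `toHex` and `fromHex` are hypothetical names, not existing Voca functions, and the choice to reject odd-length or non-hex input with a `RangeError` is an assumption of this sketch.

```javascript
// Hypothetical helpers for illustration only; not part of Voca.

// Octets -> lowercase hex string, two digits per byte
const toHex = bytes =>
    Array.from(bytes, b => b.toString(16).padStart(2, '0')).join('')

// Hex string -> Uint8Array; accepts upper- and lowercase digits
const fromHex = hex => {
    if (hex.length % 2 !== 0 || /[^0-9a-f]/i.test(hex))
        throw new RangeError('invalid hex string')
    const bytes = new Uint8Array(hex.length / 2)
    for (let i = 0; i < bytes.length; i++)
        bytes[i] = parseInt(hex.slice(2 * i, 2 * i + 2), 16)
    return bytes
}
```

`Array.from` with a mapping function is used rather than the typed array's own `map`, because the latter would coerce the hex digit strings back into numbers.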

I don't know whether these features should be added to this library, because Voca seems to be intended for high-level (not low-level) use cases, and adding base85 support would be pointless because it's rarely used. Any constructive criticism is appreciated.

Rudxain commented 1 year ago

Additionally, it would be a good idea for the query category to include at least one of the following validators:

/** Check if string only has Basic Multilingual Plane code-points */
const isBMP = s => typeof s === 'string' && [...s].length === s.length

/**
Check if valid ASCII
@param printable limits the range to printable chars only
*/
const isASCII = (s, printable = false) =>
    // no `g` flag: it is unnecessary for a single anchored `test` call
    typeof s === 'string' && (printable ? /^[\x20-\x7e]*$/ : /^[\x00-\x7f]*$/).test(s)

/** Check if valid binary string (every code unit fits in one byte) */
const isBinStr = s => typeof s === 'string' && /^[\x00-\xff]*$/.test(s)

Checking whether a string is valid ASCII or BMP-only is useful. Checking whether a value is a valid binary string is usually only useful when dealing with atob and btoa, so that one can be ignored.
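A short usage sketch may help show where each validator draws the line. The three validators are repeated here so the snippet runs on its own, and `safeBtoa` is a hypothetical wrapper added only for illustration. It assumes a runtime where `btoa` is available as a global.

```javascript
// Validators repeated from the comment above so this snippet is self-contained
const isBMP = s => typeof s === 'string' && [...s].length === s.length
const isASCII = (s, printable = false) =>
    typeof s === 'string' && (printable ? /^[\x20-\x7e]*$/ : /^[\x00-\x7f]*$/).test(s)
const isBinStr = s => typeof s === 'string' && /^[\x00-\xff]*$/.test(s)

// An astral code point takes two code units, so the lengths differ
console.log(isBMP('abc'), isBMP('😀'))               // true false

// A newline is ASCII but not printable ASCII
console.log(isASCII('a\n'), isASCII('a\n', true))    // true false

// 'é' (U+00E9) fits in one byte, '€' (U+20AC) does not
console.log(isBinStr('é'), isBinStr('€'))            // true false

// Hypothetical guard: fail with a descriptive error before calling btoa
const safeBtoa = s => {
    if (!isBinStr(s)) throw new TypeError('btoa needs a binary string')
    return btoa(s)
}
console.log(safeBtoa('hi'))                          // aGk=
```

Note that `isBMP` compares code-point count with code-unit count, so a lone surrogate such as `'\uD800'` still passes; whether that is acceptable depends on the use case.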