Hi @Doreen-Schwartz, this is an interesting subject. Honestly, I have not encountered such a requirement at the various law-enforcement agencies across the world I have worked with. Nevertheless, adding a conversion process to the library would be fine with me.
A question: how is the charset encoding conveyed between the NIST file producer and the NIST file consumer? Is there a special field in the NIST file, or is it just by convention? That would somehow influence the further discussion.
I don't want to tie node-nist to any specific conversion library (such as iconv-lite, node-iconv, ...). I am thinking of adding a new property (a callback function) to the `NistCodecOptions` interface and then using that instead of `Buffer.toString` and `Buffer.write`.
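Purely as an illustration (the names and shapes below are hypothetical, not existing node-nist API), I imagine something along these lines:

```ts
// Hypothetical sketch only -- not part of the current node-nist API.
interface CharsetCodec {
  /** Replacement for Buffer.toString(): turn raw NIST bytes into a string. */
  decode: (buffer: Buffer, start: number, end: number) => string;
  /** Replacement for Buffer.write(): turn a string into the bytes to emit. */
  encode: (text: string) => Buffer;
}

// NistCodecOptions could then optionally carry such a codec and fall back to
// the current UTF-8 Buffer methods when it is not provided.
```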
If you are willing to prepare a PR, please do it in several steps:
As far as I have seen, there is no specific field that the consumer takes into consideration. We have an existing producer that creates NIST files which the consumer does successfully read, and I didn't notice any fields that convey this information, neither in the file nor in the documentation for the consumer. So it is just a convention for this specific consumer.
Just to give some context: when I tried uploading a NIST file produced with the node-nist library to our consumer, any Hebrew characters (which are the main reason non-ASCII characters are needed) would appear as gibberish. When I spoke to someone from the team that works on the consumer, they said it only accepts WIN-1255 encoding for Hebrew characters.
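To illustrate the mismatch, here is a minimal sketch using iconv-lite (not a node-nist dependency today); the byte counts show why UTF-8 output turns into gibberish on a WIN-1255 consumer:

```ts
import * as iconv from 'iconv-lite';

const hebrew = 'שלום';

// Node's Buffer writes UTF-8: two bytes per Hebrew letter.
const utf8 = Buffer.from(hebrew, 'utf8');        // 8 bytes
// The consumer expects Windows-1255: one byte per letter.
const win1255 = iconv.encode(hebrew, 'win1255'); // 4 bytes

// Only a matching encode/decode pair round-trips cleanly.
console.log(iconv.decode(win1255, 'win1255'));   // 'שלום'
console.log(win1255.toString('utf8'));           // gibberish (invalid UTF-8)
```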
As for the potential changes, I was thinking I could add callbacks to the `NistFieldEncodeOptions` and `NistFieldDecodeOptions` interfaces: one for writing fields into the buffer and one for decoding fields from the buffer.
Something like this, maybe:
```ts
/** Encoding options for a single NIST Field. */
interface NistFieldEncodeOptions extends NistFieldCodecOptions {
  formatter?: (field: NistField, nist: NistFile) => NistFieldValue;
  informationWriter?: (informationItem: NistInformationItem, data: EncodeTracking) => EncodeTracking;
}
```
The writer callback would then be used in place of `data.buf.write` in `encodeNistInformationItem`, and of course there would be a parallel for decode. Does this make sense?
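To make the intent concrete, a caller-supplied writer might look roughly like this (the `NistInformationItem` and `EncodeTracking` shapes below are simplified stand-ins, not the library's real definitions):

```ts
import * as iconv from 'iconv-lite';

// Simplified stand-ins for illustration only.
type NistInformationItem = string;
interface EncodeTracking { buf: Buffer; offset: number; }

// Convert the item to Windows-1255 and copy it into the target buffer,
// instead of the current UTF-8 data.buf.write(...) path.
const win1255Writer = (item: NistInformationItem, data: EncodeTracking): EncodeTracking => {
  const bytes = iconv.encode(item, 'win1255');
  bytes.copy(data.buf, data.offset);
  return { ...data, offset: data.offset + bytes.length };
};
```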
Another thing I noted is that `Buffer.byteLength` is used in a few places. I imagine this would also require some changes.
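For example, the byte count of a Hebrew value differs between the two encodings, so a length computed with `Buffer.byteLength(value, 'utf8')` would not match what actually gets written after conversion:

```ts
import * as iconv from 'iconv-lite';

const value = 'שלום';
console.log(Buffer.byteLength(value, 'utf8'));      // 8 -- length assumed today
console.log(iconv.encode(value, 'win1255').length); // 4 -- bytes actually written
```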
For the decoding logic, the proposed interface enhancement looks good, for example:
```ts
/** Decoding options for a single NIST Field. */
export interface NistFieldDecodeOptions extends NistFieldCodecOptions {
  parser?: (field: NistField, nist: NistFile) => Result<NistFieldValue, NistParseError>;
  informationDecoder?: (buffer: Buffer, startOffset: number, endOffset: number) => string;
}
```
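A caller-supplied decoder matching that shape could then be as small as this (sketch, assuming iconv-lite and an exclusive `endOffset`):

```ts
import * as iconv from 'iconv-lite';

// Interpret the raw bytes of one information item as Windows-1255 text.
const win1255Decoder = (buffer: Buffer, startOffset: number, endOffset: number): string =>
  iconv.decode(buffer.subarray(startOffset, endOffset), 'win1255');
```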
For the encoding logic, consider that the `NistFile` tree gets visited a couple of times: first to get the lengths of all the fields, records, the target buffer etc. right, and a second time to actually write the data into the buffer.
So either you need two new properties (one for determining the length and another one for writing the converted data), or the new property should rather be along the generic lines of `informationWriter?: (informationItem: NistInformationItem) => Buffer`.
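Such a `Buffer`-returning callback could simply wrap the conversion library; the returned buffer's length can feed the first pass and its bytes can be copied during the second (sketch, with `NistInformationItem` narrowed to `string` for brevity):

```ts
import * as iconv from 'iconv-lite';

// One callback serves both passes: .length during length calculation,
// the buffer contents during serialization.
const informationWriter = (informationItem: string): Buffer =>
  iconv.encode(informationItem, 'win1255');
```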
Performance-wise, I'd say a charset conversion is computationally comparable to `Buffer.byteLength` of an ASCII string. This means it is probably fine to perform the character conversion twice: first for determining the length and then for the serialization. Anyway, you should do at least some rudimentary performance comparison between plain 'UTF-8' and 'WIN-1255'-enabled workloads.
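A rudimentary comparison along these lines would already give a feel for the overhead (a sketch, not a rigorous benchmark):

```ts
import * as iconv from 'iconv-lite';

// Time UTF-8 byte counting vs. a full Windows-1255 conversion
// over many short, Hebrew-flavoured field values.
const samples = Array.from({ length: 100_000 }, (_, i) => `שדה מספר ${i}`);

console.time('Buffer.byteLength (utf8)');
for (const s of samples) Buffer.byteLength(s, 'utf8');
console.timeEnd('Buffer.byteLength (utf8)');

console.time('iconv.encode (win1255)');
for (const s of samples) iconv.encode(s, 'win1255');
console.timeEnd('iconv.encode (win1255)');
```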
@Doreen-Schwartz I've finished updating the documentation. Please check https://github.com/ivosh/node-nist/commit/847395761adede959c509f092aa0d7464ece5480.
For consistency, I've renamed `informationWriter` to `informationEncoder`. I hope that's fine with you.
Once I get a green light from you, I'll publish a new npm version.
Thanks again for the PR.
@ivosh Overall it looks great! The only thing I noticed is that in the JSDoc for `informationEncoder`, the documented parameters are those of `formatter` rather than the function's actual parameters.
Otherwise it seems to cover things well 👍
@Doreen-Schwartz Thank you for the review. I am always glad to have another pair of eyeballs ;-) New release 0.10.0 is out.
Hello, I understand that currently the only supported encoding for fields that are not Type 1 is UTF-8. I was wondering whether there are any intentions to add support for additional encodings?
If not, I would love to try implementing this myself. I would appreciate pointers on how to approach it: which parts of the encoding process would require adjusting to enable this functionality?
From what I've gathered, the default Node Buffer currently used to construct the NIST file only supports a very specific set of encodings. Looking at possible alternatives, iconv-lite seems like a decent package to use for this.
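For instance, a quick check shows the gap (assuming iconv-lite as the alternative):

```ts
import * as iconv from 'iconv-lite';

// Node's Buffer only knows a fixed set of encodings; Windows-1255 is not among them.
console.log(Buffer.isEncoding('utf8'));       // true
console.log(Buffer.isEncoding('win1255'));    // false

// iconv-lite covers the single-byte Windows code pages, including Hebrew.
console.log(iconv.encodingExists('win1255')); // true
```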
@JonathanZahavii