jfroelich closed 5 years ago
Even though I may want to introduce normalization at some earlier stage of processing, I think the place to start for now is the model layer. The model layer is supposed to be responsible for sanitizing its input, regardless of what the other layers that interact with it do.
Therefore I think the best way to start is:
From the spec:
Normalization Forms KC and KD must not be blindly applied to arbitrary text. Because they erase many formatting distinctions, they will prevent round-trip conversion to and from many legacy character sets, and unless supplanted by formatting markup, they may remove distinctions that are important to the semantics of the text. It is best to think of these Normalization Forms as being like uppercase or lowercase mappings: useful in certain contexts for identifying core meanings, but also performing modifications to the text that may not always be appropriate.
So I probably want to avoid the compatibility forms (NFKC and NFKD), even though the result can be more compact, because of the risk of losing meaning. The default, NFC, is probably the one I want, so calling String.prototype.normalize without an argument (its single optional form parameter defaults to NFC) is probably what I want to do.
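A minimal sketch of the difference, using sample strings chosen only for illustration: the default call composes canonically equivalent sequences, while NFKC also rewrites compatibility characters such as ligatures and superscripts.

```javascript
// "é" written two ways: precomposed (U+00E9) and decomposed (U+0065 U+0301).
const precomposed = '\u00E9';
const decomposed = 'e\u0301';

// The two strings render the same but compare as different.
console.log(precomposed === decomposed); // false

// normalize() with no argument uses NFC, which composes the sequence.
console.log(decomposed.normalize() === precomposed); // true
console.log(decomposed.normalize() === decomposed.normalize('NFC')); // true

// NFC leaves compatibility characters alone; NFKC replaces them.
const ligature = '\uFB01'; // the single-character "fi" ligature
console.log(ligature.normalize('NFC') === ligature); // true
console.log(ligature.normalize('NFKC') === 'fi'); // true

// Superscript two (U+00B2) becomes a plain digit under NFKC,
// which changes what the text means.
console.log('x\u00B2'.normalize('NFKC') === 'x2'); // true
```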
Side question: does normalization occur when using Response.prototype.text or innerHTML? If so, then normalization is already performed implicitly elsewhere, and all of this is a waste of time apart from the learning aspect.
Take special note of section 1.4 regarding concatenation. The basic takeaway: concatenating two normalized strings does not necessarily produce a normalized string. So if I plan to break a string apart, change its parts, and then recompose it, normalization should wait until all changes and concatenations are complete. Otherwise a later concatenation can undo the normalization and defeat the entire point of this exercise.
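The concatenation problem can be reproduced in a few lines. This is only an illustrative sketch; isNfc is a hypothetical helper defined here, not a built-in.

```javascript
// Hypothetical helper: a string is in NFC if normalizing leaves it unchanged.
const isNfc = (value) => value === value.normalize('NFC');

// Each part is already in NFC on its own.
const head = 'e'.normalize();
const tail = '\u0301'.normalize(); // a lone combining acute accent

console.log(isNfc(head)); // true
console.log(isNfc(tail)); // true

// Joining them yields "e" + combining accent, which NFC would compose.
const joined = head + tail;
console.log(isNfc(joined)); // false

// Normalizing once, after all concatenation is done, fixes it.
const finalValue = joined.normalize();
console.log(finalValue === '\u00E9'); // true
```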
That last note shifts the scales a bit regarding when normalization should be performed. It suggests the best place is the model layer, just before updating persistent storage, because at that point we know no more changes will be made. The model can also guarantee this contractually by encapsulating normalization behind a more opaque API surface, so that the value cannot change once it is inside the model function body.
So, in summary: change sanitize-feed and sanitize-entry to apply string normalization, and document that the caller should not make any more changes to values after those functions have been called. Or place the functionality within the update-feed/entry functions to enforce it and remove caller discretion entirely.
I need to enforce normalization within the model, in every function that inserts or updates. What is the best way to implement this? Do I even want this feature to be fully encapsulated within the model and abstracted away (information hiding)? Do I want a function like normalize-entry or Entry.prototype.normalize? Or a private helper that each state-modifying function calls internally?
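One possible shape for the private-helper option, as a rough sketch only. Everything here is hypothetical: normalizeStrings, updateEntry, the id and title fields, and the in-memory Map standing in for the real persistent store are all assumptions for illustration, not the project's actual API.

```javascript
// Hypothetical private helper: returns a copy of the record with every
// top-level string property normalized to NFC. Other values pass through.
function normalizeStrings(record) {
  const output = {};
  for (const [key, value] of Object.entries(record)) {
    output[key] = typeof value === 'string' ? value.normalize() : value;
  }
  return output;
}

// Stand-in for persistent storage (assumption: the real model writes to a database).
const store = new Map();

// Hypothetical state-modifying function. Normalization is the last step
// before the write, so callers cannot skip it or concatenate afterwards.
function updateEntry(entry) {
  const normalized = Object.freeze(normalizeStrings(entry));
  store.set(normalized.id, normalized);
  return normalized;
}

// The caller passes a decomposed title; the stored value is composed.
const saved = updateEntry({ id: 1, title: 'Cafe\u0301' });
console.log(saved.title === 'Caf\u00E9'); // true
console.log(store.get(1).title.length); // 4
```

The trade-off is that callers never see a normalize step at all, which removes discretion but also hides the behavior unless it is documented.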
Work remaining:
Input data should be normalized where appropriate.
See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize
https://withblue.ink/2019/03/11/why-you-need-to-normalize-unicode-strings.html