tahonermann / text_view

A C++ concepts and range-based character encoding and code point enumeration library
MIT License

Add support for Unicode normalization output iterators #20

Open tahonermann opened 8 years ago

ruoso commented 7 years ago

I would argue that normalization shouldn't be an output iterator, but rather a transform algorithm.
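For illustration, here is a minimal sketch of the transform-algorithm shape, assuming code points arrive as `char32_t`. `decompose_copy` and `toy_decompositions` are hypothetical names, not text_view APIs. The four-entry table performs only canonical decomposition of a few precomposed Latin letters. Real NFD additionally needs the full Unicode Character Database, Hangul decomposition, and canonical reordering of combining marks.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>

// Hypothetical, hard-coded subset of canonical decomposition mappings.
// A real implementation would generate these from the Unicode Character
// Database; this table only covers four precomposed Latin letters.
struct decomposition {
    char32_t precomposed;
    char32_t base;
    char32_t combining_mark;
};

inline constexpr decomposition toy_decompositions[] = {
    {U'\u00C5', U'A', U'\u030A'},  // LATIN CAPITAL A WITH RING ABOVE
    {U'\u00E9', U'e', U'\u0301'},  // LATIN SMALL E WITH ACUTE
    {U'\u00F1', U'n', U'\u0303'},  // LATIN SMALL N WITH TILDE
    {U'\u00FC', U'u', U'\u0308'},  // LATIN SMALL U WITH DIAERESIS
};

// Eager "transform algorithm" shape: consume a range of code points and
// write the (toy) canonically decomposed sequence to an output iterator.
template <typename InputIt, typename OutputIt>
OutputIt decompose_copy(InputIt first, InputIt last, OutputIt out) {
    for (; first != last; ++first) {
        const char32_t cp = *first;
        const auto* entry = std::find_if(
            std::begin(toy_decompositions), std::end(toy_decompositions),
            [cp](const decomposition& d) { return d.precomposed == cp; });
        if (entry != std::end(toy_decompositions)) {
            *out++ = entry->base;
            *out++ = entry->combining_mark;
        } else {
            *out++ = cp;  // no mapping in the toy table: copy unchanged
        }
    }
    return out;
}
```

Called as `decompose_copy(in.begin(), in.end(), std::back_inserter(out))`, it turns `U"caf\u00E9"` into `U"cafe\u0301"` in one eager pass. Because it is an ordinary algorithm, an implementation is free to overload it for pointer ranges.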

tahonermann commented 7 years ago

I tend to agree that, in most cases, treating normalization as a transcoding operation is probably what is desirable. Such an interface can be specialized for particular iterators (pointers) to provide higher performance as well. The benefit of an output iterator (or a proxy input iterator) is that the transformation can be done lazily. I think both interfaces have their uses.
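A minimal sketch of the lazy output-iterator alternative follows, built on a hypothetical four-entry decomposition table (not a text_view API). `decomposing_output_iterator` is an illustrative name. It decomposes each code point at the moment the code point is assigned and forwards the result to the wrapped iterator, so no separate pass over the text is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical toy table: canonical decompositions of four Latin letters only.
struct decomposition { char32_t precomposed, base, combining_mark; };

inline constexpr decomposition toy_decompositions[] = {
    {U'\u00C5', U'A', U'\u030A'},
    {U'\u00E9', U'e', U'\u0301'},
    {U'\u00F1', U'n', U'\u0303'},
    {U'\u00FC', U'u', U'\u0308'},
};

// Hypothetical output-iterator adaptor: every code point assigned through it
// is decomposed (using the toy table) and forwarded to the wrapped iterator.
template <typename OutputIt>
class decomposing_output_iterator {
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    explicit decomposing_output_iterator(OutputIt out) : out_(out) {}

    // The actual work happens on assignment, i.e. lazily as values arrive.
    decomposing_output_iterator& operator=(char32_t cp) {
        const auto* entry = std::find_if(
            std::begin(toy_decompositions), std::end(toy_decompositions),
            [cp](const decomposition& d) { return d.precomposed == cp; });
        if (entry != std::end(toy_decompositions)) {
            *out_++ = entry->base;
            *out_++ = entry->combining_mark;
        } else {
            *out_++ = cp;
        }
        return *this;
    }

    // Output-iterator boilerplate: dereference and increment are no-ops.
    decomposing_output_iterator& operator*() { return *this; }
    decomposing_output_iterator& operator++() { return *this; }
    decomposing_output_iterator& operator++(int) { return *this; }

    OutputIt base() const { return out_; }

private:
    OutputIt out_;
};
```

Decomposition alone is context-free per code point, so this adaptor can emit output immediately. A complete normalizing output iterator could not: canonical reordering and composition must see a whole run of combining marks (up to the next starter) before writing it, so the iterator would have to buffer internally and be explicitly flushed at the end of input. That hidden state is a practical cost of the output-iterator design.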

ruoso commented 7 years ago

I think that is an academic use case. In practice, the reason you need to normalize text is that you are about to perform an operation that requires normalized text.

Making normalization an output iterator would be technically valid, but semantically confusing.


tahonermann commented 7 years ago

I can see use cases for wanting to perform an operation on normalized text in a lazy fashion.
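One way to get that laziness on the read side is a proxy input iterator. The sketch below is hypothetical: `decomposing_iterator` and `lazy_decomposed` are illustrative names, not text_view types, and a toy four-entry table stands in for real normalization data. It decomposes each source code point only when the consumer advances onto it, so an algorithm that stops early, such as a search or a prefix comparison, never pays for the rest of the text.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <string_view>

// Hypothetical toy table: canonical decompositions of four Latin letters only.
struct decomposition { char32_t precomposed, base, combining_mark; };

inline constexpr decomposition toy_decompositions[] = {
    {U'\u00C5', U'A', U'\u030A'},
    {U'\u00E9', U'e', U'\u0301'},
    {U'\u00F1', U'n', U'\u0303'},
    {U'\u00FC', U'u', U'\u0308'},
};

// Proxy input iterator: holds the decomposition of the current source code
// point in a small buffer and refills it only when advanced past its end.
class decomposing_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = char32_t;  // yields values, not references

    decomposing_iterator() = default;  // default-constructed == end
    explicit decomposing_iterator(std::u32string_view rest) : rest_(rest) { load(); }

    char32_t operator*() const { return buf_[idx_]; }

    decomposing_iterator& operator++() {
        if (++idx_ == len_) {       // finished this code point's decomposition
            rest_.remove_prefix(1);
            load();                 // decompose the next one, on demand
        }
        return *this;
    }
    decomposing_iterator operator++(int) { auto tmp = *this; ++*this; return tmp; }

    friend bool operator==(const decomposing_iterator& a, const decomposing_iterator& b) {
        if (a.len_ == 0 || b.len_ == 0) return a.len_ == b.len_;  // end comparisons
        return a.rest_.data() == b.rest_.data() && a.idx_ == b.idx_;
    }
    friend bool operator!=(const decomposing_iterator& a, const decomposing_iterator& b) {
        return !(a == b);
    }

private:
    void load() {
        idx_ = 0;
        len_ = 0;
        if (rest_.empty()) return;  // exhausted: now compares equal to end
        const char32_t cp = rest_.front();
        const auto* entry = std::find_if(
            std::begin(toy_decompositions), std::end(toy_decompositions),
            [cp](const decomposition& d) { return d.precomposed == cp; });
        if (entry != std::end(toy_decompositions)) {
            buf_[0] = entry->base;
            buf_[1] = entry->combining_mark;
            len_ = 2;
        } else {
            buf_[0] = cp;
            len_ = 1;
        }
    }

    std::u32string_view rest_;  // unconsumed source; front() is the current code point
    char32_t buf_[2] = {};      // decomposition of rest_.front()
    std::size_t idx_ = 0;       // position within buf_
    std::size_t len_ = 0;       // number of valid entries in buf_ (0 at end)
};

// Minimal range wrapper so the lazy sequence works with iterator-pair APIs.
struct lazy_decomposed {
    std::u32string_view source;
    decomposing_iterator begin() const { return decomposing_iterator(source); }
    decomposing_iterator end() const { return decomposing_iterator(); }
};
```

Building `std::u32string(view.begin(), view.end())` from `lazy_decomposed{U"\u00C5se"}` yields `U"A\u030Ase"`. A real normalizing iterator would additionally need a look-ahead buffer spanning each run of combining marks for reordering and composition.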

Regardless, I agree that a transcoding interface has more potential uses. That falls under issue #4.