tahonermann / text_view

A C++ concepts and range based character encoding and code point enumeration library
MIT License

Add a `utfbom` encoding that handles UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE #33

Open tahonermann opened 6 years ago

tahonermann commented 6 years ago

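text_view currently defines `utf8bom`, `utf16bom`, and `utf32bom` encodings, each of which detects a BOM and dispatches to the appropriate non-BOM encoding to consume the remaining input. Each of these handles only its own UTF family, though. A single `utfbom` encoding that detects any of the UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE BOMs would be useful for consuming UTF-8, UTF-16, and UTF-32 formatted files that contain a BOM.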
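To make the detection step concrete, here is a minimal sketch of how the leading bytes could be classified. The names `utf_scheme`, `bom_match`, and `detect_bom` are hypothetical and are not part of text_view's API; the byte sequences are the standard Unicode BOM signatures. One detail worth noting is that the UTF-32LE BOM (`FF FE 00 00`) begins with the UTF-16LE BOM (`FF FE`), so the 4-byte signatures have to be tested first. This sketch assumes that `FF FE 00 00` always means UTF-32LE, which misclassifies UTF-16LE input whose first character after the BOM is U+0000.

```cpp
#include <cstddef>
#include <cstring>
#include <optional>

// Hypothetical names for illustration only; not part of text_view's API.
enum class utf_scheme { utf8, utf16be, utf16le, utf32be, utf32le };

struct bom_match {
    utf_scheme scheme;   // encoding scheme indicated by the BOM
    std::size_t length;  // number of BOM bytes to skip before decoding
};

// Returns the scheme indicated by a leading BOM, or std::nullopt if the
// input does not start with a recognized BOM.
inline std::optional<bom_match> detect_bom(const unsigned char* data,
                                           std::size_t size) {
    struct entry {
        utf_scheme scheme;
        unsigned char bytes[4];
        std::size_t length;
    };
    // The 4-byte signatures come first because the UTF-32LE BOM starts
    // with the same two bytes as the UTF-16LE BOM.
    static const entry table[] = {
        { utf_scheme::utf32be, {0x00, 0x00, 0xFE, 0xFF}, 4 },
        { utf_scheme::utf32le, {0xFF, 0xFE, 0x00, 0x00}, 4 },
        { utf_scheme::utf8,    {0xEF, 0xBB, 0xBF, 0x00}, 3 },
        { utf_scheme::utf16be, {0xFE, 0xFF, 0x00, 0x00}, 2 },
        { utf_scheme::utf16le, {0xFF, 0xFE, 0x00, 0x00}, 2 },
    };
    for (const entry& e : table) {
        // Only the first e.length bytes of each signature are compared.
        if (size >= e.length && std::memcmp(data, e.bytes, e.length) == 0) {
            return bom_match{e.scheme, e.length};
        }
    }
    return std::nullopt;
}
```

A `utfbom` encoding could call something like this on the first four bytes of input, skip `length` bytes, and then dispatch to the matching non-BOM encoding, much as the existing per-family BOM encodings are described as doing.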

There is a question of what to do if the input lacks a BOM. The options are to fail or to fall back to an assumed encoding. A policy class could be used to give the programmer control over this, e.g., fail, fall back to UTF-8, etc.
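As a rough illustration of the policy idea, the sketch below shows one possible shape for it. All names here (`utf_scheme`, `fail_on_missing_bom`, `assume_on_missing_bom`, `resolve_scheme`) are hypothetical and do not reflect an actual or planned text_view interface. It assumes that BOM detection has already produced an optional result, and that the policy is consulted only when no BOM was found.

```cpp
#include <optional>
#include <stdexcept>

// Hypothetical names for illustration only; not part of text_view's API.
enum class utf_scheme { utf8, utf16be, utf16le, utf32be, utf32le };

// Policy: treat a missing BOM as an error.
struct fail_on_missing_bom {
    static utf_scheme on_missing_bom() {
        throw std::runtime_error("input does not begin with a BOM");
    }
};

// Policy: assume a fixed encoding scheme when no BOM is present.
template<utf_scheme Fallback>
struct assume_on_missing_bom {
    static utf_scheme on_missing_bom() noexcept { return Fallback; }
};

using assume_utf8_on_missing_bom = assume_on_missing_bom<utf_scheme::utf8>;

// Chooses the scheme to decode with: the detected one if a BOM was found,
// otherwise whatever the policy decides.
template<typename MissingBomPolicy>
utf_scheme resolve_scheme(std::optional<utf_scheme> detected) {
    return detected ? *detected : MissingBomPolicy::on_missing_bom();
}
```

With this shape, `resolve_scheme<assume_utf8_on_missing_bom>(std::nullopt)` yields `utf_scheme::utf8`, while `resolve_scheme<fail_on_missing_bom>(std::nullopt)` throws. Making the policy a template parameter keeps the choice at compile time; a runtime option would work as well if the decision needs to come from configuration.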