kinglcc / juniversalchardet

Automatically exported from code.google.com/p/juniversalchardet

Need to know the size of BOM #10

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
When a file starts with a Byte Order Mark (BOM), there needs to be a way to
discard those bytes. The detected charset alone is not enough information,
because a file in that charset may or may not start with a BOM.

The easiest solution would be a method that reports the number of bytes to skip.
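A minimal sketch of such a method, assuming the caller already has the first few bytes of the file (for example, the same buffer passed to the detector). `BomSniffer` and `bomLength` are hypothetical names for illustration, not part of the juniversalchardet API. It matches the standard BOM byte sequences and returns how many bytes to skip.

```java
// Hypothetical helper (not part of juniversalchardet): reports how many
// leading bytes form a Byte Order Mark, so callers can skip them.
final class BomSniffer {
    private BomSniffer() {}

    /** Returns the BOM length in bytes, or 0 if no known BOM is present. */
    static int bomLength(byte[] head) {
        // UTF-32 marks are checked first: FF FE 00 00 would otherwise
        // match the shorter UTF-16LE mark FF FE.
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return 4; // UTF-32LE
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return 4; // UTF-32BE
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return 3;       // UTF-8
        if (startsWith(head, 0xFF, 0xFE)) return 2;             // UTF-16LE
        if (startsWith(head, 0xFE, 0xFF)) return 2;             // UTF-16BE
        return 0;
    }

    // True if head begins with the given unsigned byte values.
    private static boolean startsWith(byte[] head, int... sig) {
        if (head.length < sig.length) return false;
        for (int i = 0; i < sig.length; i++) {
            if ((head[i] & 0xFF) != sig[i]) return false;
        }
        return true;
    }
}
```

One caveat of sniffing bytes alone: a UTF-16LE file whose first character is U+0000 also begins with FF FE 00 00, so a real implementation would likely cross-check the result against the detected charset.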

What steps will reproduce the problem?
1. Run the universal detector on a file with a BOM, such as UTF-16LE
2. Open a reader using the detected charset
3. Observe the spurious first character
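The spurious character arises because Java's UTF-8, UTF-16LE and UTF-16BE decoders keep the BOM and decode it as U+FEFF. Until the detector reports a BOM size, one possible workaround is to drop a single leading U+FEFF after decoding. The sketch below assumes the reader was opened with the detected charset; `BomSkippingReader` and `skipBom` are hypothetical names.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;

// Workaround sketch (hypothetical helper): removes one leading U+FEFF,
// which is how a BOM appears after decoding with UTF-8 or UTF-16LE/BE.
final class BomSkippingReader {
    private BomSkippingReader() {}

    static Reader skipBom(Reader in) throws IOException {
        // A one-character pushback buffer lets us peek at the first char.
        PushbackReader pr = new PushbackReader(in, 1);
        int first = pr.read();
        if (first != -1 && first != '\uFEFF') {
            pr.unread(first); // not a BOM: put the character back
        }
        return pr;
    }
}
```

For example, `BomSkippingReader.skipBom(new InputStreamReader(stream, detectedCharset))`. This works at the character level, so it does not need to know how many bytes the BOM occupied.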

Original issue reported on code.google.com by marcus.downing on 29 Apr 2011 at 12:08