Closed totyaxy closed 1 year ago
Hi,
Nice hint! That was in my plans.
I think it is simple to implement BOM support, and LoadFromStream()
is indeed the right place to filter the encoding.
Stay tuned.
Regards,
Thank you!
Hi, about UTF-8 BOM, please test the latest update.
Hi, I saw the BOM bypass code, thank you! I'd add a tip to make it faster (we need the speed): the current solution copies the whole string, even if it is very big, and that is an unnecessary operation. I think we should detect the BOM on the stream itself, and then:
if UTF8BOMHeaderDetect(Stream) then Stream.Seek(Length(cUTF8BOMHeader), soFromCurrent);
Because Stream.Read reads from Position(!). So you just have to skip the position past the BOM. Or simply set PostionAfterBOM to True in UTF8BOMHeaderDetect; my quick code for this:
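type
  TUTF8BOMHeader = array [0..2] of byte;

const
  // https://en.wikipedia.org/wiki/Byte_order_mark
  cUTF8BOMHeader: TUTF8BOMHeader = ($EF, $BB, $BF);

// Returns True if a UTF-8 BOM starts at the current stream position.
// When PostionAfterBOM is True and a BOM is found, the stream is left just
// after the BOM; in every other case the original position is restored.
function UTF8BOMHeaderDetect(const Stream: TStream;
  const PostionAfterBOM: boolean = False): boolean;
var
  UTF8BOMHeader: TUTF8BOMHeader;
  StreamPositionBuf: int64;
begin
  Result := False;
  // Zero the buffer (Initialize is meant for managed types, not byte arrays).
  FillChar(UTF8BOMHeader, SizeOf(UTF8BOMHeader), 0);
  // Fewer bytes left than a BOM needs: nothing to detect.
  if Stream.Size - Stream.Position < SizeOf(cUTF8BOMHeader) then
    exit;
  StreamPositionBuf := Stream.Position;
  if (Stream.Read(UTF8BOMHeader, SizeOf(UTF8BOMHeader)) = SizeOf(UTF8BOMHeader)) and
    CompareMem(@UTF8BOMHeader, @cUTF8BOMHeader, SizeOf(cUTF8BOMHeader)) then
    Result := True;
  // Rewind unless a BOM was found and the caller asked to stay after it.
  if not (Result and PostionAfterBOM) then
    Stream.Position := StreamPositionBuf;
end;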
But by the way, your code also loads the UTF8 BOM encoded file, thanks!
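To illustrate the position-skipping idea, here is a minimal, self-contained sketch, assuming Free Pascal. `SkipUTF8BOM` and the sample payload are hypothetical names made up for this example and are not part of McJSON. The point is that only three bytes are read for detection, and the payload is then read straight from the stream position after the BOM, so no intermediate copy of the whole string is needed.

```pascal
program BomSkipDemo;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}

uses
  Classes, SysUtils;

const
  cUTF8BOM: array [0..2] of Byte = ($EF, $BB, $BF);

// Hypothetical helper (not McJSON API): if a UTF-8 BOM starts at the current
// position, leave the stream just after it and return True; otherwise restore
// the original position and return False.
function SkipUTF8BOM(Stream: TStream): Boolean;
var
  Buf: array [0..2] of Byte;
  StartPos: Int64;
begin
  StartPos := Stream.Position;
  FillChar(Buf, SizeOf(Buf), 0);
  Result := (Stream.Read(Buf, SizeOf(Buf)) = SizeOf(Buf)) and
    CompareMem(@Buf, @cUTF8BOM, SizeOf(cUTF8BOM));
  if not Result then
    Stream.Position := StartPos;
end;

var
  MS: TMemoryStream;
  Payload, Body: AnsiString;
begin
  MS := TMemoryStream.Create;
  try
    // Build an in-memory "file": BOM followed by a small JSON payload.
    Payload := '{"a":1}';
    MS.WriteBuffer(cUTF8BOM, SizeOf(cUTF8BOM));
    MS.WriteBuffer(Payload[1], Length(Payload));
    MS.Position := 0;

    if SkipUTF8BOM(MS) then
      WriteLn('BOM found, stream position: ', MS.Position);

    // Read the remainder directly from the current position.
    SetLength(Body, MS.Size - MS.Position);
    if Length(Body) > 0 then
      MS.ReadBuffer(Body[1], Length(Body));
    WriteLn(Body);
  finally
    MS.Free;
  end;
end.
```

Under Delphi the same idea should apply, but the console setup and string types would probably need adjusting; that part is untested here.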
Nice hint about performance. I'll give it a try.
Thanks.
Hi!
Unfortunately, many UTF-8 JSON files contain a BOM, which McJSON cannot read. I solved this in Lazarus, of course, but I don't know how the code would behave under Delphi (UTF-16?), so I'm not offering a pull request.
So my request/recommendation is that you also support UTF-8 BOM encoded files, at least for reading (I did it for writing as well, but that's not as important). The code itself is not complicated: if a BOM is detected in LoadFromStream, you only need to position the stream after the BOM (and reading the BOM bytes during detection has already moved it there anyway).
UTF-8 BOM: $EF $BB $BF
See: https://learn.microsoft.com/en-us/globalization/encoding/byte-order-mark
Since you also support Delphi, where, as far as I know, UTF-16 is the base encoding, the UTF-16 LE/BE BOMs could be supported as well.
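As a rough sketch of what detecting the UTF-16 BOMs alongside the UTF-8 one could look like, assuming Free Pascal: `TBomKind`, `DetectBom` and `BomNames` are hypothetical names invented for this example and do not come from McJSON. The byte values are the standard ones (UTF-8: EF BB BF, UTF-16 LE: FF FE, UTF-16 BE: FE FF). The sketch only identifies and skips the BOM; it does not convert UTF-16 content.

```pascal
program BomKindDemo;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}

uses
  Classes, SysUtils;

type
  // Hypothetical enumeration, for illustration only.
  TBomKind = (bkNone, bkUTF8, bkUTF16LE, bkUTF16BE);

const
  BomNames: array [TBomKind] of string = ('none', 'UTF-8', 'UTF-16LE', 'UTF-16BE');

// Looks at up to three bytes at the current position. If a known BOM is
// found, the stream is left just after it; otherwise the position is restored.
// Note: FF FE is also the start of a UTF-32 LE BOM, which is not handled here.
function DetectBom(Stream: TStream): TBomKind;
var
  Buf: array [0..2] of Byte;
  StartPos: Int64;
  Got: Integer;
begin
  Result := bkNone;
  StartPos := Stream.Position;
  FillChar(Buf, SizeOf(Buf), 0);
  Got := Stream.Read(Buf, SizeOf(Buf));
  if (Got >= 3) and (Buf[0] = $EF) and (Buf[1] = $BB) and (Buf[2] = $BF) then
  begin
    Result := bkUTF8;
    Stream.Position := StartPos + 3;
  end
  else if (Got >= 2) and (Buf[0] = $FF) and (Buf[1] = $FE) then
  begin
    Result := bkUTF16LE;
    Stream.Position := StartPos + 2;
  end
  else if (Got >= 2) and (Buf[0] = $FE) and (Buf[1] = $FF) then
  begin
    Result := bkUTF16BE;
    Stream.Position := StartPos + 2;
  end
  else
    Stream.Position := StartPos;
end;

const
  // UTF-16 LE BOM followed by '{' encoded as UTF-16 LE.
  Sample: array [0..3] of Byte = ($FF, $FE, $7B, $00);

var
  MS: TMemoryStream;
  Kind: TBomKind;
begin
  MS := TMemoryStream.Create;
  try
    MS.WriteBuffer(Sample, SizeOf(Sample));
    MS.Position := 0;
    Kind := DetectBom(MS);
    WriteLn('Detected: ', BomNames[Kind], ', payload starts at offset ', MS.Position);
  finally
    MS.Free;
  end;
end.
```

A loader could then branch on the returned kind, reading the remaining bytes as UTF-8 directly or converting from UTF-16 first; how that conversion should be done in McJSON is left open here.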