kooloveme / thtmlviewer

Automatically exported from code.google.com/p/thtmlviewer

CodePage is never set #256

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
There is a bug in THTMLViewer: its CodePage property is never set. It could be 
easily fixed by adding this line to the end of the LoadDocument method:

FCodePage := FDocument.CodePage;

As a workaround, the code below duplicates that functionality by reading the 
document's byte-order mark and setting the code page ourselves.

--------------------

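var
  fs: TFileStream;
  Buffer: packed array[0..2] of Byte;
begin
  if not FileExists(AFileName) then
    Exit;

  hv.LoadFromFile(AFileName);
  hv.CodePage := 0;

  // Zero the buffer so a file shorter than 3 bytes cannot match a BOM
  // through leftover stack contents.
  FillChar(Buffer, SizeOf(Buffer), 0);

  // Open read-only and let other readers share the file.
  fs := TFileStream.Create(AFileName, fmOpenRead or fmShareDenyWrite);
  try
    fs.Read(Buffer, SizeOf(Buffer));

    // The three BOMs are mutually exclusive, so test them as one chain.
    if (Buffer[0] = $EF) and (Buffer[1] = $BB) and (Buffer[2] = $BF) then
      hv.CodePage := CP_UTF8
    else if (Buffer[0] = $FF) and (Buffer[1] = $FE) then
      hv.CodePage := CP_UTF16LE
    else if (Buffer[0] = $FE) and (Buffer[1] = $FF) then
      hv.CodePage := CP_UTF16BE;
  finally
    fs.Free;
  end;
end;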

Original issue reported on code.google.com by xyz.123....@gmail.com on 16 Jul 2013 at 1:09

GoogleCodeExporter commented 9 years ago
I think that we need to take a good look at Charset priority.

Please see:

http://www.w3.org/International/O-charset

Original comment by jpmug...@suddenlink.net on 21 Jul 2013 at 1:26

GoogleCodeExporter commented 9 years ago
A better reference is:
http://www.w3.org/International/questions/qa-html-encoding-declarations

In the case of conflict between multiple encoding declarations, precedence 
rules apply to determine which declaration wins out. For XHTML and HTML, the 
precedence is as follows, with 1 being the highest.

    1. HTTP Content-Type header
    2. byte-order mark (BOM)
    3. XML declaration
    4. meta element
    5. link charset attribute

The high precedence of the HTTP header is useful, as mentioned earlier, in 
situations where the encoding of the document is changed by an intermediary 
server, since such 'transcoding' is unlikely to change the in-document 
declarations. The transcoding server should, however, declare the new encoding 
in the HTTP header.

The HTML5 specification (which is not yet stable) formally describes precedence 
for the byte-order mark (BOM). According to the specification, the BOM has 
lower precedence than the HTTP Content-Type header, but higher precedence than 
anything else. At the time of writing, this was not consistently implemented in 
the latest versions of major browsers. For more information see the test 
results.
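
The precedence order quoted above could be sketched roughly as follows. This is only an illustration, not code from THtmlViewer: the function ResolveCodePage, the constant CP_UNDECLARED and the cpUtf8/cpWin1252 names are hypothetical, and each argument is assumed to hold the code page that one source declared, or 0 if that source declared nothing.

```pascal
program CharsetPrecedence;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

const
  CP_UNDECLARED = 0;  // hypothetical marker: this source declared no encoding
  cpUtf8 = 65001;     // Windows code page identifier for UTF-8
  cpWin1252 = 1252;   // Windows code page identifier for Windows-1252

// Returns the code page of the highest-precedence source that declared one.
// Falls back to DefaultCP when no source declared anything.
function ResolveCodePage(HttpHeaderCP, BomCP, XmlDeclCP, MetaCP, LinkCP,
  DefaultCP: Integer): Integer;
begin
  if HttpHeaderCP <> CP_UNDECLARED then
    Result := HttpHeaderCP
  else if BomCP <> CP_UNDECLARED then
    Result := BomCP
  else if XmlDeclCP <> CP_UNDECLARED then
    Result := XmlDeclCP
  else if MetaCP <> CP_UNDECLARED then
    Result := MetaCP
  else if LinkCP <> CP_UNDECLARED then
    Result := LinkCP
  else
    Result := DefaultCP;
end;

begin
  // Local file with a UTF-8 BOM and a conflicting meta element: the BOM wins.
  WriteLn(ResolveCodePage(CP_UNDECLARED, cpUtf8, CP_UNDECLARED, cpWin1252,
    CP_UNDECLARED, cpWin1252));
  // Nothing declared anywhere: the default is used.
  WriteLn(ResolveCodePage(CP_UNDECLARED, CP_UNDECLARED, CP_UNDECLARED,
    CP_UNDECLARED, CP_UNDECLARED, cpWin1252));
end.
```

For a file loaded from disk there is no HTTP header, so the first argument would stay at 0 and the BOM becomes the strongest signal, which matches the workaround in the original report.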

Original comment by jpmug...@suddenlink.net on 22 Jul 2013 at 10:19

GoogleCodeExporter commented 9 years ago
@xyz
THtmlViewer.CodePage is meant to be the default code page, not the current 
document's actual one. I'm thinking about an alternate code page property or a 
document property.

@Peter
Yes, we should implement the recommended order.

Original comment by OrphanCat on 23 Jul 2013 at 8:30

GoogleCodeExporter commented 9 years ago
r464 adds property THtmlViewer.DocumentCodePage instead of setting 
THtmlViewer.CodePage.
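
To illustrate the split between a default code page and a per-document one, here is a minimal sketch. It is not the r464 code: TViewerSketch and its LoadDocument(DetectedCP) signature are hypothetical stand-ins, and the fallback to the default when nothing is detected is an assumption about how such a property might behave.

```pascal
program DocumentCodePageSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

type
  // Hypothetical stand-in for the viewer; not the real THtmlViewer.
  TViewerSketch = class
  private
    FCodePage: Integer;         // default, chosen by the application
    FDocumentCodePage: Integer; // what the loaded document actually uses
  public
    // DetectedCP = 0 means the document declared no encoding.
    procedure LoadDocument(DetectedCP: Integer);
    property CodePage: Integer read FCodePage write FCodePage;
    property DocumentCodePage: Integer read FDocumentCodePage;
  end;

procedure TViewerSketch.LoadDocument(DetectedCP: Integer);
begin
  if DetectedCP <> 0 then
    FDocumentCodePage := DetectedCP
  else
    FDocumentCodePage := FCodePage; // assumed fallback to the default
end;

var
  Viewer: TViewerSketch;
begin
  Viewer := TViewerSketch.Create;
  try
    Viewer.CodePage := 1252;    // application default
    Viewer.LoadDocument(65001); // document detected as UTF-8
    // The default is untouched; only the document property changes.
    WriteLn(Viewer.CodePage, ' ', Viewer.DocumentCodePage);
    Viewer.LoadDocument(0);     // document declares nothing
    WriteLn(Viewer.CodePage, ' ', Viewer.DocumentCodePage);
  finally
    Viewer.Free;
  end;
end.
```

Keeping the document value read-only means loading a file never overwrites the default the application configured, which is the concern raised in the comment above.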

Original comment by OrphanCat on 26 Apr 2014 at 9:48