UTF8 translation - Githubissues

GoogleCodeExporter commented 9 years ago

I have a situation where I display UTF8 HTML pages from an app then need to 
read that data back from HTMLViewer, make changes then output again with those 
changes.

One of the pages I have contains a multi byte sequence E2 80 99 and doing the 
above read and reload sequence shows nothing after that code sequence on the 
reloaded page.

I am using DocumentSource to retrieve the page data but it returns a string 
that is shorter than the one I originally loaded.

I have determined that the UTF8 conversion in HTMLBuffer.pas, TBuffer.NextChar, 
is applied on the read, which results in it returning a single 92 char 
rather than E2 80 99.
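
For readers following along, here is a minimal sketch of the byte-level relationship between E2 80 99 and 92. It is written in Python purely as a neutral illustration and does not use any HtmlViewer code. It assumes the single 92 byte is the Windows-1252 form of the character; the report does not state which code page is involved.

```python
# Bytes quoted in the report: the UTF-8 encoding of one character.
utf8_bytes = bytes([0xE2, 0x80, 0x99])

# Decoding as UTF-8 yields a single code point, U+2019
# (RIGHT SINGLE QUOTATION MARK).
char = utf8_bytes.decode("utf-8")

# Re-encoding in Windows-1252 (an assumed target code page, not
# confirmed by the report) collapses the character to one byte.
cp1252_bytes = char.encode("cp1252")

print(f"U+{ord(char):04X}", cp1252_bytes.hex())  # prints "U+2019 92"
```

If that assumption holds, the shorter string seen from DocumentSource is what would be expected: three bytes in, one byte out.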

What I need is to be able to read the RAW page data rather than translated data.

thank you

Original issue reported on code.google.com by i...@acs121.com on 19 Jul 2012 at 1:23

GoogleCodeExporter commented 9 years ago

Hi,

I cannot reproduce this issue. 

Could you please post the code snippet/s you use to read the DocumentSource and 
to re-apply it to the HtmlViewer incl. declarations of used variables.

That might help to reproduce the issue.

Thanks
OrphanCat

Original comment by OrphanCat on 20 Jul 2012 at 8:50

GoogleCodeExporter commented 9 years ago

Hi,

Attached Lazarus project demonstrates the issue.

Press the Load button to initially load the text to the viewer.

The insert button then reads the Document Source, determines the position of 
the caret and inserts the text [[Attach]] at that point. It then sends the new 
stream back to the viewer. This is where the rest of the text, past the E2 
80 99 byte sequence, disappears.

Have found that the data received from reading DocumentSource has 92 at the 
point where the E2 80 99 should be. The data is correctly loaded to the viewer 
but is UTF8 translated before being returned by the DocumentSource call. Need 
some mechanism to return the data as a RAW stream.
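
A rough sketch of why the read, insert and reload round trip could fail, again in Python and independent of the HtmlViewer code. The sample HTML, the caret position and the idea that the reload parses the bytes as UTF-8 are all assumptions made for illustration; the sketch only shows that a lone 92 byte is not valid UTF-8, which would be consistent with the text after it being lost.

```python
# Hypothetical page content containing the E2 80 99 sequence.
original = "<p>It\u2019s here</p><p>tail</p>".encode("utf-8")

# What the translated read appears to return: E2 80 99 replaced by 92.
translated = original.replace(b"\xe2\x80\x99", b"\x92")
shorter_by = len(original) - len(translated)  # two bytes shorter

# A strict UTF-8 decoder rejects the translated bytes at the 92 byte.
try:
    translated.decode("utf-8")
    error_pos = None
except UnicodeDecodeError as exc:
    error_pos = exc.start

# With the raw (untranslated) bytes, inserting the marker keeps the
# data valid UTF-8, so nothing after the sequence is lost.
marker = b"[[Attach]]"
caret = original.index(b"<p>tail")  # hypothetical caret position
edited = original[:caret] + marker + original[caret:]
reloaded = edited.decode("utf-8")

print(shorter_by, error_pos, reloaded)
```

Whether the viewer actually stops at the invalid byte in this way is not confirmed by the thread; the point is only that the raw bytes survive the edit while the translated ones do not.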

thanks

Original comment by i...@acs121.com on 22 Jul 2012 at 12:51

Attachments:

unit1.zip

GoogleCodeExporter commented 9 years ago

Hi,

Assume you have been able to reproduce and wonder if there is any progress on 
this item? Is there anything else I can do to assist?

thanks
Alan.

Original comment by i...@acs121.com on 31 Jul 2012 at 8:21

GoogleCodeExporter commented 9 years ago

Hi OrphanCat,

Understanding that you are probably busy, I decided I should look further into 
the issue and have now worked out a solution that does the job for my purpose. 
Would you please review and advise if you consider this suitable for inclusion 
into the next release version.

Am operating with Lazarus 0.9.30.4
FPC 2.6.0
on Win 7 64

Have also tested this in the Mac OSX version I am working on and it works, with 
the exception that the string returns ??? for the three bytes of the UTF8 
sequence. This will be looked at separately as a specific Mac OSX issue.

The Mac OSX development is going well, and is actually operating in a new 
application I am developing, but still needs lots of testing.

Have not tested this with Delphi, haven't had the time, but do not see that 
there should be any issue.

Changes to implement RAW DocumentSource fetch are:

Add new function definition after GetDocumentSource 

    function GetDocumentSource: ThtString;

    function GetDocumentSourceRAW: ThtString;

Add new property after DocumentSource

    property DocumentSource: ThtString read GetDocumentSource;

    property DocumentSourceRAW: ThtString read GetDocumentSourceRAW;

After GetDocumentSource insert new function GetDocumentSourceRAW as shown below

//-- BG ---------------------------------------------------------- 27.12.2010 --
function THtmlViewer.GetDocumentSource: ThtString;
var
  Pos: Integer;
begin
  if FDocument <> nil then
  begin
    Pos := FDocument.Position;
    FDocument.Position := 0;
    Result := FDocument.AsString;
    FDocument.Position := Pos;
  end
  else
    Result := '';
end;

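//-- AEC ---------------------------------------------------------- 31.7.2012 --
function THtmlViewer.GetDocumentSourceRAW: ThtString;
var
  cp: Integer;
begin
  if FDocument <> nil then
  begin
    // Read with CodePage 0 so that the source comes back untranslated.
    cp := FDocument.CodePage;
    FDocument.CodePage := 0;
    try
      Result := GetDocumentSource;
    finally
      // Restore the original code page even if the read raises an exception.
      FDocument.CodePage := cp;
    end;
  end
  else
    Result := '';
end;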

Original comment by i...@acs121.com on 31 Jul 2012 at 11:11

GoogleCodeExporter commented 9 years ago

Ignore above as have looked further and realised all I needed to do was change 
the code page in the HTMLViewer component to achieve exactly the same thing.

Original comment by i...@acs121.com on 4 Aug 2012 at 1:37

GoogleCodeExporter commented 9 years ago

Good news :)

I'm looking forward to the Mac adaptation.

OrphanCat

Original comment by OrphanCat on 4 Aug 2012 at 3:07

Changed state: Invalid
Added labels: Type-Other
Removed labels: Type-Defect

kooloveme / thtmlviewer

UTF8 translation #175