hramenko / thtmlviewer

Automatically exported from code.google.com/p/thtmlviewer

Encoded Characters Breaking Parser with Meta Content-Type #377

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Which steps will reproduce the problem?

1. Load the attached test.html

What is the expected output? What do you see instead?

HTMLViewer only shows content up to the first encoded character. It should show 
all the content.

The issue occurs because the HTML contains <meta http-equiv="Content-Type" 
content="text/html; charset=utf-8"> in the head (forcing UTF-8). If the meta 
tag is removed, HTMLViewer shows the content fine. If the meta tag remains, 
HTMLViewer shows the content truncated.

Which version of the product are you using? Which compiler version are you 
using? On which operating system?

Latest HTMLViewer. Delphi 7. Windows XP

Please attach test html files and screenshots, if appropriate.
Please provide any additional information:

Original issue reported on code.google.com by winal...@googlemail.com on 3 Sep 2014 at 3:34

Attachments:

GoogleCodeExporter commented 9 years ago
OC, FYI as an update: I also find that calling .DocumentSource breaks on the 
special characters.

Original comment by winal...@googlemail.com on 3 Sep 2014 at 4:28

GoogleCodeExporter commented 9 years ago
One more update:

It fails because function TBuffConvUTF8.NextChar: TBuffChar; sets the char to 
invalid (0) when it hits a byte sequence that is not valid UTF-8. That value 
also equals EOF, so char parsing stops there. Setting the TBuffChar to anything 
other than 0 (say 63, which is '?') allows processing to continue.
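
To illustrate the collision, here is a minimal Python sketch. It is not the HTMLViewer source: `next_char`, `parse`, and `EOF_CHAR` are hypothetical names, the decoder is simplified (it does not reject overlong encodings), and the sample bytes are made up for the example. It only shows how returning the EOF value for an invalid sequence truncates the output, and how a placeholder such as 63 avoids that.

```python
from typing import Tuple

EOF_CHAR = 0  # assumption: the parser treats code 0 as end of input


def next_char(data: bytes, pos: int, invalid: int) -> Tuple[int, int]:
    """Decode one UTF-8 code point starting at pos; return (code, new_pos).

    Returns `invalid` for a malformed sequence and EOF_CHAR at end of data.
    """
    if pos >= len(data):
        return EOF_CHAR, pos
    lead = data[pos]
    if lead < 0x80:                      # plain ASCII byte
        return lead, pos + 1
    if 0xC0 <= lead < 0xE0:              # 2-byte sequence
        extra, code = 1, lead & 0x1F
    elif 0xE0 <= lead < 0xF0:            # 3-byte sequence
        extra, code = 2, lead & 0x0F
    elif 0xF0 <= lead < 0xF8:            # 4-byte sequence
        extra, code = 3, lead & 0x07
    else:                                # stray continuation or illegal lead byte
        return invalid, pos + 1
    for i in range(1, extra + 1):
        # Each following byte must look like 10xxxxxx.
        if pos + i >= len(data) or (data[pos + i] & 0xC0) != 0x80:
            return invalid, pos + 1      # skip the bad lead byte only
        code = (code << 6) | (data[pos + i] & 0x3F)
    return code, pos + extra + 1


def parse(data: bytes, invalid: int) -> str:
    """Collect characters until next_char reports EOF_CHAR."""
    out = []
    pos = 0
    while True:
        code, pos = next_char(data, pos, invalid)
        if code == EOF_CHAR:             # an invalid char of 0 also lands here
            break
        out.append(chr(code))
    return "".join(out)


# 0xE9 is Latin-1 'é'; on its own it is not a valid UTF-8 sequence.
sample = b"caf\xe9 menu"
truncated = parse(sample, invalid=0)     # invalid value collides with EOF
recovered = parse(sample, invalid=63)    # 63 is '?', parsing continues
```

With `invalid=0` the loop stops at the bad byte, which matches the truncated display reported here; with `invalid=63` the bad byte shows up as `?` and the rest of the content is kept.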

Original comment by winal...@googlemail.com on 3 Sep 2014 at 6:28

GoogleCodeExporter commented 9 years ago
Thanks for spotting this issue.

r485 fixes it.

Original comment by OrphanCat on 14 Oct 2014 at 3:14