perfgao closed 1 year ago
When I use gumbo-parser to parse HTML text encoded as UTF-8 with a BOM, the HTML generated after parsing is out of order: the elements that belong in <head> end up inside <body>.
The HTML file contains:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3c.org/TR/1999/REC-html401-19991224/loose.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <meta http-equiv="Cache-Control" content="no-transform" /> <title>test</title> <link rel="shortcut icon" href="favicon.ico" /> <script src="/js/jquery-1.4.2.min.js"></script> <script src="/js/url.js"></script> </head> <body> <div id="main"> <div id="nav_top"> <div id="nav_top_frame"> <a href="/guide.html" target="_blank" title="test" class="red f12"><b>test</b></a> <a href='/help.html' target='_blank' title='help' class='gray f12'>help</a> </div> </div> </div> </body> </html>
The file is saved as UTF-8 with BOM.
When I run examples/serialize.cc on it:
$ ./serialize test.html
I get:
<html xmlns="http://www.w3.org/1999/xhtml"> <head></head><body> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/> <meta http-equiv="Cache-Control" content="no-transform"/> <title>test</title> <link rel="shortcut icon" href="favicon.ico"/> <script src="/js/jquery-1.4.2.min.js"></script> <script src="/js/url.js"></script> <div id="main"> <div id="nav_top"> <div id="nav_top_frame"> <a href="/guide.html" target="_blank" title="test" class="red f12"><b>test</b></a> <a href='/help.html' target='_blank' title='help' class='gray f12'>help</a> </div> </div> </div> </body> </html>
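A possible workaround, sketched under the assumption that the leading BOM bytes (EF BB BF) are what cause the parser to treat the start of the file as text and open <body> early: strip the BOM from the buffer before handing it to gumbo-parser. StripUtf8Bom is a hypothetical helper, not part of gumbo-parser.

```cpp
#include <string>

// Hypothetical helper (not part of gumbo-parser): returns a copy of `input`
// without a leading UTF-8 byte order mark (bytes EF BB BF), if one is present.
std::string StripUtf8Bom(const std::string& input) {
  static const char kUtf8Bom[] = "\xEF\xBB\xBF";
  if (input.size() >= 3 && input.compare(0, 3, kUtf8Bom) == 0) {
    return input.substr(3);  // drop the three BOM bytes
  }
  return input;  // no BOM: return the text unchanged
}
```

Assuming the file contents are held in a std::string before parsing, the call would become something like gumbo_parse(StripUtf8Bom(contents).c_str()). I have not verified that this fixes the output above, so treat it as something to try.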