htacg / tidy-html5

The granddaddy of HTML tools, with support for modern standards
http://www.html-tidy.org
2.71k stars 418 forks source link

bugfix for messageobj.c for windows vc++ #800

Closed sonyps5201314 closed 3 years ago

sonyps5201314 commented 5 years ago
 src/messageobj.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/messageobj.c b/src/messageobj.c
index a6a0bef..10c3c1c 100644
--- a/src/messageobj.c
+++ b/src/messageobj.c
@@ -643,6 +643,7 @@ static struct printfArg* BuildArgArray( TidyDocImpl *doc, ctmbstr fmt, va_list a
         else
         {
             strncpy(nas[cn].format, fmt + nas[cn].formatStart, nas[cn].formatLength);
+           nas[cn].format[nas[cn].formatLength] = 0;
         }
geoffmcl commented 5 years ago

@sonyps5201314 thank you for the issue, through a suggested patch fix no less...

After testing, and some debugging, I agree, but the issue is more something like messageobj API can return a non null terminated string!, or something... more descriptive... but a bug... but you have found it...

Research

Yes, it does look like the MSVC docs strncpy, which documents, in part, If count is less than or equal to the length of strSource, a null character is not appended ...... AWK!

And in unix docs, like strncpy, says, in part If there is no null byte among the first n bytes of src ..., no null termination... AWK!

Yikes, unix and windows docs agree... sort of... ;=))

Test Case

Any clean html... any tidy version, probably since the messageobj first appeared...

Now, one of the most commonly seen outputs is Info: Document content looks like %s... in this case fmt + nas[cn].formatStart sets source as %s, length 2, being near end of the string...

And the count, nas[cn].formatLength, would also be 2...

Ipso facto, only %s would be copied to the destination, nas[cn].format... no null added ...

This dst (fixed buf len 21) can be seen in MSVC as %s!!!!..., where the ! character represents the debug fill done... D.AWK!

The Fix

Applying your patch, and adding a comment, suggest -

+            nas[cn].format[nas[cn].formatLength] = 0; /* Is. #800 - If count <= srcLen, no 0 added! */

Note, nas[cn].formatLength count has already been checked with if ( nas[cn].formatLength >= FORMAT_LENGTH ), so no possibility of buffer overrun...

While this copied message format is not presently later used in libTidy, so maybe this bug has remained quietly hidden...

However, its contents are available, to libTidy users, as a C-string, a tidy ctmbstr, through API tidyGetArgFormat, so must be null termainated...

Thank you for finding this... are you using this API in your project?

Will apply this patch, baring any negative comments, soonest... remind me if I am too long... thanks...

sonyps5201314 commented 5 years ago

I am not use this API in my project,I debug tidy.exe with some test html find this problem. I use tidy to fix web html to xhtml, then use tinyxml2 to get the information from the formated xhtml.

geoffmcl commented 3 years ago

@sonyps5201314, eventually got around to pushing this small fix - years later ;=))

Again thanks for spotting the issue...