htacg / tidy-html5

The granddaddy of HTML tools, with support for modern standards
http://www.html-tidy.org

Segfault when calling tidyParseString with malformed input #1120

Open gabe-sherman opened 1 month ago

gabe-sherman commented 1 month ago

The program below segfaults when given malformed input. The fault occurs at line 625 of parser.c, where code accesses a node's parent through a Node* whose parent pointer is NULL.

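#include <stdio.h>
#include <stdlib.h>
#include <tidy.h>

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <input-file>\n", argv[0]);
        return 1;
    }

    /* Read the whole input file into a NUL-terminated buffer. */
    FILE *f = fopen(argv[1], "rb");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    rewind(f);
    if (size < 0) { fclose(f); return 1; }

    char *v0 = malloc((size_t)size + 1);
    if (v0 == NULL) { fclose(f); return 1; }
    size_t nread = fread(v0, 1, (size_t)size, f);
    fclose(f);
    v0[nread] = '\0';

    /* The crash is triggered inside tidyParseString. */
    TidyDoc tdoc = tidyCreate();
    tidyParseString(tdoc, v0);

    tidyRelease(tdoc);
    free(v0);
    return 0;
}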

Test Environment

Ubuntu 22.04.4, 64 bit

How to trigger

./filename POC

Version

Latest: d08ddc2

POC File

https://github.com/gabe-sherman/bug-pocs/blob/main/tidy-html5/c1

ASAN Report

=================================================================
==182978==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x5555558432f2 bp 0x7fffffffce90 sp 0x7fffffffce30 T0)
==182978==The signal is caused by a READ memory access.
==182978==Hint: address points to the zero page.
    #0 0x5555558432f2 in InsertDocType /home/gabriel/fuzzing-trials/tidy-html/lib_asan/src/parser.c:625:32
    #1 0x55555584a3bb in prvTidyParseHead /home/gabriel/fuzzing-trials/tidy-html/lib_asan/src/parser.c:2709:13
    #2 0x55555582833f in ParseHTMLWithNode /home/gabriel/fuzzing-trials/tidy-html/lib_asan/src/parser.c:1077:25
    #3 0x55555587deaa in prvTidyParseDocument /home/gabriel/fuzzing-trials/tidy-html/lib_asan/src/parser.c:6341:9
    #4 0x5555557dd3ef in prvTidyDocParseStream /home/gabriel/fuzzing-trials/tidy-html/lib_asan/src/tidylib.c:1509:9
    #5 0x5555557d5ab5 in tidyDocParseString /home/gabriel/fuzzing-trials/tidy-html/lib_asan/src/tidylib.c:1220:18
    #6 0x5555557d573c in tidyParseString /home/gabriel/fuzzing-trials/tidy-html/lib_asan/src/tidylib.c:1117:12
    #7 0x5555557cd4a1 in main /home/gabriel/fuzzing-trials/tidy-html/crashes/c1/reproducer.c:24:4
    #8 0x7ffff765fd8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #9 0x7ffff765fe3f in __libc_start_main csu/../csu/libc-start.c:392:3
    #10 0x5555556f43d4 in _start (/home/gabriel/fuzzing-trials/tidy-html/crashes/c1/c1.out+0x1a03d4) (BuildId: 0f6509d2d013898defc26ab226c81186debc92c4)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /home/gabriel/fuzzing-trials/tidy-html/lib_asan/src/parser.c:625:32 in InsertDocType
==182978==ABORTING
make: *** [Makefile:30: crash] Error 1

gabe-sherman commented 1 month ago

Update on this: I did a bit of digging to identify the root cause of this crash. In ParseDocument, the html node is set to the return value of InferredTag, and the node returned by that call has a NULL parent. I’ve seen this at lines 6316 and 6352. The node is then passed into ParseHTMLWithNode, which passes it on to the corresponding parser function. These parsers hand it to various helper functions that access the parent pointer without first checking it for NULL.

I have seen this segfault occur at line 625 in parser.c, from ParseHead calling InsertDocType, and at line 143 in parser.c, from ParseInline calling InsertNodeAsParent. I don’t have enough knowledge of the API to recommend a fix, but I did notice that ParseNamespace avoids this segfault through an assertion at line 4120 in parser.c. I hope this helps, thanks!
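
To illustrate the kind of check that seems to be missing, here is a minimal, self-contained sketch of a parent-pointer guard. It is not Tidy's code and not a proposed patch: `DemoNode` and `demo_insert_as_parent` are hypothetical names, and the tree is simplified to a single child per node. It only shows the general pattern of refusing to touch `node->parent` when it is NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified tree node; NOT Tidy's real Node struct. */
typedef struct DemoNode {
    struct DemoNode *parent;
    struct DemoNode *first_child;   /* one child per node keeps the sketch short */
} DemoNode;

/*
 * Wrap `node` in `new_parent`: new_parent takes node's place under
 * node's old parent, and node becomes new_parent's child.
 * Returns false (and changes nothing) when node has no parent, which is
 * the case an unguarded `node->parent->first_child` would crash on.
 */
bool demo_insert_as_parent(DemoNode *node, DemoNode *new_parent)
{
    /* NULL arguments are caller bugs, not a property of the input data. */
    assert(node != NULL && new_parent != NULL);

    DemoNode *old_parent = node->parent;
    if (old_parent == NULL) {
        return false;               /* guard: parentless node */
    }

    old_parent->first_child = new_parent;
    new_parent->parent = old_parent;
    new_parent->first_child = node;
    node->parent = new_parent;
    return true;
}
```

A caller would check the return value and decide how to recover (for example, by skipping the insertion) instead of crashing. Whether returning early, asserting, or giving the inferred node a parent is the right fix for Tidy is a question for the maintainers.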