Jean-Daniel opened 5 years ago
Hi @Jean-Daniel, in single mode the tokens will always be equal, so the program will not enter the loop. Do you have an example HTML document that makes the program enter this loop in single mode?
I found and fixed another problem. Please try the code from master.
Thanks for the report!
Sorry, I didn't give you enough info. I'm actually using the parser to extract some data from HTML fragments (I only have the body content), and I don't really need a full tree. So I'm using the 'after token done' callback and disabling the tree by using MyHTML_TREE_PARSE_FLAGS_WITHOUT_BUILD_TREE.
A quick test reveals that it is the latter flag that triggers the bug. Without it, the parser works flawlessly, but when I set this flag, it crashes on CDATA.
#include <string.h>   // strlen
#include <myhtml/api.h>

int main(void) {
    const char *bytes = "<div><![CDATA[ foo ]]></div>";
    size_t length = strlen(bytes);

    // single-threaded parse mode
    myhtml_t *myhtml = myhtml_create();
    myhtml_init(myhtml, MyHTML_OPTIONS_PARSE_MODE_SINGLE, 1, 0);

    myhtml_tree_t *tree = myhtml_tree_create();
    myhtml_tree_init(tree, myhtml);

    // tokenize only: do not build the tree, skip whitespace tokens
    myhtml_tree_parse_flags_set(tree, MyHTML_TREE_PARSE_FLAGS_WITHOUT_BUILD_TREE | MyHTML_TREE_PARSE_FLAGS_SKIP_WHITESPACE_TOKEN);

    // parse html (we only have the body); this crashes on the CDATA section
    myhtml_parse_fragment(tree, MyENCODING_UTF_8, bytes, length, MyHTML_TAG_BODY, MyHTML_NAMESPACE_HTML);

    myhtml_tree_destroy(tree);
    myhtml_destroy(myhtml);
    return 0;
}
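The program above crashes because, in single mode, no thread stream exists while the CDATA path still waits on one. As a hypothetical, self-contained C model of the missing guard (the types `fake_stream`, `fake_myhtml`, `fake_tree` and the function `wait_for_last_done_token_guarded` are invented for illustration; they are not MyHTML's definitions, and the real function's waiting logic is only approximated), a wait routine can return early when no stream was created instead of reading through a NULL pointer:

```c
/* Hypothetical model of the crash pattern; NOT MyHTML's real code. */
#define _POSIX_C_SOURCE 200809L /* expose nanosleep() under strict C modes */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

/* Invented stand-in for the stream object; only the field named in this report. */
typedef struct {
    struct timespec timespec; /* polling interval used while waiting */
} fake_stream;

/* Invented stand-in for myhtml_t. */
typedef struct {
    fake_stream *thread_stream; /* NULL when 0 streams were requested (single mode) */
} fake_myhtml;

/* Invented stand-in for myhtml_tree_t. */
typedef struct {
    fake_myhtml *myhtml;
    _Atomic(const void *) token_last_done; /* last token a worker thread finished */
} fake_tree;

/* Waits until `token` is marked done. Returns 0 without touching the stream
   when none exists (no other thread is parsing), 1 after waiting on it. */
static int wait_for_last_done_token_guarded(fake_tree *tree, const void *token)
{
    assert(tree != NULL && tree->myhtml != NULL);

    /* The guard this report suggests is missing: in single mode
       thread_stream is NULL, so reading thread_stream->timespec would crash. */
    if (tree->myhtml->thread_stream == NULL)
        return 0;

    /* Threaded mode: poll until a worker publishes `token` as done. */
    while (atomic_load(&tree->token_last_done) != token)
        nanosleep(&tree->myhtml->thread_stream->timespec, NULL);
    return 1;
}
```

With `thread_stream == NULL` the function returns 0 immediately, even if the last-done token differs from the one requested; only when a stream exists does it dereference `timespec` and poll.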
When the parser is used in MyHTML_OPTIONS_PARSE_MODE_SINGLE mode, myhtml_init initializes it with a call that requests 0 streams, so myhtml->thread_stream is initialized to NULL. But then, when parsing CDATA (in myhtml_tokenizer_state_markup_declaration_open()), the parser calls myhtml_tree_wait_for_last_done_token(), which unconditionally accesses tree->myhtml->thread_stream->timespec and obviously crashes, since thread_stream is NULL.
Backtrace: