tree-sitter / tree-sitter-html

HTML grammar for Tree-sitter
MIT License
136 stars 72 forks source link

Correctly cast `name_length` #9

Closed philipturnbull closed 5 years ago

philipturnbull commented 5 years ago

buffer[i++] is a signed char so this cast does not correctly handle negative values. If a custom tag has a length greater than 128 then buffer[i++] is negative and so the (uint16_t) cast will cast it to a large unsigned integer. This causes an out-of-bound read when reading the tag name.

We need to cast name_length to a uint8_t first, then widen to a uint16_t.

/cc @anglinb who helped triage this ✨

philipturnbull commented 5 years ago

I wonder if things would be easier if that buffer were declared as ‘unsigned char *’.

I think it would make things easier. I'm not sure how much effort it would take to change the ABI and update all the grammars though

maxbrunsfeld commented 5 years ago

I'm wondering; could we just independently change the declared type in this scanner (and potentially other scanners) without necessarily changing the library itself?

The runtime never dereferences these pointers itself: it just passes a pointer to serialize, memcpys the data into the syntax tree, and then later passes these pointers to deserialize. So I'd think it might be ok for the scanner to just interpret the pointer as an unsigned char * across the board, even if we don't change the core library.

@philipturnbull You probably have a better understanding than I do of the semantics of that sort of thing; is there anything undefined or implementation-defined about declaring a function's parameter as pointing to two differently-signed types in two different translation units?

philipturnbull commented 5 years ago

I think it's ~implementation defined~ undefined behaviour but, as the two types have the same underlying size, it should just work ™️