Closed alexdowad closed 3 months ago
Maybe you could think of another of my C module lpath:
https://github.com/starwing/lpath/blob/cdc573e8bf3fabe67e67a41651de1b2da93f306c/lpath.c#L71
for the vector routines instead of raw malloc/free/realloc?
@starwing, I'm open to using those vector routines, but just one thing...
The code I am currently using in lua-utf8
normally uses a small, on-stack buffer for combining mark codepoints, and only switches to a dynamically allocated buffer if the on-stack one is too small. (The on-stack buffer can hold 8 codepoints, which is more than will realistically be needed for any natural language text. So basically 100% of the time, when lua-utf8
is processing natural language strings, it will never need to switch to a dynamically allocated buffer.)
Can that be done with the lpath
routines?
@alexdowad you mean the buffer in lua-protobuf? https://github.com/starwing/lua-protobuf/blob/3f9ffd485262edbcde618296c07b862ba5a74b38/pb.h#L170
I just think you could make a clear interface for the buffer you use, makes it separate from the normalize algorithm, just like the one in lpath/lua-protobuf.
The changes looks good for me. Maybe I will break the complicate algorithm into same functions to make it more clear when I have time, but It indeed looks good for now.
@alexdowad you mean the buffer in lua-protobuf? https://github.com/starwing/lua-protobuf/blob/3f9ffd485262edbcde618296c07b862ba5a74b38/pb.h#L170
I just think you could make a clear interface for the buffer you use, makes it separate from the normalize algorithm, just like the one in lpath/lua-protobuf.
@starwing, the buffer implementation in lua-protobuf would indeed work for this use case.
Rather than writing out the whole story of how I discovered these bugs, let me point you to the blog post which I wrote about it: https://alexdowad.github.io/dont-fuzz-uninstrumented-binary/
I would love to contribute some of these new test cases to the official test suite from the Unicode Consortium; we'll see if they welcome submissions.
FYA @starwing