Closed alexdowad closed 1 year ago
Added docs for new functions.
Just tried fuzzing is_valid
with Clang+libFuzzer+ASAN+UBSAN for crashes. I only ran the fuzzer for less than a minute, but within that time it tried around 40,000,000 test strings without finding any crashes.
Tried fuzzing clean
for about a minute; ~70,000,000 test inputs with no crashes found.
Thanks for contribution!
maybe we could think twice about the name "is_valid"? the Lua style do not use underscore, maybe "isvalid" better?
or, maybe a "utf8.invalidposition" is better? if the utf8 is valid, the function returns nil, otherwise it returns the position of the first ocurrs of the invalid bytes.
Thanks for contribution!
maybe we could think twice about the name "is_valid"? the Lua style do not use underscore, maybe "isvalid" better?
Thanks for the suggestion. I consulted this page when choosing the name: http://lua-users.org/wiki/LuaStyleGuide
Booleans - It can be helpful to prefix Boolean values or functions used as predicates with
is
, such asis_directory
rather thandirectory
(which might store a directory object itself).
Here is an example from the standard library, where is
is used as a prefix, without an underscore: https://www.lua.org/manual/5.3/manual.html#pdf-coroutine.isyieldable
I agree with removing the underscore. Would that be satisfactory?
or, maybe a "utf8.invalidposition" is better? if the utf8 is valid, the function returns nil, otherwise it returns the position of the first ocurrs of the invalid bytes.
I think that is also a useful API function to provide, but I think it should be separate.
@starwing See the code I have just pushed. I've included docs and unit tests for invalidoffset
, but haven't fuzzed it.
Fuzzing has just uncovered an interesting issue... the output of clean
is not always valid UTF-8... if the replacement
string itself is not valid UTF-8!
I don't know if that's something we should do something about... or if it is just up to the user to make sure they pass a valid replacement string.
I am thinking that clean
should check if the replacement
string is valid UTF-8 and raise an error if not.
The replacement string is likely to be very short, so checking it should have only a tiny performance cost.
I am thinking that
clean
should check if thereplacement
string is valid UTF-8 and raise an error if not.The replacement string is likely to be very short, so checking it should have only a tiny performance cost.
I'm also thinking that check the replacement
string is better.
OK, I've implemented that.
Merged, Thanks!
Thank you as well!
I added more assertions to my fuzzing code and ran the fuzzer a bit more, but didn't find any other problems.
Are you interested in committing the fuzzing code to the main repo, in case it is useful for others?
Oh, and the next question... when are you likely to cut a new release, so I can start using these new functions?
Are you interested in committing the fuzzing code to the main repo, in case it is useful for others?
That's good, you could open another PR for the fuzzing test. Thanks!
I'll make a new release asap.
As discussed in #40.
TODO: Fuzzing. TODO: Amend docs.