Open · zserge opened this issue 9 years ago
I don't think unsigned int is an option, because we need a special value for "undefined", which is now -1.
I wonder if there is any practical platform where this problem matters?
E.g. on most 32-bit systems int is 32 bits, which should work fine for JSON inputs smaller than 2 GB.
On smaller systems like AVR, int is 16 bits, addressing strings of up to 32 KB, which is still a lot for a tiny 8-bit CPU with a few kilobytes of RAM.
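For context, the token structure under discussion looks roughly like this (paraphrased; the exact definition in jsmn.h may differ slightly between versions). The -1 in start/end is the "undefined" marker mentioned above:

```c
/* Rough paraphrase of the token type in jsmn.h, not a verbatim copy. */
typedef struct {
  jsmntype_t type; /* object, array, string or primitive */
  int start;       /* byte offset where the token starts, -1 if undefined */
  int end;         /* byte offset just past the token, -1 if undefined */
  int size;        /* number of child tokens */
} jsmntok_t;
```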
You could make it an ssize_t. It's signed, so -1 still works, but it has a much larger range on many systems.
I think you could also use pointers rather than indexes (using NULL when you would've used -1), but this would break backwards compatibility, so I think ssize_t is the best option.
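A sketch of what that change might look like, assuming the index fields are simply retyped (ssize_t lives in <sys/types.h> and is POSIX, not standard C):

```c
#include <sys/types.h>   /* ssize_t (POSIX, not part of ISO C) */

/* Sketch only: same layout as jsmntok_t, but with wider signed indices
 * so that -1 keeps working as the "undefined" marker. */
typedef struct {
  jsmntype_t type;
  ssize_t start;   /* -1 if undefined */
  ssize_t end;     /* -1 if undefined */
  int size;        /* child count, unchanged */
} jsmntok_t;
```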
First of all, this is an awesome little library, thanks Zserge. I also don't care much for non-ASCII strings, and it is obvious that this library wouldn't help with that, so that's really not a problem. But a signed overflow problem does not seem like an obvious thing when looking at the documentation or the code; I would even say it is misleading. If the code can ever fail, that is a bug, period. According to Murphy's law of software development, if it can ever fail, it will fail. If it ever becomes a security problem, and people see it was known and ignored, those people are going to be angry. Please, at least state it in the documentation.

By the way, I don't see a problem with using size_t instead of int. The special value can be SIZE_MAX instead of -1; note that SIZE_MAX can never be a valid index into any array, since the largest theoretically valid index is SIZE_MAX-1. The current state of things is pretty ugly: a size_t len argument in json_parse, unsigned int pos used for indexing, and signed int used for returning results. This is very inconsistent. Please consider it; it would not take away any of the simplicity.
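A sketch of that size_t variant; JSMN_UNDEFINED is a made-up name used here purely for illustration, not an existing jsmn macro:

```c
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* SIZE_MAX */

#define JSMN_UNDEFINED SIZE_MAX   /* hypothetical sentinel replacing -1 */

typedef struct {
  jsmntype_t type;
  size_t start;   /* JSMN_UNDEFINED until filled in */
  size_t end;     /* JSMN_UNDEFINED until filled in */
  size_t size;
} jsmntok_t;

/* Callers would then test against the sentinel instead of "< 0":
 *   if (tok->end == JSMN_UNDEFINED) { ... token not terminated yet ... }
 */
```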
Also, joeljk13: ssize_t is a type in POSIX C, while size_t and SIZE_MAX are available in freestanding ISO C as well -- thus pretty much everywhere.
I just tried what happens if json_parse is given a too-large string; here is the output from a test program:
token[0] start: 0 end: -2147482627 size: 2
token[1] start: 2 end: 6 size: 1
token[2] start: 9 end: -2147482646 size: 0
token[3] start: -2147482642 end: -2147482637 size: 1
token[4] start: -2147482634 end: -2147482629 size: 0
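Something along these lines reproduces it (a sketch, not the original test program; it assumes a 64-bit host with roughly 2.5 GB of free memory and the jsmn_parse(parser, js, len, tokens, num_tokens) API):

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "jsmn.h"

int main(void) {
    size_t len = (size_t)INT_MAX + 1024;   /* just past the int range */
    char *js = malloc(len);
    if (js == NULL) return 1;

    /* Build a JSON document whose string value crosses the 2 GB mark:
       {"k":"aaaa...aaaa"} */
    memset(js, 'a', len);
    memcpy(js, "{\"k\":\"", 6);
    js[len - 2] = '"';
    js[len - 1] = '}';

    jsmn_parser p;
    jsmntok_t tok[8];
    jsmn_init(&p);
    int r = jsmn_parse(&p, js, len, tok, 8);

    /* With int indices, start/end of the large string token come out
       negative, much like the output above. */
    for (int i = 0; i < r; i++)
        printf("token[%d] start: %d end: %d size: %d\n",
               i, tok[i].start, tok[i].end, tok[i].size);

    free(js);
    return 0;
}
```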
Now, as a simple hypothetical example, imagine software that just reads data from a network socket and feeds it to jsmn's json_parse. An attacker somehow manages to feed it over 2 gigabytes of data, so the attacker can create a token that starts at index -2 billion. When the software tries to access this token, it segfaults. I know, this is not something that is going to happen a lot; it's just the first thing I could come up with in 10 minutes.
Simple solution: Don't allow ridiculously large input?
If you are reading data from a socket and the client is giving you 2+ GB of data... that connection needs to close();
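For example, a hypothetical wrapper on the application side could refuse anything the int-based token indices cannot address (JSMN_ERROR_INVAL is reused here just as a convenient error value):

```c
#include <limits.h>
#include <stddef.h>
#include "jsmn.h"

/* Hypothetical guard: reject input longer than INT_MAX bytes instead of
 * letting the int token indices wrap around. */
static int parse_bounded(jsmn_parser *p, const char *js, size_t len,
                         jsmntok_t *tokens, unsigned int num_tokens) {
  if (len > (size_t)INT_MAX)
    return JSMN_ERROR_INVAL;  /* or simply drop the connection upstream */
  return jsmn_parse(p, js, len, tokens, num_tokens);
}
```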
The maximum size of any input is limited by the maximum index we can hold in token.start/end.
+1 for adding this to the documentation. +1 for using unsigned and saying (unsigned)-1 means undefined (which solves the problem of negative indices).
Having the size configurable is a plus, but even using size_t does not solve the problem; theoretically even size_t will eventually wrap.
On the other hand, since the input buffer is held in memory in its entirety, the system will run out of memory first.
Large strings produce weird results when the range of int is exceeded due to undefined behaviour regarding integer overflow.
Could you either change the type to unsigned int or make the type configurable (a la
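Something like the following would make it configurable at compile time (purely illustrative; JSMN_INT is a hypothetical macro, not something jsmn currently provides):

```c
/* Hypothetical compile-time switch, illustration only. */
#ifndef JSMN_INT
#define JSMN_INT int            /* default keeps the current behaviour */
#endif

typedef struct {
  jsmntype_t type;
  JSMN_INT start;
  JSMN_INT end;
  JSMN_INT size;
} jsmntok_t;

/* e.g. build with: cc -DJSMN_INT=long ... */
```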