nlohmann / json

JSON for Modern C++
https://json.nlohmann.me
MIT License
42.01k stars 6.63k forks source link

const array_t::operator[] results in buffer overflow / segv on nullptr on out of bounds access #3973

Closed andrey-golubev closed 11 months ago

andrey-golubev commented 1 year ago

Description

operator[] called on a constant array_t object results in a heap-buffer-overflow (if non-empty array) or a segv on nullptr (if empty array), as reported by ASan, when accessing an element outside of the bounds.

Reproduction steps

    auto array = nlohmann::json::array();
    array.push_back(42);
    auto value = static_cast<const decltype(array)&>(array)[1];  // heap buffer overflow
    auto array = nlohmann::json::array();
    auto value = static_cast<const decltype(array)&>(array)[0];  // nullptr read

Expected vs. actual results

Expected: the code should likely assert in this situation. For instance, at() behaves neatly and reports an out-of-range exception. This might be too much to ask of operator[], but at least an assert is vital.

Actual: a heap buffer overflow: at best you have an address sanitizer which would immediately catch that, at worst it's a security vulnerability (?) -- and you would see an error later when e.g. attempting to convert a returned value into something (e.g. auto value = empty_const_array[0]; /* throws/asserts: */ value.get<int>();).

Bottomline: if you assert right away, this is a much safer behavior. Given JSON_ASSERT could be overwritten (e.g. to add some extra error information), the overall behavior is also much nicer in this case. And you don't get ASan to terminate your program immediately.

Minimal code example

auto array = nlohmann::json::array();
array.push_back(42);
auto value = static_cast<const decltype(array)&>(array)[1];  // heap buffer overflow

Error messages

==61129==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000340 at pc 0x55cd5083c37f bp 0x7ffc3732ff00 sp 0x7ffc3732fef0
READ of size 1 at 0x602000000340 thread T0
    #0 0x55cd5083c37e in nlohmann::json_abi_v3_11_2::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_2::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> > >::basic_json(nlohmann::json_abi_v3_11_2::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_2::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> > > const&) nlohmann_json/include/nlohmann/json.hpp:1135

> Note: providing a limited stack trace due to privacy concers

Compiler and operating system

Ubuntu 22.04.1 LTS, c++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0 (GCC)

Library version

v3.11.2

Validation

nlohmann commented 1 year ago

From the documentation:

If the element with key idx does not exist, the behavior is undefined.

Just as in the other STL containers, operator[] implements unchecked access. Why do you consider the current behavior as bug?

andrey-golubev commented 1 year ago

It is bug-prone to be honest, not a bug, since UB is not great. (I haven't figured whether there's a "suggestion" ticket so created a "bug report"). My main point is: this could be neatly wrapped in an assertion - which is a de-facto facility to do the "terminate if the contract is violated". In the case of assertion you have an explicit library-side failure instead of the UB.

(OTOH, one could argue that adding an assertion here would make people abuse it to streamline the validation)

The fact that STL doesn't assert is a horrible user experience to be fair (I mean, unlikely there's anyone in this world who would say "hey, it's okay, I want UB instead of an immediate failure"). Microsoft's standard library used to do all sorts of debug-build checks (e.g. sentinels, validating iterators, etc.) exactly to handle unintentional user errors, (it's a guess but) likely operator[] out-of-bounds access in vector/array are also assert-proofed because it's a rather popular user error.

Again, from my point of view, adding an explicit JSON_ASSERT() somewhere inside the operator[] makes the user experience much better. By default, it's an assert() call, so termination (yay) and if the user overwrites the macro, then it's a user error that they have to deal with.

By the way, I think heap-buffer-overflow could be considered a security vulnerability as well (though, it's tricky since you could consider it a user's fault), and being able to immediately abort is a decent middle-ground for all the parties.

andrey-golubev commented 1 year ago

Note that in my mental model:

In this particular case, I propose to enforce that the invariants are valid by adding an assertion.

nlohmann commented 1 year ago

I am hesitating to add these checks as they would slow down debug builds even further.

andrey-golubev commented 1 year ago

Fair enough. If the debug build speed is of concern, then it is somewhat troublesome to add this right away.

Generally speaking, debug build improvements is probably a separate topic.

P.S.: I'd argue whoever couldn't stand the slow down from the asserts, would likely disable them completely though.

nlohmann commented 1 year ago

I don't think the code should be changed. Is there anything that you would like seen added to the documentation?

andrey-golubev commented 1 year ago

The docs were fine as far as I recall, the thing is/was really about an assertion in the operator[] code path to greatly improve the usability.

kbarry-aurora commented 11 months ago

I agree that the semantics are inconsistent, and that eliminating existence/bounds checks can greatly improve readability if the code is already handling exceptions.

  1. If I do object.get<double>() but it's an object, I get an exception.

    const nlohmann::json object = { { "foo", "bar" } };
    object.get<double>();  // exception
  2. If I do object["foo"] but it's a non-object (e.g., array, number), I get an exception.

    const nlohmann::json object = 1.0;
    object["foo"];  // exception
  3. If I do object["foo"] but "foo" isn't a valid field, I get a crash that can't be handled.

    const nlohmann::json object = { { "bar", "baz" } };
    object["foo"];  // abort

This led to a latent bug in a system that we rolled out because we assumed that invalid operations always resulted in exceptions and not UB.

I understand the argument regarding undefined behavior, but using the STL for the underlying representation is an implementation detail that the library users shouldn't need to account for.


In any case, thanks for a great library!

nlohmann commented 11 months ago

If you want exceptions, use at().