nlohmann / json

JSON for Modern C++
https://json.nlohmann.me
MIT License
42.38k stars 6.67k forks source link

In highly nested functions, passing json into a function leads to a segmentation fault/bus error #4186

Closed DUOLabs333 closed 11 months ago

DUOLabs333 commented 11 months ago

Description

Exactly what it says in the title --- giving more stack space does not solve the problem. However, if I use references or pointers, the stack could grow more before crashing (but not by much --- roughly a 100 more iterations).

Reproduction steps

Reproducer:

#include <nlohmann/json.hpp>

using json=nlohmann::json; 

void print_json(json name,int n){
    if (n<=0){
        return;
    }
    printf("%s\n",name.dump(-1).c_str());
    print_json(name,n-1);
}

int main(int argc, char** argv){
    std::string testing = R"({"glossary":{"title":"example glossary","GlossDiv":{"title":"S","GlossList":{"GlossEntry":{"ID":"SGML","SortAs":"SGML","GlossTerm":"Standard Generalized Markup Language","Acronym":"SGML","Abbrev":"ISO 8879:1986","GlossDef":{"para":"A meta-markup language, used to create markup languages such as DocBook.","GlossSeeAlso":["GML","XML"]},"GlossSee":"markup"}}}}})";
    print_json(json::parse(testing),100000);

    return 0;
}

This gives a segmentation fault, but the code I'm using (which is also recursive) gives me a bus error.

Expected vs. actual results

Expected: To be able to run all iterations without crashing.

Minimal code example

No response

Error messages

No response

Compiler and operating system

clang, MacOS 11

Library version

HEAD

Validation

gregmarr commented 11 months ago

Even without using this library, I would not expect to be able to have a function that recurses 100,000 times, unless I have a massively huge stack. If you optimize this, you might get away with it due to tail call optimization converting it from recursive to iterative.

DUOLabs333 commented 11 months ago

If I remove the json argument, then it works as expected --- I can finish the entire execution. In fact, as long as I don't use json in the function, then it works. This means that something is happening when json is being used that causes this (this happens even if I'm just extracting a value from the json, like std::string a=name["glossary"]["title"];.

nlohmann commented 11 months ago

You pass by value - this means you create thousands of copies of the json value. Can you try to pass by reference?

DUOLabs333 commented 11 months ago

I already tried that --- same problem, just happens later.

nlohmann commented 11 months ago

I see the same behavior with

#include <string>

void print_json(std::string name,int n){
    if (n<=0){
        return;
    }
    printf("%s\n",name.c_str());
    print_json(name, n-1);
}

int main(int argc, char** argv){
    std::string testing = R"({"glossary":{"title":"example glossary","GlossDiv":{"title":"S","GlossList":{"GlossEntry":{"ID":"SGML","SortAs":"SGML","GlossTerm":"Standard Generalized Markup Language","Acronym":"SGML","Abbrev":"ISO 8879:1986","GlossDef":{"para":"A meta-markup language, used to create markup languages such as DocBook.","GlossSeeAlso":["GML","XML"]},"GlossSee":"markup"}}}}})";
    print_json(testing,100000);

    return 0;
}

As @gregmarr mentioned, you are testing against the stack limits, not against the library.

DUOLabs333 commented 11 months ago

That's weird --- If I do that (and use references), the program completes just fine, it's just that references with json is not enough to complete the program.

gregmarr commented 11 months ago

By adding a local variable with a destructor, you are defeating the tail call optimization.

void print_json(std::string const &name,int n){
    if (n<=0){
        return;
    }
    std::string copy = name; // If you comment this out, it will succeed
    print_json(name,n-1);
}

int main(int argc, char** argv){
    std::string testing = R"({"glossary":{"title":"example glossary","GlossDiv":{"title":"S","GlossList":{"GlossEntry":{"ID":"SGML","SortAs":"SGML","GlossTerm":"Standard Generalized Markup Language","Acronym":"SGML","Abbrev":"ISO 8879:1986","GlossDef":{"para":"A meta-markup language, used to create markup languages such as DocBook.","GlossSeeAlso":["GML","XML"]},"GlossSee":"markup"}}}}})";
    print_json(testing,100000);

    return 0;
}
DUOLabs333 commented 11 months ago

I understand that, but replacing json reference with a std::string reference causes the program to terminate (in both cases, there are no local variables).

DUOLabs333 commented 11 months ago

However, using -O3 causes this reproducer to succeed. Unfortunately, my actual function is not tail recursive, so the compiler can't optimize for it.

gregmarr commented 11 months ago

The local variable is here, it's a temporary, but it does appear to be affecting the optimization:

printf("%s\n",name.dump(-1).c_str());
              ^^^^^^^^^^^^^
DUOLabs333 commented 11 months ago

Ah, that's why std::string::c_str works --- it's a pointer, which doesn't need to be deallocated.

Since my function is not tail-recursive, it doesn't seem to apply. However, in my function, I'm not dealing with 10000 iterations, but at most 15, so there's something else going on (I also get a bus error). Given that references alleviate some of the problem tells me that something probably went awry. But I don't think this is due to the library at this point, so I'll close this issue.

DUOLabs333 commented 11 months ago

A problem though: when I debug applications, I need to get the json string so I can externally parse it --- however, as you can see, if I'm in heavily nested frame, and I try to dump() a large json object, I get a stack overflow. Is there a way to get the raw char* directly, to avoid making a local variable?

gregmarr commented 11 months ago

There is no string form of a large object until you generate it using dump(), and that form is not stored, only returned to you.

DUOLabs333 commented 11 months ago

When I call test["a"]["b"]["c"] on a json& test, are new objects created/copied for each of the keys? If so, that could explain the overflow.

gregmarr commented 11 months ago

test["a"] returns a reference, which you then access by calling ["b"], which returns a reference, which you then access by calling ["c"], which returns a reference. This reference is either const or non-const depending on the const-ness of test. If test is non-const, then it can create new sub-objects, but that's all heap-based. The references may or may not be stack objects, or may just be stored in registers, depending on the optimization, but if they are stack objects, then they are pointer/reference sized.