leethomason / tinyxml2

TinyXML2 is a simple, small, efficient C++ XML parser that can be easily integrated into other programs.
zlib License

Large XMLDocument: Print( XMLPrinter* ) throws std::bad_alloc, XMLDocument::SaveFile does not. #932

Open RL-S opened 1 year ago

RL-S commented 1 year ago

I have a large XML document (ca. 1.9GB on disk). Using XMLDocument::Print to write this document to a file always causes a std::bad_alloc to be thrown:

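#include <ostream>
#include "tinyxml2.h"

namespace xml = tinyxml2;
void toStream( const xml::XMLDocument* doc, std::ostream& os ){
    xml::XMLPrinter streamer;   // no FILE*: prints into an in-memory buffer
    doc->Print( &streamer );    // the whole document is serialized here
    os << streamer.CStr();
}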

The exception was thrown before XMLDocument::Print returned.

On the other hand, using XMLDocument::SaveFile did not throw:

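#include <filesystem>
#include "tinyxml2.h"

namespace xml = tinyxml2;
namespace fs = std::filesystem;
void toFile( xml::XMLDocument* doc, const fs::path& filePath ){
    // SaveFile takes a narrow const char*; path::string() gives one on every
    // platform (path::c_str() is const wchar_t* on Windows)
    xml::XMLError ret { doc->SaveFile( filePath.string().c_str() ) };
    if ( ret != xml::XML_SUCCESS ){
        // error handling here
    }
}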

Maybe it would make sense to unify these functions.

cugone commented 1 year ago

Out of curiosity, are you compiling for x86 (i.e. 32-bit) targets? It's my understanding that 32-bit processes are only allowed 2 GB of memory. 1.9 GB is really close to that limit, and if you have a lot of heap-allocated memory elsewhere, you can easily go over it, at which point the program can't allocate any more.

Since XMLPrinter (when it is not given a FILE*) works entirely in memory, its buffer alone would need about 1.9 GB, hence the bad_alloc.
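The 32-bit vs. 64-bit question can also be answered from inside the program instead of with the file command. A minimal sketch; pointerBits is a hypothetical helper name:

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Width of a data pointer in bits: 32 for an x86 (32-bit) build,
// 64 for an x86-64 build.
constexpr std::size_t pointerBits(){
    return sizeof( void* ) * CHAR_BIT;
}
```

A build that must hold multi-gigabyte documents in memory could guard itself with static_assert( pointerBits() == 64, "64-bit build required" );.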

RL-S commented 1 year ago

Output of the file command for my tinyxml2.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=..., not stripped

It's 64-bit. The same goes for the whole project, which definitely can use more than 2 GB of memory.
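Since the build is 64-bit, running out of address space is an unlikely explanation. One hypothesis worth checking against the tinyxml2 sources of the version in use (an assumption, not something established in this thread): if the in-memory printer's buffer stores its size in a 32-bit int and doubles the requested capacity whenever it grows, then once the buffer passes 1 GiB the doubled capacity no longer fits in an int. An overflowed, negative size passed to new[] throws std::bad_array_new_length, which derives from std::bad_alloc, and SaveFile never builds such a buffer. The arithmetic of that model, with hypothetical names:

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical model (not taken from tinyxml2): a growable byte buffer that
// keeps its capacity in a 32-bit int and, when it runs out of room, requests
// twice the capacity it currently needs.
// Returns true if holding `bytes` would require a doubled capacity that no
// longer fits in an int, regardless of how much memory the machine has.
constexpr bool doubledCapacityExceedsInt( std::int64_t bytes ){
    return bytes * 2 > static_cast<std::int64_t>( INT_MAX );
}

// Smallest size at which that happens under this model: INT_MAX / 2 + 1.
constexpr std::int64_t kFirstOverflowingSize =
    static_cast<std::int64_t>( INT_MAX ) / 2 + 1;
```

Under this model a 1.9 GB document fails inside Print while SaveFile on the same document succeeds, which is consistent with the report; confirming it would mean reading the buffer-growth code of the installed tinyxml2.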