omnetpp / omnetpp

OMNeT++ Discrete Event Simulator
https://omnetpp.org
598 stars 151 forks source link

memory leak by using libxml on large xml files #121

Closed rhornig closed 5 years ago

rhornig commented 5 years ago

Reported by stoehr on 17 Jan 2010 22:46

Version: 4.0

I use the OmnetPP API (cEnvir::getXMLDocument()) to parse a XML files with libxml library. Each Time i call this function a given file is parsed via "xmlDocPtr doc = xmlCtxtReadFile(ctxt, filename, NULL, options);" in line 162 in saxparser_libxml.cc. Is the size of given xml file more than 1 MB, memory usage of OmnetPP rises up to 300 MB and more. This works so far, but as soon as xmlFreeDoc(doc); in line 195 in the same file is called, the memory should be given back to the system and the problem is it doesn't!

Mantis Bugtracker #131

avarga commented 5 years ago

Comment posted by andras on 22 Jan 2010 12:06

Additional info from submitter: " I made some experiments with the same implementation like in saxparser_libxml.cc and tested 2 cases. Case 1: Parse file and immediatly after that call the "xmlFreeDoc(doc)" function Works great.

Case 2: Parse file, work in any way with the created nodes and then call "xmlFreeDoc(doc)" Allocated memory is not given back to the system. "

Looks like this is a libxml2 issue (?)

BTW, OMNeT++ supports the expat parser as well, although expat has less features (no validation last time I checked). So switching to expat in your project may be an option. It is also a lightweight library (much smaller than xerces).

rhornig commented 5 years ago

Comment posted by baumgart on 22 Apr 2010 16:30

With libexpat memory usage is a lot lower, but still much larger than expected. A quick valgrind massif run showed the following results:

->38.40% (74,339,788B) 0x4460434: std::string::_Rep::_S_create(unsigned int, unsigned int, std::allocator<char> const&) (in /usr/lib/libstdc++.so.6.0.13) | ->20.20% (39,109,580B) 0x446269E: std::string::_M_mutate(unsigned int, unsigned int, unsigned int) (in /usr/lib/libstdc++.so.6.0.13) | | ->20.20% (39,109,580B) 0x4462838: std::string::_M_replace_safe(unsigned int, unsigned int, char const, unsigned int) (in /usr/lib/libstdc++.so.6.0.13) | | | ->20.20% (39,109,580B) 0x44628DC: std::string::assign(char const, unsigned int) (in /usr/lib/libstdc++.so.6.0.13) | | | ->12.16% (23,543,118B) 0x431A99B: cXMLElement::cXMLElement(char const, char const, cXMLElement) (basic_string.h:970) | | | | ->12.16% (23,543,118B) 0x41C08A0: cXMLSAXHandler::startElement(char const, char const*) (cxmldoccache.cc:95) | | | | ->12.16% (23,543,118B) 0x49BE7BD: ??? (in /lib/libexpat.so.1.5.2) | | | | ->12.16% (23,543,118B) 0x49BF92F: ??? (in /lib/libexpat.so.1.5.2) | | | | ->12.16% (23,535,849B) 0x49B86DA: XML_ParseBuffer (in /lib/libexpat.so.1.5.2) | | | | | ->12.16% (23,535,849B) 0x49B9B43: XML_Parse (in /lib/libexpat.so.1.5.2) | | | | | ->12.16% (23,535,849B) 0x496E5C4: SAXParser::parse(char const) (saxparser_expat.cc:125) | | | | | ->12.16% (23,535,849B) 0x41C0708: cXMLDocCache::parseDocument(char const) (cxmldoccache.cc:158) [...] | ->09.41% (18,219,500B) 0x44610A6: std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned int) (in /usr/lib/libstdc++.so.6.0.13) | | ->09.41% (18,219,500B) 0x4461EDB: std::string::reserve(unsigned int) (in /usr/lib/libstdc++.so.6.0.13) | | ->09.38% (18,147,731B) 0x4462149: std::string::append(char const, unsigned int) (in /usr/lib/libstdc++.so.6.0.13) | | | ->05.67% (10,982,808B) 0x49BEC12: ??? (in /lib/libexpat.so.1.5.2) | | | | ->05.67% (10,982,808B) 0x49BF92F: ??? (in /lib/libexpat.so.1.5.2) | | | | ->05.67% (10,979,010B) 0x49B86DA: XML_ParseBuffer (in /lib/libexpat.so.1.5.2) | | | | | ->05.67% (10,979,010B) 0x49B9B43: XML_Parse (in /lib/libexpat.so.1.5.2) | | | | | ->05.67% (10,979,010B) 0x496E5C4: SAXParser::parse(char const) (saxparser_expat.cc:125) | | | | | ->05.67% (10,979,010B) 0x41C0708: cXMLDocCache::parseDocument(char const) (cxmldoccache.cc:158) [...] | ->08.79% (17,010,708B) 0x446133F: ??? (in /usr/lib/libstdc++.so.6.0.13) | | ->08.79% (17,009,539B) 0x4461524: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const, std::allocator<char> const&) (in /usr/lib/libstdc++.so.6.0.13) | | | ->04.42% (8,554,797B) 0x431BB30: cXMLElement::setAttribute(char const, char const) (cxmlelement.cc:99) | | | | ->04.42% (8,554,797B) 0x41C08C1: cXMLSAXHandler::startElement(char const, char const*) (cxmldoccache.cc:100) | | | | ->04.42% (8,554,797B) 0x49BE7BD: ??? (in /lib/libexpat.so.1.5.2) | | | | ->04.42% (8,554,797B) 0x49BF92F: ??? (in /lib/libexpat.so.1.5.2) | | | | ->04.42% (8,552,001B) 0x49B86DA: XML_ParseBuffer (in /lib/libexpat.so.1.5.2) | | | | | ->04.42% (8,552,001B) 0x49B9B43: XML_Parse (in /lib/libexpat.so.1.5.2) | | | | | ->04.42% (8,552,001B) 0x496E5C4: SAXParser::parse(char const) (saxparser_expat.cc:125) | | | | | ->04.42% (8,552,001B) 0x41C0708: cXMLDocCache::parseDocument(char const*) (cxmldoccache.cc:158)

So a lot of std::strings get allocated by libexpat, if you call getXMLDocument(), but are not freed afterwards...

rhornig commented 5 years ago

Comment posted by baumgart on 10 May 2010 15:33

The problem seems to be, that libxml and libexpat allocate and free a large number of small memory chunks. Since each individual chunk is very small, the memory is not returned to the operating system to achieve better performance.

Calling malloc_trim(0) after freeing the xml document tree solves the problem.

BTW, libxml2 seems to need more than twice as much memory for the same XML document than libexpat!

rhornig commented 5 years ago

Comment posted by baumgart on 17 May 2010 14:47

One issue remains: Currently there is no function to remove a xml document from OMNeT++'s internal document cache. You can free the parsed document by deleting the root node, but if you parse the same document again, you'll get a segmentation fault (since OMNeT++ thinks, it's still in the cache).

Solution: Provide a public method to remove a document from the cache.

avarga commented 5 years ago

Comment posted by andras on 22 May 2010 17:30

Call to malloc_trim(0) postponed to a later release (4.2)

For 4.1, added forgetXMLDocument(filename) and flushXMLDocumentCache() to cEnvir (i.e. can be called as ev.flushXMLDocumentCache()). Currently one must be very careful, because all cPars (actually cXMLParImpl) hold cXMLElement pointers obtained from ev.getXMLDocument(), so after a call to ev.flushXMLDocumentCache() all accesses of module XML parameters will crash.

For 4.2, cPar will have to be modified so that it only stores the reference (filename+XPath), and asks ev.getXMLDocument() every time. This way, if the cache has been flushed, the document can be transparently reloaded when needed. (I did not want to actually implement this for 4.1rc1, as it is too late for a change that may potentially crash simulation models.) I added a note to cPar's xmlValue() method that the lifetime of the returned objects may be limited to the current event (or initialize()) -- this gives the freedom to OMNeT++ to flush the document cache whenever it wants to. One potential policy that seems to make sense is to flush the cache once after the initialization phase, and leave further flush() calls to model authors if they want to.

Meanwhile, modules should be updated so that they do NOT hang on to cXMLElement pointers.

rhornig commented 5 years ago

OMNeT++ 6.0 now uses it's own built-in sax parser as long as DTD validation is not needed. It is much more memory efficient than the libXML parser.