sabre-io / xml

sabre/xml is an XML library that you may not hate.
http://sabre.io/xml/
BSD 3-Clause "New" or "Revised" License

PHP warnings when large data #85

Closed cedtaz closed 8 years ago

cedtaz commented 8 years ago

Hello,

I have problems with the XML library when the data in an event is large. Have you ever seen these warnings?

PHP Warning: XMLWriter::text(): Memory allocation failed : buffer error: text too long
PHP Warning: XMLWriter::text(): Memory allocation failed : growing buffer

staabm commented 8 years ago

how big is "large data"? megabytes? gigabytes?

cedtaz commented 8 years ago
var_dump(strlen($value)); //1211794
$writer->text($value);

Sometimes I don't get the errors... perhaps it depends on server memory?

staabm commented 8 years ago

which php version do you use?

cedtaz commented 8 years ago

PHP 5.6

evert commented 8 years ago

This is very interesting. However, I think this bug should be reported at bugs.php.net, as the text function is just a part of XMLWriter.

You can report it here: http://bugs.php.net/

If you'd like, please leave a comment here with the php bug id, so we can keep an eye on it. As a workaround, you might want to try splitting up your $value variable and call text() multiple times.
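The chunking workaround can be sketched with plain XMLWriter (the 1 MiB chunk size and the `data` element name are arbitrary choices for this demo; for multi-byte content you would want to split on character boundaries, e.g. with mb_str_split(), so a chunk never ends mid-character):

```php
<?php
// Feed a large string to text() in fixed-size chunks instead of one
// huge call, so libxml never has to grow its buffer in a single
// allocation.
$writer = new XMLWriter();
$writer->openMemory();
$writer->startElement('data');

$value = str_repeat('x', 1211794); // roughly the size from the report

foreach (str_split($value, 1024 * 1024) as $chunk) {
    $writer->text($chunk);
}

$writer->endElement();
echo strlen($writer->outputMemory()), "\n"; // length of the serialized element
```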

GDmac commented 8 years ago

How are you writing, @cedtaz? Can you include an example PHP snippet?

@evert, looking at the write example:

$service = new Sabre\Xml\Service();
echo $service->write('foo','bar');

Inside Service->write(), you use a hardcoded openMemory() on the XMLWriter. I would suggest adding another method that uses XMLWriter->openURI() to write to a stream.

The same goes for XMLReader: the parse() method forces a string (https://github.com/fruux/sabre-xml/blob/master/lib/Service.php#L109). I added and tested the following method to read from a file or URL. Instead of XMLReader->xml(string), it uses XMLReader->open(uri), which streams and therefore accepts very large files too.

    /**
     * Parses a document from a file or URL instead of a string,
     * so libxml can stream it rather than load it all at once.
     */
    function parseUri($uri, $contextUri = null, &$rootElementName = null, $encoding = null, $options = 0) {
        $r = $this->getReader();
        $r->contextUri = $contextUri;
        // XMLReader::open() streams the document, unlike
        // XMLReader::xml(), which holds the whole string in memory.
        $r->open($uri, $encoding, $options);

        $result = $r->parse();
        $rootElementName = $result['name'];
        return $result['value'];
    }

(hattip for idea: https://coderwall.com/p/tqggjq/parsing-large-xml-files-using-php )

EDIT: While this may allow reading/writing large data sets, memory usage might still be an issue, as the reader keeps reading and builds the whole array structure in memory. An interesting idea might be something like a streamWriter and a streamReader, connected by a service that filters from input to output :-)

evert commented 8 years ago

> Inside Service->write(), you use a hardcoded openMemory() on the XMLWriter. I would suggest adding another method that uses XMLWriter->openURI() to write to a stream.

openURI might help a bit, but it depends on the situation. Writing to php://memory will make little difference; writing to php://temp will spill to disk, which may be good if that's what you're looking for.

But yeah, you are not prevented from doing this. Just call getWriter() on the service and initialize the writer yourself. The write() function is there to address the most common use case.
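A minimal sketch of driving the writer yourself and pointing it at a stream instead of memory. Plain XMLWriter is used here so the snippet is self-contained; sabre/xml's Writer extends XMLWriter, so the same openUri()/flush() calls should apply to what getWriter() returns. The element names and the temp file are made up for the demo:

```php
<?php
// Write straight to a file/stream and flush periodically, so the
// document never has to be held in memory as one big string.
$target = tempnam(sys_get_temp_dir(), 'xml'); // stand-in for your output path

$writer = new XMLWriter();
$writer->openUri($target);
$writer->startDocument('1.0', 'UTF-8');
$writer->startElement('root');

for ($i = 0; $i < 1000; $i++) {
    $writer->writeElement('item', "value $i");
    $writer->flush(); // push buffered output out to the stream
}

$writer->endElement();
$writer->endDocument();
$writer->flush();

clearstatcache();
echo filesize($target), " bytes written\n";
unlink($target);
```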

Same goes a bit for the reader. I'd definitely be interested in a PR to add parseURI and writeURI though!

> EDIT: While this may allow reading/writing large data sets, memory usage might still be an issue, as the reader keeps reading and builds the whole array structure in memory. An interesting idea might be something like a streamWriter and a streamReader, connected by a service that filters from input to output :-)

Also, for these use cases you are definitely not restricted from doing this. If, say, you have a massive Atom feed, you could use the reader manually and only let the XML -> object mapper do its thing for each entry element. Regardless, for reading you need to decide at some point which element you are ultimately 'streaming'; for an Atom feed this could be per-entry.
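The per-entry idea can be sketched with plain XMLReader. The tiny inline feed is made up for the demo; a real feed would be loaded with XMLReader::open() so it streams from disk or the network, and the expanded DOM node could be handed to sabre's element mapper instead of raw DOM calls:

```php
<?php
// Cursor through the feed node by node and only expand one <entry>
// into a DOM tree at a time; the rest of the feed never lives in
// memory at once.
$feed = '<?xml version="1.0"?>'
    . '<feed xmlns="http://www.w3.org/2005/Atom">'
    . '<title>demo</title>'
    . '<entry><title>first</title></entry>'
    . '<entry><title>second</title></entry>'
    . '</feed>';

$reader = new XMLReader();
$reader->XML($feed); // XMLReader::open('big-feed.xml') for a real file

while ($reader->read()) {
    if ($reader->nodeType !== XMLReader::ELEMENT || $reader->localName !== 'entry') {
        continue;
    }
    $entry = $reader->expand(); // DOM for this single entry only
    echo $entry->getElementsByTagName('title')->item(0)->textContent, "\n";
}
```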

But yeah, I'm interested in other suggestions.

evert commented 8 years ago

I'm closing this ticket because it's been a while since the last comment.

If this is indeed still an issue, feel free to comment here so we can continue discussing.