x-stream / xstream

Serialize Java objects to XML and back again.
http://x-stream.github.io
Other
749 stars 227 forks source link

DomReader: slow performance $O(n^2)$ when deserialize on millions of elements to list #342

Closed morris821028 closed 1 year ago

morris821028 commented 1 year ago

Target Object

LinkedList<String> data = new LinkedList<>();
for (int i = 0; i < 1000000; i++)
  data.add("elem");

Source Code

// Element container; // using DocumentBuilder to append children.

new XStream(new DomDriver()).unmarshal(new DomReader(container));

The parser will achieve $O(n^2)$ time when switching context.

https://github.com/x-stream/xstream/blob/master/xstream/src/java/com/thoughtworks/xstream/io/xml/DomReader.java#L135

Because the parser will move down and up, the DomReader.reassignCurrentElement in linear time $O(n)$. For each element, this function will be called in $O(n)$. Finally, complete in $O(n^2)$.

Then, I'm not sure the implementation of Element.item(int). In my understanding, some of implementations used linked list and make it worse in $O(n^2)$ in one DomReader.reassignCurrentElement.

Here is my workaround for your reference.

public class DomReader extends AbstractDocumentReader {
...
        private List<Element> childElements;
        private Map<Object, List<Element>> cache;
...
        @Override
        protected void reassignCurrentElement(Object current) {
            currentElement = (Element) current;
            if (cache == null)
                cache = new HashMap<>();
            childElements = cache.get(current);
            if (childElements != null)
                return;

            childElements = new ArrayList<>();
            Node node = currentElement.getFirstChild();
            while (node != null) {
                if (node instanceof Element) {
                    childElements.add((Element) node);
                }
                node = node.getNextSibling();
            }
            cache.put(current, childElements);
        }

        @Override
        public void moveUp() {
            cache.remove(getCurrent());
            super.moveUp();
        }
...
joehni commented 1 year ago

Actually the code was written 19 years ago (predates my time with XStream), but I confess the current implementation is unfortunate for elements with a lot of children.

kdebski85 commented 1 year ago

This can be considered as a security issue. An attacker can send XML with many elements to perform DOS attack on a server.

joehni commented 1 year ago

This can be considered as a security issue. An attacker can send XML with many elements to perform DOS attack on a server.

No, but you're free to prove it. DOM is a memory-based model. You'll never get an XML structure loaded with so many elements to slow down this code significantly enough for a DOS attack. You'll run into an OOME first.