wyona / yanel

http://www.yanel.org
Apache License 2.0

Implement Templates for doing XSLT transformations #67

Open: michaelwechner opened this issue 10 years ago

michaelwechner commented 10 years ago

The class

src/impl/java/org/wyona/yanel/impl/resources/BasicXMLResource.java

is currently re-parsing the XSLTs on every request. To improve performance, it would probably make sense to use javax.xml.transform.Templates.

See for example

http://www.javaworld.com/article/2073394/java-xml/transparently-cache-xsl-transformations-with-jaxp.html
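A minimal, self-contained sketch of the Templates idea, using standard JAXP only (this is not Yanel code): the stylesheet is compiled once into a Templates object, and each request creates its own lightweight Transformer from it. The inline stylesheet and the class name TemplatesDemo are purely illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TemplatesDemo {
    // Illustrative stylesheet, inlined so the example needs no files
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>Hello <xsl:value-of select='/name'/></xsl:template>"
      + "</xsl:stylesheet>";

    // Parsed and compiled exactly once; a Templates object may be shared between threads
    static final Templates TEMPLATES;
    static {
        try {
            TEMPLATES = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(XSLT)));
        } catch (TransformerException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Per request: a fresh Transformer (not thread-safe) created cheaply from the shared Templates
    static String transform(String xml) throws TransformerException {
        Transformer t = TEMPLATES.newTransformer();
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform("<name>Yanel</name>"));
        System.out.println(transform("<name>World</name>"));
    }
}
```

The point of the split is that JAXP allows a Templates instance to be used concurrently by several threads, whereas a Transformer must not be, so one Transformer is created per transformation while the expensive parsing happens only once.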

baszero commented 10 years ago

Thanks for creating this issue. The discussed approach with the template cache sounds interesting and would make perfect sense.

Should this approach find its way into Yanel, it would make sense to also implement a control flag that each realm can override, e.g. xsltMode = [default, templates], so that with "default" the algorithm is used as today and with "templates" the new algorithm is used (the flag would be read when the realm starts up).
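The proposed flag could be modelled as a small enum. This is a hypothetical sketch: the names XsltMode and fromConfig(), the case-insensitive parsing, and the fallback to DEFAULT for missing or unknown values are assumptions, not part of Yanel, and how the value is read from the realm configuration at startup is left open.

```java
import java.util.Locale;

/** Hypothetical realm-level switch for the proposed xsltMode = [default, templates] flag. */
public enum XsltMode {
    DEFAULT,   // re-parse the XSLT on every request, as today
    TEMPLATES; // compile once into javax.xml.transform.Templates and reuse

    /** Parses the configured value; a missing or unknown value falls back to DEFAULT. */
    public static XsltMode fromConfig(String value) {
        if (value == null) {
            return DEFAULT;
        }
        try {
            return valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return DEFAULT;
        }
    }
}
```

Falling back to DEFAULT keeps a realm with a typo in its configuration on the current, known-good behaviour instead of failing at startup.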

baszero commented 10 years ago

This is a short update on how I introduced XSL Template objects (and caching) into my realm. Performance has been improved dramatically in our case, up to 10 times faster than the current Yanel implementation.

BasicXMLResource.getTransformedInputStream()

I replaced this line
xsltHandlers[i] = tf.newTransformerHandler(source);

with
xsltHandlers[i] = getTransformerHandler(source, tf);

In the same class I added this protected method:

protected TransformerHandler getTransformerHandler(Source source, SAXTransformerFactory tf) throws TransformerConfigurationException {
    return tf.newTransformerHandler(source);
}

So the change above does not change any behaviour at all, except that it is now possible to override the getTransformerHandler() method.
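To illustrate how such an overridable hook fits into a chain of XSLT handlers, here is a minimal standalone sketch. It assumes, without reference to Yanel's actual source, that getTransformedInputStream() chains its xsltHandlers array so that the output of one stylesheet feeds the next via SAXResult; the class HandlerChainDemo and its transform() method are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class HandlerChainDemo {
    // The overridable hook: a subclass may build the handler from cached Templates instead
    protected TransformerHandler getTransformerHandler(Source source, SAXTransformerFactory tf)
            throws TransformerConfigurationException {
        return tf.newTransformerHandler(source);
    }

    /** Runs xml through the given stylesheets in order and returns the serialized result. */
    public String transform(String xml, String... stylesheets) throws Exception {
        if (stylesheets.length == 0) {
            throw new IllegalArgumentException("at least one stylesheet is required");
        }
        SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler[] xsltHandlers = new TransformerHandler[stylesheets.length];
        for (int i = 0; i < stylesheets.length; i++) {
            xsltHandlers[i] = getTransformerHandler(new StreamSource(new StringReader(stylesheets[i])), tf);
        }
        // Chain the handlers: the SAX output of handler i is the input of handler i + 1
        for (int i = 0; i < xsltHandlers.length - 1; i++) {
            xsltHandlers[i].setResult(new SAXResult(xsltHandlers[i + 1]));
        }
        StringWriter out = new StringWriter();
        xsltHandlers[xsltHandlers.length - 1].setResult(new StreamResult(out));

        // Parse the input document and feed its SAX events into the first handler
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setContentHandler(xsltHandlers[0]);
        reader.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }
}
```

Because every handler is obtained through getTransformerHandler(), a subclass can switch the whole pipeline to cached Templates without touching the chaining logic.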

A new class extending BasicXMLResource with caching capabilities

As a next step, I created a new class that extends BasicXMLResource and overrides the method above. The class does nothing else besides overriding this BasicXMLResource method:

@Override
protected TransformerHandler getTransformerHandler(Source source, SAXTransformerFactory tf) throws TransformerConfigurationException {
    // Caching enabled? retrieveFlagFromWhereYouWant() is a placeholder for your own lookup (resource level or global)
    boolean useCaching = false;
    try {
        useCaching = retrieveFlagFromWhereYouWant();
    } catch (Exception e) {
        log.error(e, e);
    }

    String sourceId = source.getSystemId();
    if (!useCaching || sourceId == null) {
        // The normal way; also used when the source has no system ID that could serve as a cache key
        return tf.newTransformerHandler(source);
    }

    // Use the cached templates, compiling and caching them on first use
    Templates template = XslTemplatesCache.get(sourceId);
    if (template == null) {
        template = tf.newTemplates(source);
        XslTemplatesCache.put(sourceId, template);
    }
    return tf.newTransformerHandler(template);
}

As you can see, if caching is enabled (you have to implement the flag lookup yourself, either at resource level or globally), it uses the cached Templates.

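The Templates cache class

And here is the class implementing the cache. Please note that it uses the java.util.concurrent.locks package, in particular the ReentrantReadWriteLock. This lock guarantees the following:

  • Multiple readers can read from the cache simultaneously if there is no writer thread.
  • A writer can only write to the cache once all reading threads have got their value from the cache.

The usage of ReentrantReadWriteLock is crucial for a high-performing cache like this!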
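The read/write lock behaviour this cache relies on (readers may overlap, a writer is excluded while any read is in progress) can be checked with a small sketch. RwLockDemo and probe() are hypothetical names; the sketch only uses the standard ReentrantReadWriteLock API, with the main thread holding the read lock while a second thread probes it with tryLock().

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RwLockDemo {
    /** Returns {secondReaderGotLock, writerGotLock} while the calling thread holds a read lock. */
    public static boolean[] probe() throws InterruptedException {
        ReentrantReadWriteLock rwl = new ReentrantReadWriteLock(true); // fair, as in the cache
        boolean[] result = new boolean[2];
        rwl.readLock().lock(); // the calling thread acts as the first reader
        try {
            Thread other = new Thread(() -> {
                // A second reader gets in although the first reader still holds the lock
                boolean read = rwl.readLock().tryLock();
                if (read) {
                    rwl.readLock().unlock();
                }
                result[0] = read;
                // A writer is refused while another thread holds the read lock
                boolean write = rwl.writeLock().tryLock();
                if (write) {
                    rwl.writeLock().unlock();
                }
                result[1] = write;
            });
            other.start();
            other.join(); // join() also makes the writes to result visible here
        } finally {
            rwl.readLock().unlock();
        }
        return result;
    }
}
```

Note that tryLock() deliberately ignores the fairness setting; a blocking lock() call on the write lock would instead wait until the read lock is released.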

package com.zwischengas.jaxp;

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

import javax.xml.transform.Templates;

import org.apache.log4j.Logger;

/**
 * This cache uses the ReentrantReadWriteLock from the java.util.concurrent.locks package instead of synchronized code sections.
 * Reason to use ReentrantReadWriteLock: we expect many concurrent reader threads and only very few writer threads,
 * so we want to allow multiple reader threads in parallel in order to improve performance.
 *
 * @author baszero
 */
public class XslTemplatesCache {
    protected static Logger log = Logger.getLogger(XslTemplatesCache.class);

    // Only ever accessed while holding r or w, so the field can simply be final (volatile is not needed)
    private static final Map<String, Templates> templatesCache = new HashMap<String, Templates>();
    private static final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock(true); // true = fair policy: threads acquire the lock roughly in arrival order
    private static final Lock r = rwl.readLock();
    private static final Lock w = rwl.writeLock();

    public static Templates get(String key) {
        r.lock(); // only waits if another thread is writing to the cache
        try {
            return templatesCache.get(key);
        } finally {
            r.unlock();
        }
    }

    public static Templates put(String key, Templates value) {
        w.lock(); // waits until all threads currently holding the read lock have released it
        try {
            return templatesCache.put(key, value);
        } finally {
            w.unlock();
        }
    }

    public static void clear() {
        w.lock();
        try {
            templatesCache.clear();
        } finally {
            w.unlock();
        }
    }

    /**
     * @return number of entries
     */
    public static int size() {
        r.lock();
        try {
            return templatesCache.size();
        } finally {
            r.unlock();
        }
    }

    /**
     * @return a snapshot copy of the keys; returning the live keySet() view would let
     *         callers iterate the HashMap outside the lock
     */
    public static Set<String> getKeys() {
        r.lock();
        try {
            return new HashSet<String>(templatesCache.keySet());
        } finally {
            r.unlock();
        }
    }

}
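As an alternative to the explicit read/write lock, the same cache can be sketched with java.util.concurrent.ConcurrentHashMap. This is a swapped-in technique, not what the comment above implements, and ConcurrentTemplatesCache and getOrCompile() are hypothetical names. computeIfAbsent() also closes the small window in getTransformerHandler() where two threads can both miss the cache and compile the same stylesheet twice (harmless, but wasted work).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;

public final class ConcurrentTemplatesCache {
    private static final ConcurrentMap<String, Templates> CACHE = new ConcurrentHashMap<>();

    private ConcurrentTemplatesCache() {
    }

    /** Returns cached Templates for the source's system ID, compiling them on first use. */
    public static Templates getOrCompile(Source source, TransformerFactory tf)
            throws TransformerConfigurationException {
        String key = source.getSystemId();
        if (key == null) {
            // No stable cache key: compile without caching
            return tf.newTemplates(source);
        }
        try {
            // The mapping function runs at most once per absent key; on failure nothing is stored
            return CACHE.computeIfAbsent(key, k -> {
                try {
                    return tf.newTemplates(source);
                } catch (TransformerConfigurationException e) {
                    throw new IllegalStateException(e); // lambdas cannot throw checked exceptions
                }
            });
        } catch (IllegalStateException e) {
            if (e.getCause() instanceof TransformerConfigurationException) {
                throw (TransformerConfigurationException) e.getCause();
            }
            throw e;
        }
    }

    public static void clear() {
        CACHE.clear();
    }

    public static int size() {
        return CACHE.size();
    }
}
```

One trade-off: the ConcurrentHashMap documentation notes that other updates to the map may block while a computeIfAbsent() computation is in progress, so compiling very large stylesheets inside it may stall other writers; the lock-based version above does its compilation outside any lock.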

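Final remarks

If you use the cached templates as described above, you will get the following behaviour:

  • Today you can quickly modify an XSL file and instantly see the changes on the page.
  • With caching enabled, this live editing no longer works. However, I implemented an admin section in my application where I can clear all caches. This way I can still make hot changes to an XSL and make them live immediately, but only upon explicit request.
  • Performance improves in any case. If you use MANY includes, it helps a lot. In my case, the XSL rendering part got 10 times faster.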
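With cached Templates, edits to an XSL file stay invisible until the cache is cleared. This can be reproduced in a few lines. In this single-threaded sketch, plain maps stand in for the XSL files on disk and for the templates cache; StaleCacheDemo, files, cache, render() and textXsl() are hypothetical names, and clearing the map plays the role of an admin "clear caches" action.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StaleCacheDemo {
    final Map<String, String> files = new HashMap<>();     // stand-in for the XSL files
    final Map<String, Templates> cache = new HashMap<>();  // stand-in for the templates cache (not thread-safe)
    final TransformerFactory tf = TransformerFactory.newInstance();

    String render(String xslName, String xml) throws TransformerException {
        Templates t = cache.get(xslName);
        if (t == null) { // compile only on a cache miss
            t = tf.newTemplates(new StreamSource(new StringReader(files.get(xslName))));
            cache.put(xslName, t);
        }
        StringWriter out = new StringWriter();
        t.newTransformer().transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    static String textXsl(String text) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
             + "<xsl:output method='text'/>"
             + "<xsl:template match='/'>" + text + "</xsl:template></xsl:stylesheet>";
    }

    public static void main(String[] args) throws TransformerException {
        StaleCacheDemo d = new StaleCacheDemo();
        d.files.put("page.xsl", textXsl("v1"));
        System.out.println(d.render("page.xsl", "<x/>")); // v1
        d.files.put("page.xsl", textXsl("v2"));          // "edit" the stylesheet
        System.out.println(d.render("page.xsl", "<x/>")); // still v1: the compiled Templates are cached
        d.cache.clear();                                  // explicit cache clear
        System.out.println(d.render("page.xsl", "<x/>")); // v2
    }
}
```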

michaelwechner commented 10 years ago

Dear Balz

Great :-), thanks very much. Will try to integrate it shortly.

All the best

Michael

(Reply by email to baszero's comment of 10.04.14 15:34: https://github.com/wyona/yanel/issues/67#issuecomment-40081762)

baszero commented 8 years ago

I see that this has still not found its way into Yanel... this is really a pity! This template cache gives a Yanel app a big performance boost, especially if you have very large XSL files.

I will send a pull request, hoping that it will find its way into Yanel.

For the moment I will just overwrite the BasicXMLResource.java in my realm.

baszero commented 8 years ago

Please close this issue and just process this pull request: https://github.com/wyona/yanel/pull/77