muttley73 / jlibs

Automatically exported from code.google.com/p/jlibs

XmlCrawler does not support xml catalogs #32

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi guys, XmlCrawler is pretty cool. I noticed, however, that it does not allow 
imports and includes to be resolved through a resource or entity catalog. 
Introducing such support takes a bit of effort, but since you are using SAX and 
Xerces it is not too painful.

Two changes are required. One is to the rules, which need to extract not only 
the "location" but also the "target namespace" attribute so that a resource can 
be resolved in a catalog.
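
To illustrate the rule change described above, here is a minimal sketch of extracting both values with plain SAX. It is not the attached patch and does not use the jlibs rule classes; `SchemaReferenceScanner` and `scan` are hypothetical names. It assumes XSD input, where `xs:import` carries the target namespace in its `namespace` attribute and `xs:include` carries only `schemaLocation`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of jlibs: collects the {namespace, location}
// pair of every xs:import and xs:include found in a schema document.
class SchemaReferenceScanner extends DefaultHandler {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    // Each entry is {namespace, schemaLocation}; either value may be null.
    final List<String[]> references = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!XSD_NS.equals(uri))
            return;
        if ("import".equals(localName)) {
            // xs:import names the target namespace of the imported schema.
            references.add(new String[]{
                atts.getValue("", "namespace"),
                atts.getValue("", "schemaLocation")
            });
        } else if ("include".equals(localName)) {
            // xs:include has no namespace attribute, only a location.
            references.add(new String[]{ null, atts.getValue("", "schemaLocation") });
        }
    }

    // Parses the given schema text and returns the references in document order.
    static List<String[]> scan(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SchemaReferenceScanner scanner = new SchemaReferenceScanner();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), scanner);
            return scanner.references;
        } catch (Exception e) {
            throw new IllegalStateException("could not scan schema", e);
        }
    }
}
```

With both values in hand, a catalog lookup can be keyed on the namespace when the location alone does not match an entry.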

The patch is still alpha and needs a bit more work to support catalogs 
properly across all types, but I tested a few larger XSD libraries that are 
strung together with catalogs and it all seems to work well.

I could not find a structured test in your XML project, so I attached a zip 
with a very simple test case that uses a catalog resolver.

Please let me know if you need further info or would like to discuss the patch.

Original issue reported on code.google.com by niels...@gmail.com on 21 Apr 2013 at 2:59


GoogleCodeExporter commented 9 years ago
Resolver support added.
You can now plug in an XML catalog as follows:

{{{
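// jlibs imports; package names are assumed, adjust them to your jlibs version
import jlibs.core.net.URLUtil;
import jlibs.xml.sax.crawl.XMLCrawler;
import org.apache.xml.resolver.Catalog;
import org.apache.xml.resolver.CatalogManager;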

class CrawlerCatalogResolver implements XMLCrawler.Resolver{
    public CrawlerCatalogResolver(){
        CatalogManager.getStaticManager().setCatalogFiles("/Users/santhosh/Downloads/XmlCrawler-catalog-test/crawl-files/testLocalImportCatalog-catalog.xml");
    }

    @Override
    public String resolve(String namespace, String base, String location){
        Catalog catalog = CatalogManager.getStaticManager().getCatalog();

        // absolute URI of the referenced document: location resolved against base
        String uri = URLUtil.toURI(base).resolve(location).toString();

        try{
            String result = catalog.resolveURI(uri);
            if(result!=null)
                return result;
        }catch(Exception ignore){
            // URI lookup failed; fall through to the public-id lookup below
        }

        try{
            // second attempt: use the namespace as the public id, the URI as the system id
            String result = catalog.resolvePublic(namespace, uri);
            if(result!=null)
                return result;
        }catch(Exception e){
            e.printStackTrace();
        }

        return null;
    }
}
}}}

Then set this resolver on the XMLCrawler as follows:
{{{
        XMLCrawler xmlCrawler = new XMLCrawler();
        xmlCrawler.setResolver(new CrawlerCatalogResolver());
}}}
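
The URI that `resolve` looks up in the catalog is the import location resolved against the base document. The following minimal sketch shows that step using only `java.net.URI`; it assumes `URLUtil.toURI(base)` yields an equivalent `java.net.URI`, and `LocationResolution` and `absolutize` are hypothetical names used for illustration.

```java
import java.net.URI;

// Hypothetical helper, not part of jlibs.
class LocationResolution {
    // Resolves an import/include location against the URI of the document
    // that contains it. An absolute location is returned unchanged.
    static String absolutize(String base, String location) {
        return URI.create(base).resolve(location).toString();
    }
}
```

For example, `absolutize("file:/schemas/main.xsd", "types/common.xsd")` returns `file:/schemas/types/common.xsd`, which is the string a `uri` entry in the catalog would have to match.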

Original comment by santhosh.tekuri@gmail.com on 3 May 2013 at 8:05

GoogleCodeExporter commented 9 years ago
The fix was committed in r1764.

Original comment by santhosh.tekuri@gmail.com on 3 May 2013 at 8:20