ncbi / DtdAnalyzer


Add feature to allow you to exclude a list of elements #7

Closed Klortho closed 11 years ago

Klortho commented 12 years ago

It would be nice to be able to exclude elements from the output, either by specifying namespace prefixes or by a fixed list. For example, you should be able to exclude all the "mml:" elements.

Klortho commented 12 years ago

After talking with Audrey for a bit, we decided we'll try to implement this with an XSLT param, named perhaps "exclude-elems", that takes a regular expression as its value. Any element whose name matches will be excluded.

So, for example, to exclude all mml: elements except mml:math (which is how the official JATS documentation renders them), you would use something like "mml:(?!math)".
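
To illustrate the matching behaviour intended here, this is a minimal sketch in Python rather than the project's XSLT. It assumes the pattern is applied as an unanchored search against each element name; the `keep` helper and the sample element names are hypothetical.

```python
import re

# Negative lookahead: matches "mml:" only when it is NOT followed by "math".
EXCLUDE = re.compile(r"mml:(?!math)")


def keep(names, pattern=EXCLUDE):
    """Return the names that do not match the exclusion pattern."""
    return [n for n in names if not pattern.search(n)]


# Hypothetical element names, for illustration only.
names = ["article", "mml:math", "mml:mi", "mml:mrow", "p"]
kept = keep(names)
print(kept)
```

Python's `re` module supports lookahead assertions, which is why the single pattern is enough in this sketch.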

Since that is such a common use case, we will also add a dedicated command-line parameter to the shell script that invokes this XSLT, which will pass that particular regular expression into the XSLT. Maybe something like "--collapse-mml".

Klortho commented 11 years ago

A problem we encountered is that XSLT 2 doesn't support lookahead assertions like "mml:(?!math)". So we implemented two new XSLT params instead, $exclude-elems and $exclude-except, each of which takes a regular expression.

An element is included unless its name matches $exclude-elems and does not also match $exclude-except. For example, to exclude all the mml: elements except mml:math, use:

saxon9 -s:JATS-archivearticle1.daz.xml -xsl:../xslt/dtddocumentor.xsl \
    'exclude-elems=^mml:' 'exclude-except=^mml:math'
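
The two-parameter rule can be sketched in Python as follows. This only illustrates the logic described in this comment and is not the project's XSLT; the function name `is_excluded` and the sample element names are hypothetical, and it assumes both patterns are applied as unanchored searches against the element name.

```python
import re


def is_excluded(name, exclude_elems, exclude_except=None):
    """Excluded if name matches exclude_elems and does not match exclude_except."""
    if not exclude_elems or not re.search(exclude_elems, name):
        return False  # not a candidate for exclusion at all
    if exclude_except and re.search(exclude_except, name):
        return False  # rescued by the exception pattern
    return True


# Hypothetical element names, for illustration only.
names = ["article", "mml:math", "mml:mi", "mml:mrow", "p"]
kept = [n for n in names if not is_excluded(n, r"^mml:", r"^mml:math")]
print(kept)
```

Note that `^mml:math` as written also spares any name that merely begins with `mml:math`; anchoring it as `^mml:math$` would restrict the exception to that one element.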