ncbi / DtdAnalyzer

Generate <modules> section #14

Closed by Klortho 11 years ago

Klortho commented 11 years ago

In the mockup of the output documentation that's in the JATSCon paper, I included a "Modules" section, where users can get a list of all of the modules (".dtd" and ".ent" files, usually).

Then, in the mockup of the dtdanalyzer output that I did recently, I included this as a <modules> element, with <module> children. The structured annotations marked with "!module" should be captured and put in here. Audrey's documentor stylesheet already grabs these and puts them into the HTML.

I've been working today on #3, putting the structured annotations back in, and have got all of them done except this <modules> section. Currently, there is no Java collection to represent this.

What I'm proposing is to add a Modules.java collection that ties into the DTDEventHandler's setDocumentLocator method. This is a SAX event handler that gets called whenever anything happens in the parsing, and sets the current pubid, sysid, and line number. I'd have it invoke a putModule(locator) method that looks at those ids and, if they have never been seen before, saves a new Module object into the collection.
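A minimal sketch of what such a collection could look like, not the actual implementation. `DtdModule` is a hypothetical stand-in for the proposed Module object (renamed here only to avoid confusion with `java.lang.Module` on newer JDKs), and keying on the system id alone is an assumption; the proposal only says the method "looks at those ids".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.xml.sax.Locator;

/** Hypothetical stand-in for the proposed Module object. */
final class DtdModule {
    final String publicId;   // may be null
    final String systemId;

    DtdModule(String publicId, String systemId) {
        this.publicId = publicId;
        this.systemId = systemId;
    }
}

/** Sketch of the proposed collection; modules are kept in first-seen order. */
final class Modules {
    // Assumption: the system id alone identifies a module.
    private final Map<String, DtdModule> bySystemId = new LinkedHashMap<>();

    /** Records the module the locator currently points at, if it has not been seen before. */
    DtdModule putModule(Locator locator) {
        String systemId = locator.getSystemId();
        if (systemId == null) {
            return null;  // nothing to key on
        }
        DtdModule existing = bySystemId.get(systemId);
        if (existing != null) {
            return existing;
        }
        DtdModule module = new DtdModule(locator.getPublicId(), systemId);
        bySystemId.put(systemId, module);
        return module;
    }

    /** Returns a copy of the modules seen so far, in first-seen order. */
    List<DtdModule> asList() {
        return new ArrayList<>(bySystemId.values());
    }
}
```

Calling `putModule` repeatedly with a locator that reports the same system id leaves the collection unchanged, so it should be safe to call from any parsing callback.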

Each module should have a human-readable name. Neither the public id nor the system id is good for this. For example, I'd want "file:///home/maloneyc/git/NCBITools/DtdAnalyzer/test/split-example/split-example.dtd" to have the name "split-example.dtd".

So what I'd propose is that either the name can be included in the structured annotation, like this:

<!--~~ !module split-example.dtd

or that I compute pathnames relative to the main DTD module, and use those. For example, if the main DTD module has the system ID above, then another module (hypothetical example) "file:///home/maloneyc/git/NCBITools/DtdAnalyzer/test/split-example/entities/banana.ent" would get the name "entities/banana.ent". I think the URI.relativize method will work for this (see this SO post).
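The relativizing step could be sketched roughly as follows. The class and method names (`ModuleNames`, `relativeName`) are made up for illustration; only `java.net.URI` and its `resolve` and `relativize` methods are real API.

```java
import java.net.URI;

/** Hypothetical helper: derives a module name from its system id. */
final class ModuleNames {
    /**
     * Returns the module's system id relative to the directory that
     * contains the main DTD module.
     */
    static String relativeName(String mainSystemId, String moduleSystemId) {
        // Resolving "." against the main DTD's URI yields its parent directory.
        URI baseDir = URI.create(mainSystemId).resolve(".");
        URI module = URI.create(moduleSystemId);
        // relativize() strips the base path when it is a prefix of the module's path.
        return baseDir.relativize(module).toString();
    }

    public static void main(String[] args) {
        String dir = "file:///home/maloneyc/git/NCBITools/DtdAnalyzer/test/split-example/";
        System.out.println(relativeName(dir + "split-example.dtd", dir + "split-example.dtd"));
        System.out.println(relativeName(dir + "split-example.dtd", dir + "entities/banana.ent"));
    }
}
```

One caveat: `URI.relativize` never produces "../" segments. If a module lives outside the main DTD's directory, it returns the module's URI unchanged, so such a module would keep its full system id as its name.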

Klortho commented 11 years ago

I did this as described above, except there's no way to specify a module's name inside a structured annotation. The name always comes from the relativized system identifier.

Klortho commented 11 years ago

So Audrey brought up the point that modules are just external parameter entities that have system ids, so I think we'll change this implementation a bit.
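As a rough illustration of that idea (not the revised implementation, which isn't described here): SAX reports external entity declarations through `DeclHandler.externalEntityDecl`, and parameter entity names arrive with a leading "%". The `ParameterEntityCollector` class below is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.xml.sax.ext.DefaultHandler2;

/** Hypothetical handler: collects external parameter entities as candidate modules. */
final class ParameterEntityCollector extends DefaultHandler2 {
    // Entity name (without the leading '%') mapped to its system id, in declaration order.
    private final Map<String, String> systemIdsByName = new LinkedHashMap<>();

    @Override
    public void externalEntityDecl(String name, String publicId, String systemId) {
        // SAX prefixes parameter entity names with '%'; general entities are skipped.
        if (name.startsWith("%")) {
            systemIdsByName.putIfAbsent(name.substring(1), systemId);
        }
    }

    Map<String, String> modules() {
        return systemIdsByName;
    }
}
```

A handler like this would be registered on the XMLReader through the "http://xml.org/sax/properties/declaration-handler" property.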

Klortho commented 11 years ago

See the new mockup for the output: https://github.com/NCBITools/DtdAnalyzer/blob/annotations/test/split-example/split-mockup.daz.xml