Klortho closed this 12 years ago.
See contextmodel/src/gov/ncbi/pmc/dtdanalyzer/ElementModelManager.java, lines 355ff, for how structured comments used to be handled, and .../dtdanalyzer/DTDEventHandler.java, line 286 (the comment() method), for where the new code should go.
Use test/split-example.dtd for testing.
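The hook mentioned above is a SAX comment() callback. Here is a minimal sketch of how structured comments might be picked out there. It assumes that comments arrive through org.xml.sax.ext.LexicalHandler (here via DefaultHandler2) and that structured comments are the ones beginning with "~~", as in the example module comment. StructuredCommentCollector is a hypothetical name, not a class in the codebase.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ext.DefaultHandler2;

/**
 * Hypothetical sketch: a SAX handler that keeps only "structured" comments,
 * i.e. those whose text starts with the "~~" marker (an assumption based on
 * the example comment "<!--~~ split-example.dtd").
 */
class StructuredCommentCollector extends DefaultHandler2 {
    static final String MARKER = "~~";

    final List<String> structuredComments = new ArrayList<>();

    // LexicalHandler callback: once this object is registered as the parser's
    // lexical handler, it receives comments, including those in the DTD.
    @Override
    public void comment(char[] ch, int start, int length) {
        String text = new String(ch, start, length);
        if (text.startsWith(MARKER)) {
            // Store the body after the marker; ordinary comments are ignored.
            structuredComments.add(text.substring(MARKER.length()));
        }
    }
}
```

To receive DTD comments, a handler like this would be registered on an XMLReader with setProperty("http://xml.org/sax/properties/lexical-handler", handler).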
I got most of the way done with this. The output of the split-example looks very close to the split-mockup.daz.xml that I added last week. Here are some loose ends, as well as a few changes I would like to make to the output format:
[This one is covered by #16] Instead of using "!dtd" and "!module" in the module comments, let the user put anything at all that doesn't match <elem>, @attr, %pent;, or &gent;. Then, they could write the comment like this:
<!--~~ split-example.dtd
....
The text "split-example.dtd" will be ignored by the comment parser, but it is more human-friendly than "!dtd".
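One way to express "anything that doesn't match <elem>, @attr, %pent;, or &gent; is ignored" is to test each word on the comment's first line against a pattern. This is only a sketch. The regular expression simplifies XML name characters to letters, digits, '_', '.', ':' and '-', and CommentTargetParser and targets() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: picks declaration targets out of a comment's first line. */
class CommentTargetParser {
    // <elem>, @attr, %pent;, &gent;  (name characters simplified; an attribute
    // name may not end in '.', so trailing punctuation is not swallowed)
    static final Pattern TARGET = Pattern.compile(
        "<[\\w.:-]+>|@[\\w.:-]*[\\w:-]|%[\\w.:-]+;|&[\\w.:-]+;");

    /** Returns the words that name a target; every other word is ignored. */
    static List<String> targets(String firstLine) {
        List<String> found = new ArrayList<>();
        for (String word : firstLine.trim().split("\\s+")) {
            if (TARGET.matcher(word).matches()) {
                found.add(word);
            }
        }
        return found;
    }
}
```

With this approach, a label such as "split-example.dtd" yields no targets, so it can mark a module comment without any special "!dtd" keyword.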
If we can guarantee that module names will be unique, then we can take the systemId and publicId out of the location information on all of the items, and just use the attributes module and lineNumber. For example, change
<declaredIn systemId="file:///home...Analyzer/test/split-example/split-example.dtd"
publicId="-//NLM//external dtd dummy public id//EN"
lineNumber="42"/>
to:
<declaredIn module="split-example.dtd" lineNumber="42"/>
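Module names could be derived from each systemId and checked for the uniqueness this change depends on. The sketch below assumes, purely for illustration, that a module's name is the last path segment of its systemId. ModuleNames, moduleFor() and declaredIn() are hypothetical helpers, not existing DtdAnalyzer code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: short module names for declaredIn, with a uniqueness check. */
class ModuleNames {
    private final Map<String, String> systemIdByModule = new HashMap<>();

    /** Uses the last path segment of the systemId as the module name (an assumption). */
    String moduleFor(String systemId) {
        String name = systemId.substring(systemId.lastIndexOf('/') + 1);
        String previous = systemIdByModule.putIfAbsent(name, systemId);
        if (previous != null && !previous.equals(systemId)) {
            // Two different files would collapse to one module name: refuse.
            throw new IllegalStateException("Module name '" + name
                + "' is used by both " + previous + " and " + systemId);
        }
        return name;
    }

    /** Builds the shortened location element. */
    static String declaredIn(String module, int lineNumber) {
        return "<declaredIn module=\"" + escape(module)
            + "\" lineNumber=\"" + lineNumber + "\"/>";
    }

    // Minimal escaping for a double-quoted XML attribute value.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }
}
```

Failing loudly on a clash, rather than silently reusing a name, keeps the shortened locations unambiguous.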
As far as error checking / handling go, these are some things to check for:
Update to structured comment processing:
Autolinking is done as follows (these are illustrated in the current version of split-example.dtd):
To disable any of these, just precede them with a backslash, e.g. \<split>, \@instrument, \%banana.ent;, or \&fleegle-pic;.
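The escape rule can be handled by letting the same pattern optionally capture a leading backslash. This is a sketch under stated assumptions. The link markup is unknown here, so a caller-supplied function stands in for it. The pattern uses simplified name characters, and it does not try to tell element references apart from real markup, so it assumes it runs on annotation text before any XHTML or Markdown processing. AutoLinker and autolink() are hypothetical names.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of autolinking with backslash escapes. */
class AutoLinker {
    // Group 1: optional backslash. Group 2: <elem>, @attr, %pent;, or &gent;.
    private static final Pattern REF = Pattern.compile(
        "(\\\\?)(<[\\w.:-]+>|@[\\w.:-]*[\\w:-]|%[\\w.:-]+;|&[\\w.:-]+;)");

    static String autolink(String text, Function<String, String> link) {
        Matcher m = REF.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());   // copy text before the reference
            String ref = m.group(2);
            if (m.group(1).isEmpty()) {
                out.append(link.apply(ref));     // unescaped: turn into a link
            } else {
                out.append(ref);                 // escaped: drop the backslash, keep literal
            }
            last = m.end();
        }
        out.append(text, last, text.length());   // copy any trailing text
        return out.toString();
    }
}
```

Building the output by hand, rather than with Matcher.appendReplacement, avoids having '$' or '\' in the generated links interpreted as group references.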
Still to do
Wouldn't it be easier to just require that the annotations be in HTML, rather than Markdown? Then it wouldn't require additional processors, etc.
That's the default: "You could also use it without any processing, in which case you could write the annotations in well-formed XHTML, and they will be copied to the output."
So nobody is required to write annotations in Markdown. Markdown is a lot more readable than XHTML, though, which is why I thought it would be a nice option. Jeff was adding a lot of wiki-like syntax to the current annotations, and rather than go down that route, I think it would be a lot better to use something that's a de facto standard.
This is done. "Strict mode", which I described above, is now the only mode: if your comments are not well-formed, the tool will die. It now also checks each comment for well-formedness independently, using a SAXParser, so if a comment is not well-formed, it will choke early and report the exact file and line number of the offending comment.
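Here is a minimal sketch of such a per-comment check. It assumes the comment body is wrapped in a hypothetical <comment> root element, so that loose text and several top-level elements parse as one document. It also assumes the caller knows the line on which the comment starts. CommentChecker and check() are illustrative names only.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: well-formedness check for one comment body. */
class CommentChecker {
    /** Returns null if well-formed, else "file:line: message" with an absolute line. */
    static String check(String body, String file, int commentStartLine) {
        try {
            // The wrapper starts on the comment's first line, so parser line 1
            // corresponds to commentStartLine.
            String doc = "<comment>" + body + "</comment>";
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.newSAXParser().parse(new InputSource(new StringReader(doc)),
                                         new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // getLineNumber() may be -1 when unknown; fall back to the first line.
            int line = commentStartLine + Math.max(e.getLineNumber(), 1) - 1;
            return file + ":" + line + ": " + e.getMessage();
        } catch (SAXException | ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the fragment has no DTD, entities such as &nbsp; would be reported as errors, which matches the requirement that annotations be well-formed XHTML.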
In going from contextmodel → datadictionary, we lost support for structured comment annotations inside the DTD. This feature needs to be put back in.