ncbi / DtdAnalyzer

Put support for structured annotations back in. #3

Closed. Klortho closed this issue 12 years ago.

Klortho commented 12 years ago

In going from contextmodel → datadictionary, we lost support for structured comment annotations inside the DTD. This feature needs to be put back in.

Klortho commented 12 years ago

See contextmodel/src/gov/ncbi/pmc/dtdanalyzer/ElementModelManager.java, lines 355ff, for how they used to be handled, and .../dtdanalyzer/DTDEventHandler.java, line 286 (the comment() method), for where the new code should go.

Use test/split-example.dtd for testing.
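For reference, here is a minimal sketch of how comment capture in the DTD can work with plain SAX. It assumes the comment() method mentioned above is the SAX LexicalHandler callback, which is delivered once the handler is registered through the standard "http://xml.org/sax/properties/lexical-handler" property. The "~~" marker used to tell a structured annotation apart from an ordinary comment, along with the class and method names, is a hypothetical placeholder and not the syntax the old contextmodel code recognized. Only comments seen between startDTD() and endDTD() are kept.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

public class StructuredCommentCollector extends DefaultHandler2 {
    // Hypothetical marker: a comment whose body starts with "~~" is an annotation.
    static final String MARKER = "~~";

    private boolean inDtd = false;
    private final List<String> annotations = new ArrayList<>();

    @Override
    public void startDTD(String name, String publicId, String systemId) {
        inDtd = true;   // comments from here on come from the DTD
    }

    @Override
    public void endDTD() {
        inDtd = false;  // back in the document proper
    }

    @Override
    public void comment(char[] ch, int start, int length) {
        String text = new String(ch, start, length);
        // Ignore ordinary comments and any comment outside the DTD.
        if (inDtd && text.startsWith(MARKER)) {
            annotations.add(text.substring(MARKER.length()).trim());
        }
    }

    public List<String> getAnnotations() {
        return annotations;
    }

    /** Parses an XML string and returns the bodies of its structured DTD comments. */
    public static List<String> collect(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        StructuredCommentCollector handler = new StructuredCommentCollector();
        // Comments are only reported to a LexicalHandler registered via this property.
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.getAnnotations();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version='1.0'?>\n"
                + "<!DOCTYPE doc [\n"
                + "  <!--~~ doc\n  The root element. -->\n"
                + "  <!-- an ordinary comment -->\n"
                + "  <!ELEMENT doc (#PCDATA)>\n"
                + "]>\n"
                + "<doc>hi</doc>";
        System.out.println(collect(xml));
    }
}
```

With a real DTD file, the same handler also receives comments from the external subset, provided the parser is allowed to load it.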

Klortho commented 12 years ago

I got most of the way done with this. The output of the split-example looks very close to the split-mockup.daz.xml that I added last week. Here are some loose ends, as well as a few changes I would like to make to the output format:

Klortho commented 12 years ago

As far as error checking / handling go, these are some things to check for:

Klortho commented 12 years ago

Update to structured comment processing:

Still to do

ahamelers commented 12 years ago

Wouldn't it be easier to just require the annotations be in HTML, rather than Markdown? Then it wouldn't require the use of additional processors, etc.

Klortho commented 12 years ago

That's the default: "You could also use it without any processing, in which case you could write the annotations in well-formed XHTML, and they will be copied to the output."

So nobody is required to write annotations in Markdown. Markdown is a lot more readable than XHTML, though, which is why I thought it would be a nice option. Jeff was adding a lot of wiki-like syntax to the current annotations, and rather than go down that route, I think it would be a lot better to use something that's a de facto standard.
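To illustrate the no-processing default quoted above, where annotations are written as well-formed XHTML and copied to the output, here is a rough sketch that uses only the standard JDK DOM and transform APIs. It parses a comment body as an XHTML fragment and imports its nodes into an output document. The wrapper element and the <annotation> output element are assumptions for illustration only and do not reflect DtdAnalyzer's actual output format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class AnnotationCopier {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Parses a comment body as XHTML and copies it under a (hypothetical) <annotation> element. */
    public static String copyAnnotation(String commentText) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();

        // Wrap the fragment so several top-level elements are allowed, and put
        // everything in the XHTML namespace. Malformed input throws here.
        String wrapped = "<wrapper xmlns='" + XHTML_NS + "'>" + commentText + "</wrapper>";
        Document fragment = db.parse(new InputSource(new StringReader(wrapped)));

        // Copy every child of the wrapper into a fresh output document.
        Document out = db.newDocument();
        Element annotation = out.createElement("annotation");
        out.appendChild(annotation);
        for (Node n = fragment.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            annotation.appendChild(out.importNode(n, true));
        }

        // Serialize the result without an XML declaration.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter sw = new StringWriter();
        t.transform(new DOMSource(out), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(copyAnnotation("<p>The <b>root</b> element.</p>"));
    }
}
```

A Markdown option would add a conversion step before this parse. Its output would still need to be well-formed XHTML for the copy to succeed.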

Klortho commented 12 years ago

This is done. The "strict mode" I described above is now the only mode: if your comments are not well-formed, the tool will die. It also now checks each comment for well-formedness independently, using a SAXParser, so a malformed comment makes it fail early and report the exact file and line number of the offending comment.
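As a sketch of how this kind of per-comment check can work, the code below wraps each comment body in a dummy root element, hands it to a fresh SAXParser, and maps the parser's line number back into the DTD file. It assumes the file name comes from Locator.getSystemId() and that the comment's first line is derived from the SAX Locator, which by the SAX documentation reports where the current event ends. The class name, method names, and message format are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class CommentChecker {

    /**
     * Checks one comment body on its own. Returns null if it is well-formed,
     * otherwise a "file:line: message" string pointing into the original DTD.
     *
     * @param text      the characters between "<!--" and "-->"
     * @param systemId  the file the comment came from (e.g. Locator.getSystemId())
     * @param startLine the line of that file on which "<!--" appears
     */
    public static String check(String text, String systemId, int startLine) throws Exception {
        // Dummy root so that several top-level elements (or bare text) still parse.
        // The start tag shares the first line with the text, so lines are not shifted.
        String wrapped = "<wrapper>" + text + "</wrapper>";
        try {
            // DefaultHandler's fatalError() rethrows, so malformed input lands in the catch.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(wrapped)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // Map the line inside the wrapped string back to a line in the DTD file.
            int line = startLine + e.getLineNumber() - 1;
            return systemId + ":" + line + ": " + e.getMessage();
        }
    }

    /**
     * Start line of a comment, given the line on which the Locator says the
     * comment event ends: subtract the newlines inside the comment body.
     */
    public static int startLine(String text, int endLine) {
        return endLine - (int) text.chars().filter(c -> c == '\n').count();
    }

    public static void main(String[] args) throws Exception {
        String good = "<p>The root element.</p>";
        String bad = "<p>The root\n<b>element.</p>";
        System.out.println(check(good, "example.dtd", startLine(good, 12)));
        System.out.println(check(bad, "example.dtd", startLine(bad, 13)));
    }
}
```

Checking each comment separately keeps the error position local: a stray "&" or an unclosed tag is reported against the comment that contains it, rather than against some later point in the combined output.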