metanorma / coradoc

Coradoc is the Core AsciiDoc Parser used by Metanorma
MIT License

Doc: High level Architecture #3

Status: Open. abunashir opened this issue 1 year ago.

abunashir commented 1 year ago

Current Architecture

  1. Metanorma uses AsciiDoc as its input syntax.
  2. The Metanorma AsciiDoc syntax is an enriched syntax that normal AsciiDoc parsers do not support (including the vanilla Asciidoctor parser we use).
  3. Metanorma AsciiDoc content is stored in .adoc files, which Metanorma parses.
  4. Metanorma uses Asciidoctor to parse the .adoc file(s). Metanorma has a "plugin" into Asciidoctor that obtains the state of the file as a "pseudo document tree", in the form of an Asciidoctor object.
  5. Metanorma builds an XML file to represent that pseudo document tree object.
  6. Metanorma writes this XML out as "semantic XML" and "presentational XML" files. The semantic XML file contains mostly data; the presentational XML file contains mostly presentation (e.g. a NOTE is represented in semantic XML as a <note> element, but becomes a <p> in presentational XML).
  7. The .xml file is loaded into Metanorma for rendering. Instead of loading the XML file as a Metanorma::Document object, Metanorma simply runs XPath queries on it using Nokogiri.
  8. Metanorma converts the presentational or semantic .xml files into outputs such as HTML, PDF and Word.
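
To make steps 6 and 7 concrete, here is a toy sketch in Ruby. It assumes a made-up, heavily simplified XML vocabulary (`document`, `clause`, `p`, `note`), not Metanorma's real schema, and it uses Ruby's REXML library (shipped with standard Ruby installs) as a stand-in for Nokogiri so it runs without extra gems. It rewrites each semantic `<note>` as a presentational `<p>`, then reads paragraph text with an XPath query, the way the rendering stage queries the XML instead of loading a document object.

```ruby
require "rexml/document"

# Toy semantic XML. The element names are illustrative only,
# not Metanorma's actual XML schema.
SEMANTIC_XML = <<~XML
  <document>
    <clause id="scope">
      <p>This document specifies widgets.</p>
      <note>Widgets are hypothetical.</note>
    </clause>
  </document>
XML

# Step 6 (simplified): derive presentational XML by rewriting every
# semantic <note> element as a <p> element.
def to_presentational(semantic_xml)
  doc = REXML::Document.new(semantic_xml)
  REXML::XPath.match(doc, "//note").each do |note|
    para = REXML::Element.new("p")
    para.add_attribute("class", "note")
    para.add_text("NOTE: #{note.text}")
    note.replace_with(para)
  end
  doc
end

# Step 7 (simplified): query the XML directly with XPath instead of
# loading it into a document object model of our own.
def paragraph_texts(doc)
  REXML::XPath.match(doc, "//clause/p").map(&:text)
end

presentational = to_presentational(SEMANTIC_XML)
puts paragraph_texts(presentational).inspect
```

The downside this illustrates is that every consumer of the XML has to know the element paths; nothing in Ruby models what a "note" or a "clause" is.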

The ideal flow:

  1. Metanorma loads the AsciiDoc file using the new AsciiDoc parser. This gives us an AsciiDoc parse tree object (a generic AsciiDoc tree).
  2. Metanorma "specializes" this AsciiDoc parse tree into a Metanorma::Document node tree. For example, a generic AsciiDoc section object can be specialized as a Bibliography class object in Metanorma.
  3. Metanorma processes the Metanorma::Document object, e.g. for link resolution.
  4. Metanorma calls Document.to_xml to write the XML files.
  5. The Metanorma::Document object is passed to the presentation rendering stage to generate the outputs.
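
The ideal flow can be sketched as follows. Everything here is hypothetical: `AsciidocSection` stands in for a generic parse-tree node from the new parser; `Sketch::Document`, `Sketch::Section` and `Sketch::Bibliography` stand in for Metanorma::Document and its specialized classes; the "title is Bibliography" specialization rule is made up; and link resolution is reduced to replacing AsciiDoc `<<id>>` cross-references with the target section's title. It only shows the shape of the pipeline (parse, specialize, process, `to_xml`), not Metanorma's actual classes.

```ruby
# Hypothetical generic AsciiDoc parse-tree node, standing in for the
# output of the new parser (step 1).
AsciidocSection = Struct.new(:id, :title, :paragraphs, keyword_init: true)

module Sketch
  # Minimal XML escaping for text content and attribute values.
  def self.escape(text)
    text.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;").gsub('"', "&quot;")
  end

  # Specialized node for an ordinary section (step 2).
  class Section
    attr_reader :id, :title, :paragraphs

    def initialize(id:, title:, paragraphs:)
      @id = id
      @title = title
      @paragraphs = paragraphs.dup # copy so the generic tree is not mutated
    end

    def tag
      "clause"
    end

    # Step 4: each node serializes itself.
    def to_xml
      body = paragraphs.map { |text| "<p>#{Sketch.escape(text)}</p>" }.join
      %(<#{tag} id="#{Sketch.escape(id)}"><title>#{Sketch.escape(title)}</title>#{body}</#{tag}>)
    end
  end

  # A generic section titled "Bibliography" is specialized into this class
  # (an illustrative rule, not Metanorma's real one).
  class Bibliography < Section
    def tag
      "references"
    end
  end

  class Document
    attr_reader :sections

    # Step 2: specialize generic AsciiDoc sections into document classes.
    def self.from_asciidoc(generic_sections)
      specialized = generic_sections.map do |s|
        klass = s.title == "Bibliography" ? Bibliography : Section
        klass.new(id: s.id, title: s.title, paragraphs: s.paragraphs)
      end
      new(specialized)
    end

    def initialize(sections)
      @sections = sections
    end

    # Step 3: resolve <<id>> cross-references to the target section's title.
    def resolve_links!
      titles = sections.to_h { |s| [s.id, s.title] }
      sections.each do |s|
        s.paragraphs.map! do |text|
          text.gsub(/<<([\w-]+)>>/) { titles.fetch($1, "[unresolved: #{$1}]") }
        end
      end
      self
    end

    # Step 4: serialize the whole tree.
    def to_xml
      "<document>#{sections.map(&:to_xml).join}</document>"
    end
  end
end

generic = [
  AsciidocSection.new(id: "scope", title: "Scope", paragraphs: ["See <<refs>> & annexes."]),
  AsciidocSection.new(id: "refs", title: "Bibliography", paragraphs: ["ISO 8601"])
]
doc = Sketch::Document.from_asciidoc(generic).resolve_links!
puts doc.to_xml
```

Because each specialized class owns its own `to_xml`, processing steps such as link resolution operate on Ruby objects rather than on XPath queries over an intermediate XML file.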

Note:

To support different standards, we will need to build an extension mechanism that runs callbacks when document nodes are encountered, so that different flavours can build their documents differently. E.g. there would be a Metanorma::Document::Iso vs. a Metanorma::Document::Ogc.
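
One possible shape for that extension mechanism is a per-flavour callback registry, sketched below in Ruby. `FlavourDocument`, `IsoDocument` and `OgcDocument` are hypothetical stand-ins for Metanorma::Document, Metanorma::Document::Iso and Metanorma::Document::Ogc; nodes are plain hashes; and the ISO and OGC rules are invented for illustration. Each flavour registers callbacks with `on(type)`, and lookup walks the class hierarchy, so a flavour overrides only the node types it cares about and inherits the rest.

```ruby
# Hypothetical base class: each flavour registers callbacks keyed by node type.
class FlavourDocument
  # Each class object gets its own handler table.
  def self.handlers
    @handlers ||= {}
  end

  # Register a callback to run when a node of the given type is encountered.
  def self.on(type, &block)
    handlers[type] = block
  end

  # Walk the ancestor chain so a flavour only overrides the node types it
  # cares about and inherits the rest from the base class.
  def self.handler_for(type)
    ancestors.each do |klass|
      next unless klass.respond_to?(:handlers)
      return klass.handlers[type] if klass.handlers.key?(type)
    end
    nil
  end

  # Build a flavour-specific tree by running the matching callback on each
  # node; nodes without a handler pass through unchanged.
  def build(nodes)
    nodes.map do |node|
      handler = self.class.handler_for(node[:type])
      handler ? handler.call(node) : node
    end
  end
end

# Default behaviour shared by all flavours.
FlavourDocument.on(:section) { |node| { type: :section, title: node[:title] } }

# Hypothetical stand-in for Metanorma::Document::Iso (rule is illustrative).
class IsoDocument < FlavourDocument
  on(:section) do |node|
    type = node[:title] == "Bibliography" ? :iso_bibliography : :section
    { type: type, title: node[:title] }
  end
end

# Hypothetical stand-in for Metanorma::Document::Ogc (rule is illustrative).
class OgcDocument < FlavourDocument
  on(:note) { |node| { type: :ogc_note, text: node[:text] } }
end

nodes = [{ type: :section, title: "Bibliography" }, { type: :note, text: "Check units." }]
p IsoDocument.new.build(nodes)
p OgcDocument.new.build(nodes)
```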

//cc: @ronaldtse

abunashir commented 1 year ago

Document Models

abunashir commented 1 year ago

The draft work involved would be to:

From: @ronaldtse

  1. Define the grammar (e.g. in BNF form)
  2. Create corresponding classes of AsciiDoc elements (e.g. section, paragraph, role, list, list item, etc.)
  3. Create a parser that converts AsciiDoc text into a parse tree made up of those classes
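
A minimal sketch of those three steps in Ruby: the grammar is written as a comment, each grammar element gets a small Struct class, and a line-oriented parser builds a tree from them. It covers only a toy subset (section titles, paragraphs, `*` list items), and every name here (`ToyAdoc`, `parse`, the Struct classes) is hypothetical, not Coradoc's actual API. A real AsciiDoc grammar needs far more (attributes, roles, delimited blocks, nested sections).

```ruby
# Toy grammar (BNF-style) for a tiny AsciiDoc subset:
#
#   document  ::= block*
#   block     ::= section | list | paragraph | blank
#   section   ::= ("==" | "===" | "====" | "=====" | "======") " " text EOL
#   list      ::= list_item+
#   list_item ::= "* " text EOL
#   paragraph ::= (text EOL)+
#   blank     ::= EOL

module ToyAdoc
  # One class per element of the grammar.
  Document  = Struct.new(:blocks)
  Section   = Struct.new(:level, :title, :blocks)
  Paragraph = Struct.new(:text)
  List      = Struct.new(:items)
  ListItem  = Struct.new(:text)

  SECTION_RE = /\A(={2,6}) (.+)\z/
  ITEM_RE    = /\A\* (.+)\z/

  # Line-oriented parser producing a tree of the classes above.
  # Sections are kept flat (not nested by level) to keep the sketch short.
  def self.parse(source)
    doc = Document.new([])
    container = doc.blocks # where finished blocks are appended
    paragraph = nil        # lines of the paragraph being collected
    list = nil             # list being collected

    # Close any open paragraph or list and append it to the container.
    flush = lambda do
      container << Paragraph.new(paragraph.join(" ")) if paragraph
      container << list if list
      paragraph = nil
      list = nil
    end

    source.each_line(chomp: true) do |line|
      if line.strip.empty?
        flush.call
      elsif (m = SECTION_RE.match(line))
        flush.call
        # "==" is a level-1 section in AsciiDoc ("=" is the document title).
        section = Section.new(m[1].length - 1, m[2].strip, [])
        doc.blocks << section
        container = section.blocks
      elsif (m = ITEM_RE.match(line))
        flush.call if paragraph
        list ||= List.new([])
        list.items << ListItem.new(m[1].strip)
      else
        flush.call if list
        (paragraph ||= []) << line.strip
      end
    end
    flush.call
    doc
  end
end

source = <<~ADOC
  == Scope

  This document specifies
  widgets.

  * first item
  * second item

  == Terms
  Some terms.
ADOC

tree = ToyAdoc.parse(source)
pp tree
```

Keeping the resulting tree generic (plain sections, paragraphs and lists) leaves the Metanorma-specific specialization to a later step, as in the ideal flow described in the issue.
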

abunashir commented 1 year ago

Resources