rhdunn / cainteoir-engine

The Cainteoir Text-to-Speech core engine
http://reecedunn.co.uk/cainteoir/
GNU General Public License v3.0

implement a proper style model for xml-based documents #28

Open rhdunn opened 11 years ago

rhdunn commented 11 years ago

At the moment, the rendering model for cainteoir-engine with XML-based documents is to:

  1. call a document-specific parser;
  2. map element names to xml::context::entry objects;
  3. specify the type (span, paragraph, sentence, heading, ...) and style (emphasized, strong, superscript, ...) for the xml::context::entry objects;
  4. use a recursive-descent parser to parse the XML reader events, mapping them to the xml::context::entry types.
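
As a rough illustration of steps 2 and 3, the mapping can be pictured as a per-format lookup table. This is only a sketch: the `sketch` namespace, the `entry` struct, the enum names and the `lookup` helper are hypothetical stand-ins, not the actual xml::context::entry definitions in cainteoir-engine.

```cpp
#include <cassert>
#include <map>
#include <string>

namespace sketch {

// Hypothetical stand-ins for the type and style values listed above;
// the engine's real enumerations are not reproduced here.
enum class type { unknown, span, paragraph, sentence, heading };
enum class style { none, emphasized, strong, superscript };

// Hypothetical stand-in for an xml::context::entry object.
struct entry {
    type  display;
    style styling;
};

// Steps 2 and 3: a hard-coded table from element name to entry.
inline const std::map<std::string, entry> &html_entries() {
    static const std::map<std::string, entry> table = {
        {"p",      {type::paragraph, style::none}},
        {"h1",     {type::heading,   style::none}},
        {"em",     {type::span,      style::emphasized}},
        {"strong", {type::span,      style::strong}},
        {"sup",    {type::span,      style::superscript}},
    };
    return table;
}

// Called when the parser sees a start tag; elements that are not in
// the table fall back to an unstyled, untyped entry.
inline entry lookup(const std::string &element) {
    assert(!element.empty()); // XML element names are never empty
    const auto &table = html_entries();
    auto it = table.find(element);
    return it != table.end() ? it->second
                             : entry{type::unknown, style::none};
}

} // namespace sketch
```

Because a table like this is compiled into the engine, every new element or format means a code change and a rebuild, which is what points 3 and 5 below are about.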

This works for simplistic styling and rendering, but:

  1. the rendering model is limited -- it does not support fonts, indentation or more complex styling;
  2. the interpretation of the styles is dependent on the processor -- this leads to duplicate work (e.g. text layout in doc2doc and cainteoir-gtk);
  3. maintenance of the parsers becomes more complex over time;
  4. there is duplicated code in the core processing;
  5. adding/updating the document support requires compiling cainteoir-engine with the updated logic;
  6. it is not easy to support CSS for HTML, SSML and other XML documents.

For a new style model to address these limitations, its content model must be able to specify the following:

  1. implicit end tags (e.g. for br and img parsing support in HTML);
  2. optional start and/or end tags (to support section 12.1.2.4, Optional tags, of HTML-WHATWG);
  3. RDF metadata generation (e.g. for RDF/XML and OPF metadata extraction);
  4. table of contents, anchors and links (e.g. for NCX support);
  5. document formatting (e.g. HTML/CSS styles and rendering);
  6. text-to-speech formatting (e.g. SSML and CSS3 Speech styles).
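
To make items 1 and 2 more concrete, here is a minimal sketch of how implicit and optional end tags could be expressed as data instead of parser code. Everything in it is an assumption made for illustration: the `element_rule` struct, the `closed_by` set, the `html_subset` table and the `open_element` helper are hypothetical, and the table covers only a tiny subset of the HTML rules.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical per-element rule that a style document could declare.
struct element_rule {
    bool implicit_end = false;       // e.g. br, img: never left open
    std::set<std::string> closed_by; // start tags that imply this element's end tag
};

using content_model = std::map<std::string, element_rule>;

// A small illustrative subset of the HTML rules.
inline content_model html_subset() {
    content_model model;
    model["br"].implicit_end = true;
    model["img"].implicit_end = true;
    model["p"].closed_by = {"p", "div"}; // </p> may be omitted before these
    model["li"].closed_by = {"li"};      // </li> may be omitted before <li>
    return model;
}

// Apply a start tag to the stack of currently open elements.
inline void open_element(const content_model &model,
                         std::vector<std::string> &open,
                         const std::string &name) {
    assert(!name.empty());
    if (!open.empty()) {
        auto parent = model.find(open.back());
        if (parent != model.end() && parent->second.closed_by.count(name))
            open.pop_back(); // the omitted end tag is implied here
    }
    auto rule = model.find(name);
    if (rule == model.end() || !rule->second.implicit_end)
        open.push_back(name); // implicit-end elements are never pushed
}

} // namespace sketch
```

With this table, feeding the start tags `p`, `br`, `p` leaves a single open `p`: the `br` is closed immediately and the second `p` implies the end of the first.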

The aim is to replace all the current XML-based parsers (XHTML, HTML, NCX, OPF, OCF, RDF/XML, SSML and SMIL) and to make it possible to support other formats (DocBook, ODF, MathML, SVG, etc.). Each format would be described by a style document stored in the /usr/shared/cainteoir-engine/formats system directory and selected based on the document's namespaceUri and rootName.
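
A possible shape for the selection step is a lookup keyed on the (namespaceUri, rootName) pair. This is only a sketch: the `select_style` function and the `*.style` file names are hypothetical, and the returned names are meant to be relative to the formats directory.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

namespace sketch {

// Key: (namespaceUri, rootName) of the document's root element.
using format_key = std::pair<std::string, std::string>;

// Returns the (hypothetical) style document name for a root element,
// or an empty string when no style document is registered for it.
inline std::string select_style(const std::string &namespace_uri,
                                const std::string &root_name) {
    assert(!root_name.empty()); // a well-formed document has a named root
    static const std::map<format_key, std::string> formats = {
        {{"http://www.w3.org/1999/xhtml",         "html"},    "xhtml.style"},
        {{"http://www.daisy.org/z3986/2005/ncx/", "ncx"},     "ncx.style"},
        {{"http://www.idpf.org/2007/opf",         "package"}, "opf.style"},
        {{"http://www.w3.org/2001/10/synthesis",  "speak"},   "ssml.style"},
    };
    auto it = formats.find(format_key(namespace_uri, root_name));
    return it != formats.end() ? it->second : std::string();
}

} // namespace sketch
```

In practice the table would presumably be built by scanning the formats directory at start-up instead of being hard-coded, so that a new format can be added without recompiling cainteoir-engine.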
