Infer QTML from plain text

mustafa0x commented 4 years ago

A fair amount of QTML can be inferred from regular text. Working on a tool that does so is important for two reasons:

Most text is currently plain — being able to infer its semantics will save a lot of manual work.
An author may wish to keep his "source" in plain text, and only "apply" QTML before publication/distribution.

For things like footnotes, the author can use Markdown-style footnotes, which are then converted to QTML.
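
To make the footnote idea concrete, here is a minimal Python sketch that inlines Markdown-style footnotes into markup. It is only an illustration: the `note` element name is a placeholder and not taken from the QTML format, the function name is made up, and only single-line footnote definitions are handled.

```python
import re
from xml.sax.saxutils import escape

# Placeholder element name; the real QTML vocabulary may differ.
NOTE_TAG = "note"

# "[^label]: definition text" on a line of its own
DEFINITION_RE = re.compile(r"^\[\^([^\]]+)\]:\s*(.*)$")
# "[^label]" used inline as a reference
REFERENCE_RE = re.compile(r"\[\^([^\]]+)\]")


def footnotes_to_markup(text: str) -> str:
    """Replace each footnote reference with an inline note element."""
    definitions = {}
    body_lines = []
    for line in text.splitlines():
        match = DEFINITION_RE.match(line)
        if match:
            definitions[match.group(1)] = match.group(2).strip()
        else:
            body_lines.append(line)

    # Escape &, < and > so the plain text is safe inside XML.
    body = escape("\n".join(body_lines).strip())

    def inline(match):
        label = match.group(1)
        if label not in definitions:
            # Leave references without a definition untouched.
            return match.group(0)
        return f"<{NOTE_TAG}>{escape(definitions[label])}</{NOTE_TAG}>"

    return REFERENCE_RE.sub(inline, body)
```

For example, `footnotes_to_markup("In the name of God[^1].\n\n[^1]: Or: Allah.")` would move the definition into the place where `[^1]` appears.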

rguliev commented 4 years ago

A fair amount of QTML can be inferred from regular text.

This is true. The idea is not to mix XML with regex. The aim of QTML is to give developers of Quranic projects a better experience, as HTML did for web developers. A typical workflow would look like this: take a text from a source (usually the author of the translation) -> load it as plain text -> automatically format the text -> validate and finish the formatting using a special editor (see below) -> publish the QTML-formatted translation to a GitHub repo.

At first we tried to avoid XML, but it turned out to be the most flexible and reliable of the formats we considered. For convenience, we will add helper tools:

Validator. I don't think it will be used directly; rather, it will be a part of the other tools.
Editor. The idea is to provide an editor where users or authors can type their text and have it validated and compiled to QTML automatically or semi-automatically.
Parser. In a nutshell, it takes QTML text plus instructions as input and returns parsed text as output. The instructions are simply a mapping of elements to parsing instructions, with some default options.

These tools will definitely use regexps to automate most of the actions.
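
The parser description above could be sketched roughly as follows. This is not the project's parser, only an illustration of "QTML text + instructions in, parsed text out". The element names (`aya`, `note`) and the instruction names (`drop`, `brackets`, `text`) are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumed default: keep an element's text and discard its tags.
DEFAULT_INSTRUCTION = "text"


def render(element, instructions):
    """Render one element according to the instruction mapped to its tag."""
    action = instructions.get(element.tag, DEFAULT_INSTRUCTION)
    if action == "drop":
        # Skip the element and everything inside it.
        return ""

    parts = [element.text or ""]
    for child in element:
        parts.append(render(child, instructions))
        # Text that follows the child still belongs to this element.
        parts.append(child.tail or "")
    inner = "".join(parts)

    if action == "brackets":
        return f"[{inner}]"
    return inner


def parse_qtml(source, instructions=None):
    """Parse an XML string and return plain text shaped by the instructions."""
    root = ET.fromstring(source)
    return render(root, instructions or {})
```

With `<aya>Praise be to God<note>Or: Allah</note>, Lord of the worlds.</aya>` as input, passing `{"note": "drop"}` strips the note from the output, while `{"note": "brackets"}` keeps it inline in square brackets.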

We already have partial versions of these tools, but they need some improvement. In sha Allah, we will publish them soon so that everyone can contribute.

mustafa0x commented 4 years ago

👍

You don't have to use regexes when parsing plain text, nor do I advise doing so. It rarely works when parsing formatting marks.

You can do what Pandoc does. You can even use Pandoc itself: https://pandoc.org/filters.html
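
As a rough sketch of the Pandoc route: a filter receives the document as a JSON syntax tree and returns a modified tree, so no regexes are needed. The Python below walks that tree and replaces each footnote (`Note` node) with a raw inline element. The `note` tag name is a placeholder rather than real QTML, the raw format is arbitrarily set to `html`, and footnote content is flattened to plain text, which loses any formatting inside the note.

```python
import json
import sys
from xml.sax.saxutils import escape

# Placeholder element name; the real QTML vocabulary may differ.
NOTE_TAG = "note"


def plain_text(node):
    """Flatten a fragment of the Pandoc JSON tree to plain text."""
    if isinstance(node, list):
        return "".join(plain_text(item) for item in node)
    if isinstance(node, dict):
        kind = node.get("t")
        if kind == "Str":
            return node["c"]
        if kind in ("Space", "SoftBreak"):
            return " "
        return plain_text(node.get("c", []))
    return ""


def convert_notes(node):
    """Return a copy of the tree with every Note turned into a RawInline."""
    if isinstance(node, list):
        return [convert_notes(item) for item in node]
    if isinstance(node, dict):
        if node.get("t") == "Note":
            text = escape(plain_text(node["c"]))
            markup = f"<{NOTE_TAG}>{text}</{NOTE_TAG}>"
            return {"t": "RawInline", "c": ["html", markup]}
        return {key: convert_notes(value) for key, value in node.items()}
    return node


def run_filter(stdin=sys.stdin, stdout=sys.stdout):
    """Read a Pandoc JSON document, convert its notes, write it back out."""
    json.dump(convert_notes(json.load(stdin)), stdout)
```

To use it as an actual filter, call `run_filter()` at the bottom of the script and pass the script to `pandoc` with `--filter`.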

quranacademy / digital-quran-docs

Infer QTML from plain text #13