quranacademy / digital-quran-docs

Documentation for Quran Academy data distribution project: Digital Quran - https://quranacademy.gitbook.io/digital-quran/

Infer QTML from plain text #13

Open mustafa0x opened 4 years ago

mustafa0x commented 4 years ago

A fair amount of QTML can be inferred from regular text. Working on a tool that does so is important for two reasons:

  1. Most text is currently plain — being able to infer its semantics will save a lot of manual work.
  2. An author may wish to keep his "source" in plain text, and only "apply" QTML before publication/distribution.

For things like footnotes, the author can use Markdown-style footnotes, which are then converted to QTML.
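As a rough illustration of the footnote idea, here is a minimal sketch that folds Markdown-style footnotes (`[^1]` references plus `[^1]: text` definitions) into inline elements. The `<note>` tag is only a placeholder, not a confirmed QTML element name, and the function names are hypothetical. It scans the text by hand instead of using regular expressions.

```python
from xml.sax.saxutils import escape


def split_footnotes(source: str):
    """Separate body text from '[^label]: text' definition lines."""
    body, notes = [], {}
    for line in source.splitlines():
        close = line.find("]")
        # A definition line starts with "[^", and its first "]" is followed by ":".
        if line.startswith("[^") and close != -1 and line[close + 1:close + 2] == ":":
            notes[line[2:close]] = line[close + 2:].strip()
        else:
            body.append(line)
    return "\n".join(body).strip(), notes


def inline_footnotes(source: str, tag: str = "note") -> str:
    """Replace each '[^label]' reference with <tag>definition</tag>.

    The tag name is a placeholder; substitute whatever element QTML defines.
    """
    body, notes = split_footnotes(source)
    out, i = [], 0
    while i < len(body):
        if body.startswith("[^", i):
            end = body.find("]", i)
            label = body[i + 2:end] if end != -1 else None
            if label in notes:
                out.append(f"<{tag}>{escape(notes[label])}</{tag}>")
                i = end + 1
                continue
        # Ordinary character: escape it so the output stays valid XML text.
        out.append(escape(body[i]))
        i += 1
    return "".join(out)
```

References whose label has no matching definition are left in the text untouched, so the author can spot them during review.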

rguliev commented 4 years ago

> A fair amount of QTML can be inferred from regular text.

This is true. The idea is not to mix XML with regex. The aim of QTML is to give developers of Quranic projects a better experience, as HTML did for web developers. A typical workflow would look like this: take a text from a source (usually the author of the translation) -> load it as plain text -> format the text automatically -> validate and finish the formatting in a special editor (see below) -> publish the QTML-formatted translation to a GitHub repo.

At first we tried to avoid XML, but it turned out to be the most flexible and reliable of the formats we considered. As for convenience, we will add helper tools:

We already have some of these tools, but they need improvement. In sha Allah, we will publish them soon so that everyone can contribute.

mustafa0x commented 4 years ago

👍

You don't have to use regexes when parsing plain text, nor do I advise doing so. Regex rarely works well for parsing formatting marks.

You can do what Pandoc does. You can even use Pandoc itself: https://pandoc.org/filters.html
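To make the Pandoc suggestion concrete, here is a minimal sketch of a filter-style transform over Pandoc's JSON AST (what `pandoc -t json` emits), using only the standard library. It replaces each `Note` element with a raw inline element. The `<note>` tag and the `"html"` raw format are stand-ins, since the thread does not specify the real QTML markup, and `plain_text` deliberately handles only `Str`, `Space`, and `SoftBreak` leaves.

```python
import json
import sys
from xml.sax.saxutils import escape


def plain_text(node) -> str:
    """Flatten AST nodes to text; only Str, Space and SoftBreak contribute."""
    if isinstance(node, list):
        return "".join(plain_text(child) for child in node)
    if isinstance(node, dict):
        kind = node.get("t")
        if kind == "Str":
            return node["c"]
        if kind in ("Space", "SoftBreak"):
            return " "
        return plain_text(node.get("c", []))
    return ""  # numbers, attribute strings, etc. carry no prose


def convert_notes(node, tag: str = "note"):
    """Return a copy of the AST with every Note replaced by a RawInline.

    Both the tag name and the "html" format label are placeholders.
    """
    if isinstance(node, list):
        return [convert_notes(child, tag) for child in node]
    if isinstance(node, dict):
        if node.get("t") == "Note":
            text = escape(plain_text(node["c"]))
            return {"t": "RawInline", "c": ["html", f"<{tag}>{text}</{tag}>"]}
        return {key: convert_notes(value, tag) for key, value in node.items()}
    return node


def main() -> None:
    """Read a JSON AST on stdin and write the transformed AST to stdout."""
    json.dump(convert_notes(json.load(sys.stdin)), sys.stdout)
```

A script that calls `main()` could be passed to `pandoc --filter`, or placed between `pandoc -t json` and `pandoc -f json` in a shell pipeline. Pandoc does the parsing, so the script never has to match formatting marks itself.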