quranacademy / DataExchangeProject

The mission of this project is to make the Noble Quran easily available everywhere by creating a central repository of the available data related to the Quran.

Format for translation meta data #2

Open naveed-ahmad opened 6 years ago

naveed-ahmad commented 6 years ago

The current format has some issues. What if we have the author bio in several languages? How can we see previous versions of a translation? We can add more fields, but that will break the parsers: we (who parse these files to save them into a database) have to change the parsers whenever more fields are added to the metadata. What about the following schema? It would solve these problems and stay backward compatible.

{
    name: "Translation name in native language",
    translatedNames: {
        en: 'name in en',
        ur: 'name in ur'
    },
    language: { name: "english", iso1: 'en', iso3: 'eng' },
    author: {
        name: 'name in native language',
        email: 'email',
        url: 'url',
        translatedNames: {
            en: 'name in english',
            ur: 'name in urdu'
        },
        bio: {
            en: 'bio in english',
            ur: 'bio in urdu'
        }
    },
    provider: {
        name: 'quranacademy',
        email: 'email',
        url: 'url'
    },
    versions: [
        {
            version: '1.0',
            reviewer: { name: 'name', email: 'email' },
            approver: { name: 'name', email: 'email' },
            src: 'url to this version',
            publishedAt: '2018-05-09'
        },
        {
            version: '0.9',
            reviewer: { name: 'name', email: 'email' },
            approver: { name: 'name', email: 'email' },
            src: 'url to this version'
        }
    ],
    contactEmails: ["aziznepali@gmail.com", "isanghnp@gmail.com", "themessagektm@gmail.com", "hqaweb@gmail.com"],
    currentVersion: "1.0",
    firstPublished: "2018-05-09",
    lastUpdated: "2018-05-09",
    license: "TBD",
    licenseSpecialConditions: null
}

Here is the current metadata format, for reference.

{
    "name":"क़ुरआन मजीदको अर्थको नेपाली अनुवाद",
    "nameEn":"Nepali Translation of the Meaning of The Quran Majeed",

    "language":"नेपाली",
    "languageEn":"Nepali",
    "languageISO1":"ne",
    "languageISO3":"nep",

    "author":"इस्लामी संघ नेपाल",
    "authorEn":"Islami Sangh Nepal",
    "authorEmail":"",
    "authorAbout":null,
    "authorAboutEn": ".",

    "about":null,
    "aboutEn":null,

    "version":"1.0",
    "reviewer":"Azizullah Ansari",
    "reviewerEmail":"aziznepali@gmail.com",
    "approver": "Azizullah Ansari",
    "approverEmail": "aziznepali@gmail.com",

    "firstPublished":"2018-05-09",
    "lastUpdated":"2018-05-09",

    "dataPreparer":"quranacademy.org",
    "dataPreparerEmail":"hqaweb@gmail.com",
    "latestVersionSrc":"https://ne.quranacademy.org/quran",
    "latestVersionSrcAlt":"https://github.com/alquran-foundation/translations",

    "contactEmails": ["aziznepali@gmail.com", "isanghnp@gmail.com", "themessagektm@gmail.com", "hqaweb@gmail.com"],

    "license":"TBD",
    "licenseSpecialConditions":null
}
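
To make the relationship between the two layouts concrete, here is a minimal Python sketch that maps the current flat keys onto the proposed nested structure. The function name `flat_to_nested` is hypothetical, and several mappings are assumptions rather than anything agreed in this thread: `dataPreparer` is treated as the `provider`, `languageEn` fills `language.name`, and `lastUpdated` is used as the `publishedAt` date of the single known version.

```python
from typing import Any, Dict


def flat_to_nested(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Map the current flat metadata keys onto the proposed nested layout.

    Missing flat keys simply become None; nothing is invented.
    """
    version = flat.get("version")
    return {
        "name": flat.get("name"),
        "translatedNames": {"en": flat.get("nameEn")},
        "language": {
            "name": flat.get("languageEn"),      # assumption: English name goes here
            "iso1": flat.get("languageISO1"),
            "iso3": flat.get("languageISO3"),
        },
        "author": {
            "name": flat.get("author"),
            "email": flat.get("authorEmail"),
            "translatedNames": {"en": flat.get("authorEn")},
            "bio": {"en": flat.get("authorAboutEn")},
        },
        "provider": {                             # assumption: provider == dataPreparer
            "name": flat.get("dataPreparer"),
            "email": flat.get("dataPreparerEmail"),
        },
        "versions": [
            {
                "version": version,
                "reviewer": {"name": flat.get("reviewer"), "email": flat.get("reviewerEmail")},
                "approver": {"name": flat.get("approver"), "email": flat.get("approverEmail")},
                "src": flat.get("latestVersionSrc"),
                "publishedAt": flat.get("lastUpdated"),  # assumption
            }
        ],
        "contactEmails": flat.get("contactEmails", []),
        "currentVersion": version,
        "firstPublished": flat.get("firstPublished"),
        "lastUpdated": flat.get("lastUpdated"),
        "license": flat.get("license"),
        "licenseSpecialConditions": flat.get("licenseSpecialConditions"),
    }
```

Because the flat file only describes one version, the `versions` array produced this way always has a single entry; older versions would have to come from elsewhere.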
mustafa0x commented 6 years ago

More fields usually aren't an issue; parsers just have to not make too many assumptions. This is especially true if we mark fields as required or optional.
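
As a rough illustration of a parser that "does not make too many assumptions", the Python sketch below checks only a small set of required fields, fills defaults for optional ones, and silently ignores anything it does not know. The choice of which fields are required or optional here is purely hypothetical; the thread has not decided that.

```python
import copy
from typing import Any, Dict

# Hypothetical split; the actual required/optional lists are not settled in this thread.
REQUIRED_FIELDS = ("name", "language", "version")
OPTIONAL_DEFAULTS: Dict[str, Any] = {
    "about": None,
    "licenseSpecialConditions": None,
    "contactEmails": [],
}


def parse_meta(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Read known fields from `raw`; unknown fields are ignored, not rejected."""
    missing = [key for key in REQUIRED_FIELDS if key not in raw]
    if missing:
        raise ValueError("missing required metadata fields: " + ", ".join(missing))

    meta = {key: raw[key] for key in REQUIRED_FIELDS}
    for key, default in OPTIONAL_DEFAULTS.items():
        # deepcopy so callers never share the mutable default list
        meta[key] = raw[key] if key in raw else copy.deepcopy(default)
    return meta
```

With this shape, adding a new field to the metadata files does not break existing consumers; they only need to change when they actually want to read the new field.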

mustafa0x commented 6 years ago
author: {
    name: 'name in native language',
    email: 'email',
    url: 'url',
    translatedNames: {
        en: 'name in english',
        ur: 'name in urdu'
    },
    bio: {
        en: 'bio in english',
        ur: 'bio in urdu'
    }
}

👍

I like the idea of structuring the data instead of it being flat.

mustafa0x commented 6 years ago

How can we see previous versions of a translation?

I don't think this is too important. Yes, it may be useful at times, but it doesn't seem to be worth the added complexity.

rguliev commented 6 years ago

Assalamu alaikum wa rahmatullah wa barakatuh!

  1. Using a structured format instead of a flat one - agreed.
  2. Previous versions of translations - we assume that by default this repo will contain only the latest version of a translation. If a version is updated, it usually means that some mistakes have been fixed, so we don't want to spread a version with mistakes. However, there are cases when people will need previous versions. For such cases we will create an archive that contains all versions.
  3. Following schema - do you mean a DB schema? Of course we are planning to have a DB schema and to add an API on top of it, in shaa Allah. But this repo was created mostly to store raw (JSON, CSV) files that are easy to download and use.
rguliev commented 6 years ago

By the way, how about combining meta data with translation in one JSON? Like one translation - one JSON file. That would be more convenient to maintain.

Al-Muhandis commented 6 years ago

Good idea. I initially assumed that the translations would be in XML or JSON format, certainly not CSV. CSV does have an advantage, though: it is a good option for [relatively] large flat data. Still, my vote is for JSON, since it is more structured and standardized than CSV.

rguliev commented 6 years ago

I thought that we could discuss the format of the translations here, since we decided to combine metadata and text in one JSON. That's why I renamed the question. But then I decided to separate them again because it would become messy. So, I'll create another issue where the format of the translation text will be discussed.

As for the metadata format: can we agree on the following?

"meta": {
    "name": "क़ुरआन मजीदको अर्थको नेपाली अनुवाद",
    "nameTranslations": {
        "en": "Nepali Translation of the Meaning of The Quran Majeed"
    },
    "language": {
        "original": "नेपाली",
        "english": "Nepali",
        "iso1": "ne",
        "iso3": "nep"
    },
    "author": {
        "email": "isanghnp@gmail.com",
        "name": "इस्लामी संघ नेपाल",
        "bio": null,
        "nameTranslations": {
            "en": "Islami Sangh Nepal"
        },
        "bioTranslations": {
            "en": "Islami Sangh Nepal..."
        }
    },
    "about": null,
    "aboutTranslations": {
        "en": null
    },

    "version": "1.0",
    "firstPublished": "2018-05-09",
    "lastUpdated": "2018-05-09",

    "license": "https://creativecommons.org/licenses/by-nc-nd/4.0/",
    "licenseSpecialConditions": null,

    "reviewer": {"name": "Azizullah Ansari", "email": "aziznepali@gmail.com"},
    "approver": {"name": "Azizullah Ansari", "email": "aziznepali@gmail.com"},
    "dataPreparer": {"name": "quranacademy.org", "email": "hqaweb@gmail.com"},

    "latestVersionSrc": "https://ne.quranacademy.org/quran",
    "latestVersionSrcAlt": "https://github.com/alquran-foundation/translations",

    "contactEmails": ["aziznepali@gmail.com", "isanghnp@gmail.com", "themessagektm@gmail.com", "hqaweb@gmail.com"]
}
rguliev commented 6 years ago

At first I wanted to suggest structuring values like this:

    "name":{
        "original":"क़ुरआन मजीदको अर्थको नेपाली अनुवाद",
        "en":"Nepali Translation of the Meaning of The Quran Majeed"
    },

It seems more structured. But it is not convenient, given that the values in the original language are usually the ones used most.

Al-Muhandis commented 6 years ago

It looks more beautiful in structure, IMHO

rguliev commented 6 years ago

@Al-Muhandis do you mean the second one? I agree, it is way more beautiful. But let's not forget about the people using it. In most cases the text in the original language is what's needed. So, I thought that it's not that convenient to type .original each time.
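
For what it's worth, the convenience concern can also be handled on the consumer side. The Python sketch below assumes the flat-plus-`...Translations` layout proposed above (`name` with `nameTranslations`, `about` with `aboutTranslations`); the helper name `localized` is hypothetical. It returns the original value by default and a translation only when one is requested and present.

```python
from typing import Any, Dict, Optional


def localized(meta: Dict[str, Any], field: str, lang: Optional[str] = None) -> Any:
    """Return meta[field] translated into `lang` if available, else the original value."""
    if lang is not None:
        translations = meta.get(field + "Translations") or {}
        translated = translations.get(lang)
        if translated:
            return translated
    # No language requested, or no usable translation: fall back to the original.
    return meta.get(field)
```

With such a helper, `localized(meta, "name")` gives the native-language name without any `.original` suffix, and `localized(meta, "name", "en")` gives the English one when it exists.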

mustafa0x commented 6 years ago

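Two items which come to mind:

  1. riwayah, since some translations will likely be based on a riwayah other than Hafs (e.g. the Berber translation is likely based on Warsh or Qaloon). If this is not set then Hafs is assumed.
  2. suwar, an array of the translated surah names, since each translation usually has its own translation of them.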

Al-Muhandis commented 6 years ago

@rguliev I know what you mean. And I'm now questioning what's best.

Al-Muhandis commented 6 years ago

@mustafa0x

riwayah, since some translations will likely be based on a riwayah other than Hafs (e.g. the Berber translation is likely based on Warsh or Qaloon). If this is not set then Hafs is assumed.

+ yes kira'ah/riwayah can be an optional field in the translation metadata structure

suwar, an array of the translated surah names, since each translation usually has its own translation of them.

+ and I think an array of the surah names should be placed in the translation part of the JSON structure (not in the metadata structure), that is, where the text of the ayah translations and of the basmalah translation is stored
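
A minimal Python sketch of the "optional field with a default" idea for riwayah, assuming the field ends up in the metadata under the key `riwayah` (the key name and the helper `riwayah_of` are hypothetical):

```python
from typing import Any, Dict

DEFAULT_RIWAYAH = "Hafs"  # assumed when the translation does not state one


def riwayah_of(meta: Dict[str, Any]) -> str:
    """Return the riwayah a translation is based on, defaulting to Hafs."""
    # `or` also covers an explicit null in the JSON, not just a missing key.
    return meta.get("riwayah") or DEFAULT_RIWAYAH
```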

mustafa0x commented 6 years ago

and I think an array of the surah names should be placed in the translation part of the JSON structure (not in the metadata structure)

This makes sense, but the problem with it is that it forces the developer to do string manipulation, because he will want to display the surah title in a place other than the place where he shows the ayah.

E.g.

<h1>The Opener</h1>
<p class="ayah">بسم الله الرحمن الرحيم</p>
<p class="translation">In the name of Allah</p>
Al-Muhandis commented 6 years ago

@mustafa0x In the sense that it is inefficient to load the data of the translation text, for example, in the surahs index page?
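
To illustrate the index-page case being asked about, here is a small Python sketch. It assumes the translated surah names are available as their own `suwar` array (wherever in the file that array ends up living), so a surah index can be rendered without touching the ayah text at all. The function name `render_surah_index` and the exact markup are hypothetical.

```python
from html import escape
from typing import List


def render_surah_index(suwar: List[str]) -> str:
    """Render an ordered HTML list of translated surah names."""
    # List position gives the surah number, so no parsing of ayah text is needed.
    items = ["<li>" + escape(title) + "</li>" for title in suwar]
    return "<ol>\n" + "\n".join(items) + "\n</ol>"
```

For example, `render_surah_index(["The Opener", "The Cow"])` yields a two-item `<ol>`; the same array can feed the `<h1>` on each surah page.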