suttacentral / sc-data

Content for SuttaCentral, including texts both legacy and bilara, parallels, structure, and other metadata.

Prepare Hindi texts for SC #265

Open aminah-sc opened 5 years ago

aminah-sc commented 5 years ago

Description

As per discussion here (initiated in 2015):

There are Hindi translations of most of the Pali texts, and we have been having these typed up by contractors in India and adding them to the site. Currently we have the whole DN and about half of MN. Contractors are finishing MN and working on AN.

We’ve run into a problem with AN, which is that the Hindi text doesn’t give any information as to when one sutta ends and another begins. No sutta numbers, no titles, nothing. So we have over a thousand suttas and no way of telling which is which.

The way SuttaCentral works is based entirely on the number assigned to the sutta. Get the number right, and everything just works. So we need someone to go through the whole of AN and add the correct numbers to the start of each sutta. The numbers need to reflect, not the implied numbering of the Hindi text, but the numbers as used on SuttaCentral. These will almost always be the same as the Bhikkhu Bodhi English edition. So essentially you will need to go through the text and add AN3.1, AN3.2 and so on at the start of each sutta, comparing the Hindi with Ven Bodhi’s edition, or with the Pali text on SuttaCentral.
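Once the numbers have been added, a quick consistency check can catch skipped or repeated numbers. The following is only a minimal sketch: it assumes each sutta starts with a marker such as `AN3.1` (or a range such as `AN1.1-10`) at the beginning of a line, and the names `find_markers` and `check_sequence` are made up for illustration. It cannot confirm that a number matches SuttaCentral; it only flags breaks in the sequence.

```python
import re

# Hypothetical marker format (an assumption, not taken from the typed files):
# a line starting with "AN<nipata>.<sutta>", optionally a range like "AN1.1-10".
MARKER = re.compile(r"^AN(\d+)\.(\d+)(?:-(\d+))?\b")


def find_markers(text):
    """Return (line_number, nipata, first, last) for each marker line."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = MARKER.match(line.strip())
        if m:
            nipata, first = int(m.group(1)), int(m.group(2))
            # A single number is treated as a range of length one.
            last = int(m.group(3)) if m.group(3) else first
            found.append((lineno, nipata, first, last))
    return found


def check_sequence(markers):
    """List readable problems: gaps, repeats, or a nipata not starting at 1."""
    problems = []
    prev = None  # (nipata, last sutta number seen)
    for lineno, nipata, first, last in markers:
        if prev is not None and prev[0] == nipata:
            if first != prev[1] + 1:
                problems.append(
                    f"line {lineno}: AN{nipata}.{first} follows AN{nipata}.{prev[1]}"
                )
        elif first != 1:
            problems.append(f"line {lineno}: AN{nipata} starts at {first}, not 1")
        prev = (nipata, last)
    return problems
```

For example, a text containing `AN3.1`, `AN3.2` and then `AN3.4` would yield one problem pointing at the line of `AN3.4`, which is a prompt to look at that spot by hand.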

(also see https://github.com/suttacentral/legacy-suttacentral/issues/126)

Ven Nibbuto has been working on this project for some time and now Saurabh will join him.

Tasks

saukap commented 5 years ago

I've analyzed the pages (images) that were reported as omitted by the contractors. It turns out that some of them are present in the doc that the contractors typed, and some really have been omitted. The contractors provided the entire MN text in one file, so I'm going to start splitting it into one file per sutta, aligning the numbering with SuttaCentral's system, and creating HTML pages for them.
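The splitting step could look roughly like the sketch below. It rests on assumptions that are not confirmed in this thread: that each sutta in the single file is preceded by a marker line such as `MN 12` on its own, that lowercase IDs such as `mn12` are wanted, and that a bare-bones HTML wrapper is enough for a first pass. The HTML here is a placeholder, not SuttaCentral's actual markup, and `split_suttas` and `to_html` are hypothetical names.

```python
import html
import re

# Hypothetical marker: "MN 12" (or "MN12") alone on a line. This is an assumption.
MARKER = re.compile(r"^MN\s*(\d+)$")


def split_suttas(text):
    """Map an ID such as 'mn12' to the text that follows its marker line."""
    suttas = {}
    current = None
    for line in text.splitlines():
        m = MARKER.match(line.strip())
        if m:
            current = f"mn{int(m.group(1))}"
            if current in suttas:
                raise ValueError(f"duplicate marker for {current}")
            suttas[current] = []
        elif current is not None:
            # Lines before the first marker are ignored.
            suttas[current].append(line)
    return {uid: "\n".join(lines).strip() for uid, lines in suttas.items()}


def to_html(uid, body):
    """Wrap blank-line-separated paragraphs in a minimal placeholder page."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]
    parts = [f"<p>{html.escape(p)}</p>" for p in paragraphs]
    return (
        f'<!DOCTYPE html>\n<html lang="hi">\n<body id="{uid}">\n'
        + "\n".join(parts)
        + "\n</body>\n</html>\n"
    )
```

Each entry returned by `split_suttas` can then be passed to `to_html` and written to a file named after its ID, for instance `mn12.html`. Raising on a duplicate marker is a deliberate choice, so a mistyped number is noticed rather than silently overwriting an earlier sutta.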