drdhaval2785 opened this issue 3 years ago
The TXT file may be treated as the starting point of the whole process. All the other formats are generated by a quick-and-dirty script at https://github.com/sanskrit-kosha/kosha/blob/master/scripts/parse_data.py.
Oh, so a 3rd Sanskrit-Sanskrit dictionary; great news we have today.
@gasyoun You asked me a similar question regarding lanman recently. Do you remember where this question and my answer are?
First steps: make dictionary code X (maybe X=ABD)
construct x.txt in the metaline format.
Get a set of scanned images of the dictionary -- one image (pdf) per page.
You will need the page numbers for the <pc> field of the metaline format of x.txt.
Let me take a look at x.txt to see if it looks conformant.
Do you remember where this question and my answer are?
https://github.com/sanskrit-lexicon/hwnorm1/issues/17#issuecomment-753690269 7 places
make dictionary code X (maybe X=ABD)
Can I keep it to 4 letters, please? There are plenty of dicts starting with abhidhAna... / ekAkshara... / nAnArtha... etc. Empirically, I have concluded that 4 letters are necessary to identify a dictionary unambiguously.
I would keep ARMH for abhidhAnaratnamAlA of Halayudha.
construct x.txt in the metaline format.
This is a tricky part. Sanskrit dictionaries are different in structure from Western ones. I will open a couple of separate issues to decide the metaline format for Sanskrit koshas.
Get a set of scanned images of the dictionary -- one image (pdf) per page.
Will arrange for it. Not that difficult. Just need to split a pdf into multiple pdfs.
You will need the page numbers for the <pc> field of the metaline format of x.txt.
A new page is explicitly coded. Example: ;p{0015} denotes the start of page 15.
So this should be doable.
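A minimal sketch of how those markers could supply the page number for each line, assuming every page break sits on its own line in the ;p{NNNN} form shown above and that every other non-empty line belongs to the current page. The function name `assign_pages` is hypothetical and not part of parse_data.py.

```python
import re

# Page-break marker as described above, e.g. ";p{0015}" starts page 15.
PAGE_MARKER = re.compile(r";p\{(\d+)\}")

def assign_pages(lines):
    """Yield (page, line) pairs; page is None until the first marker is seen."""
    page = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        m = PAGE_MARKER.fullmatch(line)
        if m:
            page = m.group(1)  # keep the zero-padded form, e.g. "0015"
            continue
        yield page, line

# Placeholder lines, not real dictionary text.
sample = [";p{0015}", "verse one", "verse two", ";p{0016}", "verse three"]
for page, text in assign_pages(sample):
    print(page, text)
```

Keeping the page as the zero-padded string makes it easy to reuse directly as the <pc> value.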
4 letters?
Yes, sure. E.g. we have MW72, AP90.
split a pdf into multiple pdfs
You will need to name the pdfs in some systematic way.
At a later point, you will need to create a pdffiles.txt file that links the pages, as given in <pc>X, to relative file names. For instance, see the pdffiles.txt of lan.
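One possible systematic naming scheme, sketched under two assumptions: per-page files are named with the lowercase dictionary code plus a zero-padded page number (e.g. armh0015.pdf), and each pdffiles.txt line has the form `page:filename`. Both the naming pattern and the line format are illustrative only; check lan's actual pdffiles.txt for the real layout before relying on this.

```python
def page_filename(code, page, width=4):
    """Hypothetical naming scheme: lowercase code + zero-padded page number."""
    return f"{code.lower()}{page:0{width}d}.pdf"

def pdffiles_lines(code, first, last, width=4):
    """One 'page:filename' line per page (illustrative format, an assumption)."""
    return [
        f"{p:0{width}d}:{page_filename(code, p, width)}"
        for p in range(first, last + 1)
    ]

print("\n".join(pdffiles_lines("ARMH", 1, 3)))
```

Zero-padding keeps the split files sorted in page order in any directory listing.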
Do you remember where ...
Thanks - that hwnorm1 issue was the one of interest.
https://github.com/sanskrit-kosha/kosha/blob/master/abhidhanaratnamala_halayudha/cologne/abhidhanaratnamala.txt is the file of new dictionary in Cologne compliant format. I also have split the PDF pages into single page PDFs. I am not sure where to put the PDF scanned image pages.
Is your dictionary code ABDR? I'll assume it is X below.
Where to put images for dictionary X:
Three places: 1) in the sanskrit-lexicon-scans organization, in a new repository 'x' (lower case); 2) at Cologne. The location must agree with the $cologne_pdfpages_urls table in the dictinfo.php file, which appears in two places:
Is your dictionary code ABDR?
It is ARMH.
The uppercase letters show how the dictionary code was generated:
AbhidhanaRatnaMala_of_Halayudha -> ARMH.
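The rule in the reply above can be expressed as a one-liner: keep only the uppercase letters of the CamelCase title. This is just an illustration of the convention as described, with a hypothetical function name, not a script used by the project.

```python
def dict_code(name):
    """Collect the uppercase letters of a CamelCase title into a code."""
    return "".join(ch for ch in name if ch.isupper())

print(dict_code("AbhidhanaRatnaMala_of_Halayudha"))  # ARMH
```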
I went through the process as I understood it. My local machine has the dictionary ARMH up and running.
I have noted the whole process for local installation of a new dictionary in https://github.com/sanskrit-lexicon/COLOGNE/blob/master/readme_new_dict_addition.md
I am not sure what additional steps would be required to make it hosted on the Cologne servers. @funderburkjim may update the instructions further.
Then, we will have a new Sanskrit-Sanskrit dictionary at Cologne.
Hurray, the 3rd one.
Yes. And further dictionaries may continue pouring in. It is only a matter of identifying headwords in the verses. 25 pages per day seems a reasonable goal for annotating dictionaries. I could complete the annotation of the 100-page Halayudhakosha in 4 days or so, working part time.
Sounds reasonable. You've done all you can for API and asked all the questions, right?
I don't see the new ARMH in csl-orig.
There are still several steps to complete to get armh completely installed.
Are you planning to completely install armh before proceeding with other dictionaries?
Good to have your documentation.
How did you handle metaline in your local implementation? Please send me a link to your local csl-orig/v02/armh.txt so I can duplicate it on my machine.
There are a few more steps to get full installation.
But let's defer these steps until we agree on the metaline issues discussed in #338.
Here are some additional files that need to be updated (and have been so updated) for new dictionary armh:
csl-websanlexicon/v02/redo_xampp_all.sh
csl-websanlexicon/v02/redo_cologne_all.sh
csl-websanlexicon/v02/makotemplates/web/webtc/dictinfo.php
csl-apidev/dictinfo.php
csl-apidev/sample/dictnames.js
csl-apidev/simple-search/v1.1/parse_uri.php
hwnorm1/sanhw1/sanhw1.py
And various redo scripts need to be run.
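Because the same dictionary code has to be registered in several repositories, a small check can help confirm that none of the files above was missed. A sketch, assuming the repositories are checked out side by side under one root directory and that a case-insensitive search for the code is a good enough signal; the helper name `missing_registrations` is hypothetical.

```python
from pathlib import Path

# Files listed above that must mention a new dictionary; paths are relative
# to a directory holding the checked-out repositories (an assumption).
FILES_TO_UPDATE = [
    "csl-websanlexicon/v02/redo_xampp_all.sh",
    "csl-websanlexicon/v02/redo_cologne_all.sh",
    "csl-websanlexicon/v02/makotemplates/web/webtc/dictinfo.php",
    "csl-apidev/dictinfo.php",
    "csl-apidev/sample/dictnames.js",
    "csl-apidev/simple-search/v1.1/parse_uri.php",
    "hwnorm1/sanhw1/sanhw1.py",
]

def missing_registrations(root, code, files=FILES_TO_UPDATE):
    """Return the listed files that are absent or never mention `code`."""
    needle = code.lower()
    missing = []
    for rel in files:
        path = Path(root) / rel
        if not path.is_file():
            missing.append(rel)  # file not found under root
            continue
        text = path.read_text(encoding="utf-8", errors="replace").lower()
        if needle not in text:
            missing.append(rel)  # file exists but has no entry for the code
    return missing
```

For example, `missing_registrations(".", "armh")` run from the directory holding the repositories would list the files still to be edited. A hit only shows the code appears somewhere in the file, not that the entry is correct.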
To get armh on the homepage (not yet done by me):
Edit csl-homepage/index_cologne.py and csl-homepage/index_xampp.py.
Then, to install, run sh redo_xampp.sh for a local XAMPP installation,
or sh redo_cologne.sh for the Cologne installation.
csl-orig/v02/armh/armhheader.xml needs to be filled out.
I've also got an empty file for lanheader.xml that needs to be filled out.
If there are any 'Front matter' pages for armh, they need to be added via csl-doc, and csl-doc must then be rebuilt with Sphinx.
Another step regards scans -
I pulled your images (from sanskrit-lexicon-scans/armh/ -- where you put them)
into Cologne at scans/ARMHScan/2020/web/, placing them in the pdfpages directory.
As mentioned above, this location has to be added several places for the displays to know where to look.
Thanks to @Andhrabharati as a humble gift to @drdhaval2785
If there are any 'Front matter' pages for armh
Abhidhānaratnamālā by Halāyudha (c. 950 AD)
The Abhidhānaratnamālā is a vocabulary of small extent containing about 900 stanzas and is divided into five kāṇḍas or sections as follows: 1. svarkāṇḍa, 2. bhūmikāṇḍa, 3. pātālakāṇḍa, 4. sāmānyakāṇḍa, and 5. anekārthakāṇḍa. The first four of these deal with synonyms while the last is devoted to homonyms and the indeclinables. The genders are indicated by giving the declensional forms. The work does not treat of the genders so strictly as the Amarakośa although in other respects it generally follows the latter, and is composed in variety of matters. Halāyudha, the author of the present lexicon, is said to have flourished by the middle of the tenth century. R.G. Bhandarkar [1] identified him with the author of the Kavirahasya, a grammatical work written in honour of king Kṛṣṇa III (c. AD 940-56) of the Rāṣṭrakūṭa family.[2] Halāyudha is also supposed to be the author of the three works viz. 1. Abhidhānaratnamālā, 2. Kavirahasya, and 3. Mṛtasañjīvanī, a commentary on the Chandaḥsūtras of Piṅgala. The last is said to have been written in the reign of king Muṅja Vākpati of Dhāra.[3] It must be noted here that Aufrecht [4] regards the three Halāyudhas as quite distinct and separate persons; while in the India Office Catalogue [5] the authors of the Abhidhānaratnamālā and the Kavirahasya are regarded as identical and the author of Mṛtasañjīvanī as a different person. Weber, on the other hand, places the author of the present lexicon at the end of the eleventh century. The divergent views regarding the date of Halāyudha and his works are recorded above. For want of detailed information it is not possible at this stage to come to any definite conclusion. In the light of the evidence which is available to us we have to agree with R.G. Bhandarkar and other scholars who place Halāyudha in the middle of the tenth century and ascribe to him the authorship of the three works mentioned above.
[1] Report in Search of Manuscripts for 1883-84, p. 9.
[2] Keith, History of Sanskrit Literature, 133.
[3] Kalpadrukośa, Introduction, xxvi.
[4] Cat. Cat., i, 764b.
[5] II, pt. ii, p. 1840.
Among his authorities Halāyudha mentions Amaradatta, Vararuci, Bhāguri, and Vopālita.[1] So far no commentaries on the Abhidhānaratnamālā are available either in print or in manuscript form. Aufrecht,[2] however, records one commentary by Ajaḍa, which is also recorded by Bühler in his Catalogue of Manuscripts from Gujarat, III (1872), p. 34. One Halāyudhaṭīkā is cited by Vallabhagaṇi in his Sārodhāra, which is itself a commentary on the Abhidhānacintāmaṇi of Hemacandra.[3] It is, however, doubtful whether the reference to the Halāyudhaṭīkā in Vallabhagaṇi's commentary is to the commentary on the Abhidhānaratnamālā.
Cat. Cat., i, 24; ii, 5; iii, 6; AISM (Madras), nos. 891-5. (nos. 894-5 are said to be the works on medicine. This appears to be doubtful).
Is the worldcat reference the edition that @drdhaval2785 used for scans?
There are several mentions of 'halayudha's kosha' in archive.org.
In all the other dictionaries at Cologne, Thomas started with scanned images from a particular print edition. Then he and his 'Sanskrit typists' made the digitization from the scanned images. And the 'Front matter' consists of the front matter in the particular print edition.
I suspect Dhaval's process was different for armh.
What is the original source of the Devanagari digitization?
Is this source clearly tied to a particular print edition?
The insert above (from history of Sanskrit Lexicography) would also be a good item to put in ARMH's 'front matter' in csl-doc, even though it is not exactly 'front matter'.
Exactly, but all of Dhaval's Indian Sanskrit-Sanskrit dictionaries will need one.
@gasyoun Good to see that this book is put to use so quickly.
And may I mention here that I am in possession of a vast collection on the subject matter (I would say THE single place; it is available nowhere else)!!
csl-orig/v02/armh/armhheader.xml needs to be filled out.
I know. That was not essential, so kept it blank. Will fill it soon.
Is the worldcat reference the edition that @drdhaval2785 used for scans?
No. https://archive.org/details/halayudhakoshajayasankarjoshi1957_134_L/mode/2up is the scan I used for digitization. I could not locate the worldcat reference to this edition of Halayudhakosha.
I suspect Dhaval's process was different for armh. What is the original source of the Devanagari digitization? Is this source clearly tied to a particular print edition?
Dhaval's process was not different. It was based on the print edition linked above. The digitization was done by 'Dhaval and his Sanskrit typists / volunteers'.
csl-doc
For the time being, I have the pdf scan available for the front matter, but I do not have a text file for the same. I have not digitized the prefaces yet. If that is necessary, it can be typed in. Not a very long one. Two pages only.
history of Sanskrit Lexicography
It is a good addition over and above the regular prefaces.
Great to see ARMH working at https://www.sanskrit-lexicon.uni-koeln.de/scans/ARMHScan/2020/web/webtc2/index.php
Still to make it to the home page, but great to see that it is working.
I am in possession of a vast collection on the subject matter
I'm ready to share the burden with you ))
If that is necessary, it can be typed in. Not a very long one. Two pages only.
Sure. And an English translation of it?
https://www.sanskrit-lexicon.uni-koeln.de/scans/ARMHScan/2020/web/webtc2/index.php
kṛpīṭayonirdamunāḥ kṛṣṇavartmāśuśukṣaṇiḥ . vibhāvasurapāṃpittaṃ jātavedāstanūnapāt .. 63 ..
Do we really want to see the .. in IAST mode?
I do not think the double periods would cause any issue.
some additional files that need to be updated
May I request @funderburkjim to write these down in https://github.com/sanskrit-lexicon/COLOGNE/blob/master/readme_new_dict_addition.md so that the documentation is kept in a single place.
I would suggest adding the preface from the original Aufrecht edition (1861) as well.
This is in line with my comment elsewhere, to have all the "related" information at one place (to the maximum extent possible).
@drdhaval2785 Probably I should help you in this exercise of "adding new dictionaries at Cologne" as well.
I have seen some errors in the data, such as viprasUna (in the list) and visaprasUna (in the original verse), both for bisaprasUna.
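Mismatches of this kind (a listed headword that does not occur in its own verse) can be flagged mechanically. A minimal sketch, assuming each entry is available as a list of tagged headwords plus the verse text in the same transliteration (SLP1 here); the data layout and the function name are hypothetical.

```python
def headwords_not_in_verse(headwords, verse):
    """Return headwords that do not occur as a substring of the verse text."""
    # Drop spaces so a headword split or joined differently in the verse
    # can still match; this may occasionally allow a match across words.
    compact = verse.replace(" ", "")
    return [hw for hw in headwords if hw.replace(" ", "") not in compact]

# The two spellings reported above: list form vs. verse form.
print(headwords_not_in_verse(["viprasUna"], "visaprasUna"))
```

A plain substring test only catches disagreement between the list and the verse. It would not catch the verse error itself (visaprasUna for bisaprasUna), and it will also flag words whose form changes through sandhi or inflection, so flagged items still need human review.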
Together we could see to it that "good data" is made available to the public (I see no one else who could join hands in the process).
One dictionary a month (or maybe every other month, so that the other work we do is not affected much) would be an achievable target, with the original digitised texts in our possession (yours in the public domain and mine in a "closed box" as of now).
I would be highly obliged if you could help us with the data correction. One dictionary a month seems a reasonable goal.
Good, it's a deal now.
Will look at your (raw) data file in your own repo once I post the 1st phase of the MW Annexure.
And I guess this correction work can be done separately in your repo, and you can then process the files further to bring them here under the Cologne "framework".
In essence, yours would be the place for "data warehousing" & Cologne's would be the place for public presentation.
Yes. All corrections happen in my GitHub repo, of which you are also a member. Cologne files would be updated via scripts.
In essence, yours would be the place for "data warehousing" & Cologne's would be the place for public presentation.
Makes sense.
Dhaval and his Sanskrit typists / volunteers
Good to know about that resource.
I have not digitized the prefaces yet.
csl-doc (based on sphinx) can handle image files (e.g. for BOR).
Probably csl-doc (sphinx) can NOT handle PDF pages.
The syntax for handling images in sphinx is a bit awkward.
request documentation gets updated at a single place.
Items 12-18 added to readme_new_dict_addition as requested
@funderburkjim So, to add the Russian dictionaries, should I try the steps Dhaval followed myself? Still, one of them is a trilingual dictionary, and I do not understand what the markup should actually look like.
what the markup should actually look like.
The first step is to create the 'xxx.txt' file which would go into csl-orig; this is step 2 in 'readme_new_dict_addition' link mentioned above.
The format of xxx.txt can be seen 'by example'.
Take a look at some of the digitizations in csl-orig repository. For example md.txt.
You will need to convert the current form of your dictionary into the xxx.txt form.
This will probably require some guidance from @drdhaval2785 or me.
First, choose which dictionary you want to focus on.
Please provide a link to a pdf of this dictionary.
Also, provide a link to the text file which contains the current form of your digitization. For sake of this discussion, let's call this xxx_orig.txt. Then the first task is to convert xxx_orig.txt to xxx.txt.
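Once headwords and page numbers are known, each entry of xxx_orig.txt would be written out as a record headed by a metaline. A rough sketch, assuming the layout seen in csl-orig files such as md.txt: a first line of the form `<L>…<pc>…<k1>…<k2>…`, then the entry text, then `<LEND>`. Individual dictionaries may carry extra metaline fields, so compare with the actual files; the function name and the sample values are hypothetical.

```python
def metaline_record(lnum, pc, k1, k2, body_lines):
    """Format one entry: metaline, body lines, then the <LEND> terminator."""
    meta = f"<L>{lnum}<pc>{pc}<k1>{k1}<k2>{k2}"
    return "\n".join([meta, *body_lines, "<LEND>"])

# Hypothetical entry: record number 1, page 0015, headword "agni".
print(metaline_record(1, "0015", "agni", "agni", ["(entry text here)"]))
```

Keeping the record number, page and headword together on one line is what lets later scripts index the entry and link it to its scanned page.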
If you don't yet have xxx_orig.txt, then typing that xxx_orig.txt is the first step.
Since each dictionary has its own peculiarities, we will need to see a particular dictionary in order to make specific suggestions.
This issue may be treated as a documentation for the future contributor who may wish to contribute a dictionary to Cologne Sanskrit Dictionaries.
I now have a Sanskrit-Sanskrit dictionary ready with proper tagging of headwords. How can I integrate it at Cologne, @funderburkjim ?
Dictionary name - अभिधानरत्नमाला Author - हलायुध
Files
TXT - https://github.com/sanskrit-kosha/kosha/blob/master/abhidhanaratnamala_halayudha/orig/abhidhanaratnamala.txt
BABYLON - https://github.com/sanskrit-kosha/kosha/blob/master/abhidhanaratnamala_halayudha/babylon/abhidhanaratnamala.babylon
XML - https://github.com/sanskrit-kosha/kosha/blob/master/abhidhanaratnamala_halayudha/xml/abhidhanaratnamala.xml
HTML - https://github.com/sanskrit-kosha/kosha/blob/master/abhidhanaratnamala_halayudha/html/abhidhanaratnamala.html
JSON - https://github.com/sanskrit-kosha/kosha/blob/master/abhidhanaratnamala_halayudha/json/abhidhanaratnamala.json
MD - https://github.com/sanskrit-kosha/kosha/tree/master/abhidhanaratnamala_halayudha/md
STARDICT - https://github.com/sanskrit-kosha/kosha/tree/master/abhidhanaratnamala_halayudha/stardict
metadata about the file