sanskrit-lexicon / CORRECTIONS

Correction history for Cologne Sanskrit Lexicon

`o` vs `O` #45

Closed gasyoun closed 8 years ago

gasyoun commented 9 years ago

@funderburkjim Can we make a list of words with o vs O? If you would be able to make a video, I would be able to learn and become 5 g smarter.

To have both

sUpodanazazWIpUjA:PW f.  Titel
sUpOdanazazWIpUjA:MW f. N. of wk.

seems fishy to me.

gasyoun commented 9 years ago

@drdhaval2785 do you agree?

drdhaval2785 commented 9 years ago

I agree. But why only 'o' vs 'O'? Let us try to compare all words which differ only in capitals, e.g. 'e'-'E', 'o'-'O', 'a'-'A', etc.

gasyoun commented 9 years ago

@drdhaval2785 I agree, sure.

drdhaval2785 commented 9 years ago

Trying for it.

drdhaval2785 commented 9 years ago

Currently developing a comparison text based on sanhw1.txt. Not the cleaned one; the same old unclean one.

The logic for filtering the words is as below:

  1. The case-insensitive comparison should match exactly (strcasecmp($a1,$a2)===0).
  2. The two words should not be identical ($a1!==$a2).
  3. Neither dictionary should be the Pune Dictionary ($dict1!=="PD" and $dict2!=="PD"). The reason for excluding this dictionary is that it has a long list of words starting with 'a', which gives a lot of false positives.
  4. The words are not from the same dictionary ($a1 and $a2 don't belong to the same dictionary).
  5. The last letters are not a, A, m, M and the difference is not in the last letter only (to accommodate masculine/feminine and m/M ending issues).

We take a batch of 10000 words from sanhw1 and compare each against the other 9999. The reason for taking batches of 10000 is to minimize the time, because comparing one word against some 400,000 words would be too computationally heavy; 10000 seems fine for the comparison. As words confused by 'o'/'O' etc. in the middle would have the same starting pattern, there is very little chance of missing an important catch by working in batches of 10000.
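For reference, here is a minimal PHP sketch of this batching-and-filtering logic. It is not the actual o_vs_O.php; the sanhw1.txt line format (headword:DICT1,DICT2,...) and the exact reading of criteria 3 and 5 are assumptions.

<?php
// Sketch only: assumes each sanhw1.txt line looks like  headword:DICT1,DICT2,...
$lines = file('sanhw1.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$words = array();
foreach ($lines as $line) {
    if (strpos($line, ':') === false) continue;
    list($hw, $dicts) = explode(':', $line, 2);
    $dictList = array_diff(explode(',', $dicts), array('PD')); // criterion 3: leave out the Pune Dictionary
    if (count($dictList) === 0) continue;
    $words[] = array('hw' => $hw, 'dicts' => array_values($dictList));
}
$batch = array_slice($words, 0, 10000); // one batch of 10000 headwords
$out = array();
for ($i = 0; $i < count($batch); $i++) {
    for ($j = $i + 1; $j < count($batch); $j++) {
        $a1 = $batch[$i]['hw'];
        $a2 = $batch[$j]['hw'];
        // criteria 1 and 2: equal when case is ignored, but not literally identical
        if (strcasecmp($a1, $a2) !== 0 || $a1 === $a2) continue;
        // criterion 4: the two headwords must not share a dictionary
        if (count(array_intersect($batch[$i]['dicts'], $batch[$j]['dicts'])) > 0) continue;
        // criterion 5 (one reading): skip pairs differing only in a final a/A/m/M
        if (substr($a1, 0, -1) === substr($a2, 0, -1)
            && in_array(substr($a1, -1), array('a', 'A', 'm', 'M'))
            && in_array(substr($a2, -1), array('a', 'A', 'm', 'M'))) continue;
        $out[] = $a1 . ':' . $a2 . ' - ' . implode(',', $batch[$i]['dicts']) . ':' . implode(',', $batch[$j]['dicts']);
    }
}
file_put_contents('o_vs_O_batch.txt', implode("\n", $out) . "\n");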

drdhaval2785 commented 9 years ago

The raw text file is available at https://github.com/drdhaval2785/SanskritSpellCheck/blob/master/o_vs_O.txt.

The data is culled from sanhw1.txt using https://github.com/drdhaval2785/SanskritSpellCheck/blob/master/o_vs_O.php.

The format is: Word1:Word2 - Dict1:Dict2

gasyoun commented 9 years ago

Great. The same in Devanagari for @Shalu411, please? She hardly recognizes Sanskrit in SLP1 form. The description and logic are very well thought out. :v:

drdhaval2785 commented 9 years ago

@gasyoun and @Shalu411 Ladies and gentlemen, presenting before you, the list of similar looking words from sanhw1.txt.

http://drdhaval2785.github.io/o_vs_O.html

A cursory reading shows that this approach also highlights real mistakes, e.g.

अंशांशि:अंशांसि - GST,SHS,WIL:MW

aMSAMsi -> aMSAMSi in MW (print error)

drdhaval2785 commented 9 years ago
अंशुमन्त्:अंसुमन्त् - CAE,PW,PWG,STC:CCS

aMsumant -> aMSumant in CCS

drdhaval2785 commented 9 years ago

Before we proceed, I invite @funderburkjim to improve the code with his magic refactorings, so that GitHub displays and installations are just copy-paste and not manual ones.

My suggestion would be:

  1. Open github.io page.
  2. Provide something like a fuzzy-alpha kind of output, from which Jim's Python can manage the corrections.
  3. The correction sentence may be like
ABC:abc in DICTNAME

(ABC has to be changed to abc in the DICTNAME dictionary). A second file can be for copy-pasting into GitHub automatically, something like

001 AcAryO -> 
headword <a target='_INMword' href='http://www.sanskrit-lexicon.uni-koeln.de/scans/INMScan/2013/web/webtc/indexcaller.php?input=slp1&output=deva&key=AcAryO'>AcAryO</a> ---  page <a target='_INMpage' href='http://www.sanskrit-lexicon.uni-koeln.de/scans/INMScan/2013/web/webtc/servepdf.php?page=004'>004-1</a>
<hr/>
Shalu411 commented 9 years ago

Namaste. A very good effort; it seems great at first look. Please mention the issues and problematic words found so far, so that first-timers get a chance to look deeper and catch others. Then we can make the doc better.

gasyoun commented 9 years ago

@Shalu411 We are all first-timers at this battle, my dear lady. Some of us will never get out of it, alive.

@sanskritisampada Would love to hear your opinion if any.

@drdhaval2785 Sorting by varnamala is not the most convenient way. If we should sort by dictionary, maybe. Longest words first, absolutely. I guess bolding the difference, as before, at least in non-Devanagari, would help; otherwise valuable time is lost deciphering. Let's add numbering before the words; there are too many on a single page.

If words on the left and the right side are in the same dictionary (as in MW in this case), I would mark them as less probable variants: mewI:meWi - MW:MW,MW72,PW,PWG

Long words, each in only 1 source, are the best candidates for correction: meGapAlitftIyAvrata:meGapAlItftIyAvrata - MW:PW

Long words, even with several candidates, are a gift: वर्षाम्भःपारणव्रत:वर्षाम्भःपारणाव्रत - varzAmBaHpAraRavrata:varzAmBaHpAraRAvrata - MW,PW,PWG:SHS,WIL,YAT

A big list (3+) on both sides should get the most attention, so some graphical markup (red font?): meGa:mEGa - BEN,BHS,BOP,BUR,CAE,CCS,GRA,MD,MW,MW72,PW,PWG,SCH,SHS,STC,VCP,WIL,YAT:CAE,CCS,MW,MW72,PW,PWG

But big does not always mean a victory; in many cases SCH should bring us something new, still valid: निक्ष्:नीक्ष् - nikz:nIkz - AP,AP90,BEN,BOP,CAE,CCS,GRA,MD,MW,MW72,PW,PWG,YAT:SCH

As of now, with अखट्टि:अखट्ठि - aKawwi:aKawWi - BUR,GST,MW,MW72,PW,PWG,VCP,YAT:WIL it's rather hard to visually find the ':'. I would replace ':' with ' : ', a space before and after, bolded.

The old anusvara battle: I guess making them grey is what they deserve. Maybe 50%+ would go away if we hide them for now: नालंबी:नालम्बी - nAlaMbI:nAlambI - AP90:AP,VCP

drdhaval2785 commented 9 years ago

OK. Time to do further documentation. The download of the .txt file was painfully slow, so I have not done corrections / refactorings on the .txt file.

While generating the .html file, I have suppressed the following to reduce the known issues.

  1. Unadjusted data: 7148 entries.
  2. Adjusted for nasals: 5392 entries.
  3. Words ending with 'Ant' and 'AMs' adjusted: 5351 entries.
  4. Removed BHS-only entries: 4733 entries. (Removed because BHS leans more towards pAli / prAkRta.)

drdhaval2785 commented 9 years ago

@gasyoun

Sorting by varnamala is not the most convenient way. If we should sort by dictionary, maybe. Longest words first, absolutely.

Too difficult, because there are multiple dictionaries involved and they need to be compared with one another. I would be glad to receive some help in this regard.

I guess bolding the difference, as before, at least in non-Devanagari, would help; otherwise valuable time is lost deciphering.

The issue here is that this is not based on the pattern-finding method used earlier; a separate PHP function is used, so we don't have the luxury of bolding it. The second issue is that we are not sure which of the two words is correct. Bolding would divert attention unduly.

Let's add numbering before words, too many are there on a single page.

Done.

If words on the left and the right side are in the same dictionary (as in MW in this case), I would mark them as less probable variants

This has been taken care of in the HTML display. The code responsible is count($val)===count(array_unique($val)). This checks whether the same dictionary appears more than once, i.e. on both sides of the colon.
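In other words, a minimal sketch of that check (the variable $val and its sample contents are assumptions here, standing for the dictionary abbreviations collected from both sides of the colon):

// Sketch of the duplicate-dictionary check; $val is assumed to hold the
// dictionary abbreviations from both sides of the colon for one word pair.
$val = array('MW', 'MW72', 'PW', 'MW');           // MW occurs on both sides
if (count($val) === count(array_unique($val))) {
    echo "distinct dictionaries: stronger candidate for a real error\n";
} else {
    echo "same dictionary on both sides: less probable variant\n";
}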

drdhaval2785 commented 9 years ago
Long words, each in only 1 source, are the best candidates for correction.
meGapAlitftIyAvrata:meGapAlItftIyAvrata - MW:PW

I agree

Long words, even with several candidates, are a gift
वर्षाम्भःपारणव्रत:वर्षाम्भःपारणाव्रत - varzAmBaHpAraRavrata:varzAmBaHpAraRAvrata - MW,PW,PWG:SHS,WIL,YAT

Not sure about their usefulness. But usually long words are more prone to typing errors by the scribes and less likely to be caught by the naked eye.

A big list (3+) on both sides should get the most attention, so some graphical markup (red font?):
meGa:mEGa - BEN,BHS,BOP,BUR,CAE,CCS,GRA,MD,MW,MW72,PW,PWG,SCH,SHS,STC,VCP,WIL,YAT:CAE,CCS,MW,MW72,PW,PWG

I guess such words would most probably be false positives, so there is no need to bother much about them. There is very little chance that so many dictionaries have typed it wrong.

But big does not always mean a victory; in many cases SCH should bring us something new, still valid
निक्ष्:नीक्ष् - nikz:nIkz - AP,AP90,BEN,BOP,CAE,CCS,GRA,MD,MW,MW72,PW,PWG,YAT:SCH

The worst criminal dictionary was BHS, so I removed it to reduce false positives. SCH has highlighted many wrong entries, so I am keeping it as it is.

As of now, with
अखट्टि:अखट्ठि - aKawwi:aKawWi - BUR,GST,MW,MW72,PW,PWG,VCP,YAT:WIL
it's rather hard to visually find the ':'. I would replace ':' with ' : ', a space before and after, bolded.

Done.

The old anusvara battle: I guess making them grey is what they deserve. Maybe 50%+ would go away if we hide them for now
नालंबी:नालम्बी - nAlaMbI:nAlambI - AP90:AP,VCP

Done.

drdhaval2785 commented 9 years ago

The statistics of the refactorings are as below:

  1. 7148 entries with no adjustments for nasals
  2. 5392 entries with adjustments for nasals
  3. 5351 entries with adjustments for 'Ant' and 'AMs'
  4. 4733 entries after removing BHS-only entries

gasyoun commented 9 years ago

"Very less chances that so many dictionaries have typed it wrong" - if they all belong to the same school, no wonder at all. 3932 विषमचतुरश्र : विषमचतुरस्र - vizamacaturaSra : vizamacaturasra - MW,PW,PWG : SHS,WIL,YAT is a good illustration. PWG changed the game, and MW is based on PWG. WIL was the source of the error; YAT and SHS copy-pasted, but it was all wrong.

1765 झुम्बरि : झुम्बरी - Jumbari : JumbarI - MW : PW both refer to the same source but quote the f. word differently.

1747 ज्वलारासभकामय : ज्वालारासभकामय - jvalArAsaBakAmaya : jvAlArAsaBakAmaya - PWG : MW,PW as PW is later than PWG, we should suppose Böhtlingk changed his mind and it's the "better" variant.

drdhaval2785 commented 9 years ago

OK. So I have kept them as they are.

drdhaval2785 commented 9 years ago

aMSAMsi -> aMSAMSi (MW)
aMsumant -> aMSumant (CCS)
ajInapatrI -> ajinapatrI (SKD)
aYjizwaH -> aYjizWaH (AP)

Shalu411 commented 9 years ago

Namaste. 4338 सजातीयविशिष्टान्तराघटितत्व : सजातीयविसिष्टान्तराघटितत्व - sajAtIyaviSizwAntarAGawitatva : sajAtIyavisizwAntarAGawitatva - PW : MW

The first is the right word: सजातीयविशिष्टान्तराघटितत्व. The wrong part is विसिष्ट, which should be विशिष्ट.

gasyoun commented 9 years ago

@drdhaval2785 will there be some fine tuning of code or should we start to work with it as it is?

drdhaval2785 commented 9 years ago

There will be fine tuning of code. Wait.

gasyoun commented 9 years ago

Waiting.

drdhaval2785 commented 9 years ago

@gasyoun, @funderburkjim and @Shalu411 I have refactored the code and stored the output dictionary-wise at https://github.com/drdhaval2785/SanskritSpellCheck/tree/master/o_vs_O/output. The results of o_vs_O.html are now stored sorted dictionary-wise and also in decreasing probability of being a wrong word. The documentation will take a day or two, but the output can be checked. Please ignore the wrongly translated English -> Hindi in the HTML files. They are not necessary for our purpose.

drdhaval2785 commented 9 years ago

Now what I would want @funderburkjim to do is:

Link the words in the .html files to the scan pages rather than to the current digitized data page.

The data without any modification is in the .txt files.

We invariably need to see the scan. The present state of affairs takes an extra click after opening the digitized data page to reach it. This can be circumvented by Jim.

funderburkjim commented 9 years ago

I've been busy with other things and am just looking at this work now. The lists being generated look fruitful, and a good course of work to pursue. I'll be glad to help as Dhaval suggests above. HOWEVER, I would like us (probably Dhaval and me) to first finish the few remaining dictionaries using the faultfinder approach.

When I do get to the 'link-to-page' enhancement Dhaval requests, I'll need to know exactly the steps used to generate (a) the list of comparison words and (b) the html display. Generation of the page links is much less uniform than generation of the headword display links. Maybe during this task some way can be developed to make the page link generation more uniform - this might find use in other displays.

gasyoun commented 9 years ago

@drdhaval2785 just to show how impressive it is, a few lines from /SanskritSpellCheck/o_vs_O/output/AP.html:

अपष्टु : अपष्ठु - apazwu : apazWu - AP : AP90,BUR,GST,MD,MW,MW72,PW,PWG,SHS,SKD,STC,WIL,YAT
आगमिष्ट : आगमिष्ठ - Agamizwa : AgamizWa - AP : AP90,GRA,MW,MW72,PW,PWG
कनिष्ट : कनिष्ठ - kanizwa : kanizWa - AP : AP90,BEN,BOP,BUR,CAE,CCS,GRA,MD,MW,MW72,PW,PWG,SHS,STC,VCP,WIL,YAT
वाचल : वाचाल - vAcala : vAcAla - AP : AP90,CAE,CCS,MD,MW,MW72,PW,PWG,SHS,STC,WIL,YAT
अर्हरिष्वाणि : अर्हरिष्वणि - arharizvARi : arharizvaRi - AP : AP90,GRA,MW,MW72,PW,PWG
आयःशुलिक : आयःशूलिक - AyaHSulika : AyaHSUlika - PW : AP,AP90,MW,MW72,PWG,SCH,SHS,VCP,WIL,YAT
नैष्टिक : नैष्ठिक - nEzwika : nEzWika - PW : AP,AP90,BEN,BOP,BUR,CAE,MD,MW,MW72,PWG,SHS,STC,VCP,WIL,YAT
रविकुलदीपप्रकाश : रविकुलदीपप्रकास - ravikuladIpaprakASa : ravikuladIpaprakAsa - PW : MW

Wow, thanks. It's a bomb. Still, I hardly understand the difference between "One dictionary in first word and one dictionary in second word" and "One dictionary in second word and one dictionary in first word".

  1. Let's add numbers to the categories, before "One dictionary ..." or letters ABCD.
  2. Words with numbering.
  3. Sorting by alphabet has no benefit here; longer words first make more sense.

funderburkjim commented 9 years ago

I agree that the word length is a relevant factor in this discussion.

Taking word length into account might help with the computational complexity issue. I'm assuming that we are comparing words X and Y whose spelling is identical except for 1 letter. If this assumption holds, then for such an X,Y pair, the length of string X = length of Y.

So, to be efficient, when reading in the entire list (say it is 400,000 words from sanhw1), compute and save the length of each word.

Then, sort the list of records on word length, and make subgroups based on words with the same length. Finally, when doing the more complex comparison, only X and Y within the same word-length subgroup need to be considered.

Don't know whether you need this kind of thinking, Dhaval, but thought it was interesting and should be mentioned.
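A short PHP sketch of that bucketing idea (illustrative only; the plain-headword input file name is an assumption):

// Group headwords by length, then compare only within each length bucket.
$headwords = file('sanhw1_headwords.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$byLength = array();
foreach ($headwords as $w) {
    $byLength[strlen($w)][] = $w;
}
foreach ($byLength as $len => $bucket) {
    $n = count($bucket);
    for ($i = 0; $i < $n; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            // words of equal length that differ only in case
            if (strcasecmp($bucket[$i], $bucket[$j]) === 0 && $bucket[$i] !== $bucket[$j]) {
                echo $bucket[$i] . ':' . $bucket[$j] . "\n";
            }
        }
    }
}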

drdhaval2785 commented 9 years ago
HOWEVER, I would like us (probably Dhaval and me) to first finish the few remaining dictionaries using the faultfinder approach.

I agree completely. I told @gasyoun that I would take up this method after we are done with the faultfinder approach. That would serve a dual purpose:

  1. One methodology has been exploited to the maximum.
  2. For the next methodology, we would have sanhw1.txt which is corrected for the previous methodology.

But it was on the insistence of Marcis that I spent some time yesterday developing this methodology further. I intend to take it up for corrections only after the faultfinder methodology has been exploited.

gasyoun commented 9 years ago

@drdhaval2785 I keep silent. And there is a point:

  1. A list of false positives should be ready for the next methodologies. It should include the specially corrupt words, like when Monier has a word that he knows is badly written but references the "right" one.

drdhaval2785 commented 9 years ago

The method is mature enough now.

I have committed it. https://github.com/drdhaval2785/SanskritSpellCheck/blob/master/o_vs_O/o_vs_O.php and https://github.com/drdhaval2785/SanskritSpellCheck/blob/master/o_vs_O/dictsorting.php are the programs responsible. The github.io page is at http://drdhaval2785.github.io/o_vs_O/output1/AP.html (replace the dict name by the abbreviation for other dicts).

A readme is also placed at https://github.com/drdhaval2785/SanskritSpellCheck/tree/master/o_vs_O.

To document the readme.txt:

Coder instructions

Inputs: sanhw1.txt
Dependencies: dev-slp.php, faultfinder3a-utils.php, function.php, slp-dev.php.
Step1 - From commandline run php o_vs_O.php.
Step2 - This gives two outputs. o_vs_O1.txt is the raw form. o_vs_O2.txt is the refactored form (i.e. after removal of unnecessary words).
Step3 - Create a folder named output1 in the same directory.
Step4 - From commandline run php dictsorting.php.
Step5 - This creates two files for each dictionary - one .txt file (word1:word2-dict1:dict2 format) and one .html file (Tabular presentation with links to the Cologne dictionaries and Devanagari display also).
Step6 - Click links from these HTML files and verify the errors.

The end user instructions

Note: 
Please focus only on the corrections in the dictionary under consideration.
If you see any errors in the dictionary other than the one you are dealing with, leave it.
You will encounter it in the dictionary concerned. We will treat it there.

Typical output is two files: one .txt and one .html. All the jargon of earlier times has been removed. Now only 'Most probable', 'Medium probable' and 'Least probable' sorting is done.

drdhaval2785 commented 9 years ago

@funderburkjim @gasyoun suggested that the .txt file should have Devanagari words only, so that @Shalu411 can work on it comfortably. Please suggest the format in which it is easiest for you to feed it to your Python scripts. Would the धव्हल->धवल format be fine for you? In that case I will provide Devanagari1->Devanagari2 in the .txt file. Anyway, we are treating this dictionary-wise, so dictionary lists don't matter in the .txt file.

funderburkjim commented 9 years ago

@drdhaval2785 I am thinking that the way to get page references for all the dictionaries is by constructing a variant of sanhw1, in which the dictionary list will be replaced by a list of (dictionary,page) pairs. [Incidentally, I'm not sure how or if homonyms will be handled - probably best to ignore at first.]

Thus, I will likely want to recreate your filtered lists using such an enhanced sanhw1. So I would probably make minor adjustments to o_vs_O.php to read and write these pairs, a similar adjustment to dictsorting.php to read these pairs, and an additional enhancement to generate page links from the (dictionary, page) information.
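Purely as an illustration (the delimiters and page numbers below are hypothetical, not an agreed format), such an enhanced sanhw1 line might look like:

someheadword:MW,123;PW,456;AP90,789

i.e. the headword followed by (dictionary, page) pairs in place of the plain dictionary list.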

gasyoun commented 9 years ago

@funderburkjim Homonyms are best ignored, indeed. There is one concern: if we wait for the end of faultfinder, we might lose 3-4 months of valuable time. @Shalu411 might want to help one day; the question is how to help her help us. If the धव्हल->धवल format is OK for Jim before he plans to do the needed adjustments (which might come in Autumn 2015), we could start testing in parallel without losing time. ईLइत conversion issues are still around; Dhaval is aware of them.

The SLP1 part in the reference file http://drdhaval2785.github.io/o_vs_O/output1/MW.html is mostly needed for Jim. But it helps me as well: since the capital letters stand out and there is no bold formatting used to mark the difference, there is some practical use of SLP1 in the reference file, but indeed it is not needed for the .txt, where the corrections are made. I hope Jim can take the Devanagari files, convert them back to SLP1, and everybody will work productively.

The dancing :dancer: width of the columns makes me mad. Even if we make the width of content vs abbreviations 50% each, they will dance if the whole page is not a single table. @drdhaval2785 Maybe make it one and add the headers as text inside table cells?

Links in the 2nd column are broken, e.g. http://www.sanskrit-lexicon.uni-koeln.de/scans/PWG%3Cbr%3EScan/?/web/webtc/indexcaller.php?key=BagavadBaskara&input=slp1&output=SktDevaUnicode for the भगवद्विलासरत्नावली case.

After reading "If you see any errors in the dictionary other than the one you are dealing with, leave it", I was thinking that at the header and footer of each of the o_vs_O/output1/ files we should add interlinking to all the other 33 dictionary o_vs_O files, so I would not have to browse them or download them locally just to switch, because I already see that starting with one takes me to issues in another which I would not want to delay or forget. Like BEN | BUR | CAE | CCS | MD | MW | MW72 | PW | PWG | SHS | SKD | STC | WIL | YAT.

About sorting: http://tablesorter.com/docs/ and http://www.listjs.com/ seem to be what I miss. The 2nd one even has search, which is great, because I often search for some specific letters.

drdhaval2785 commented 9 years ago

Jim, if you can write some PHP function which creates the page link from a given word for a given dictionary, it would be better. It would be universally applicable, and much needed. You created the function colone_hrefyear for links to the data, and I use it freely. It is time that we create a similar function for pages, e.g. colone_pagelink($word,$dict).

gasyoun commented 9 years ago

@funderburkjim please share colone_pagelink with us before we finish faultfinder, because that might be in August.

drdhaval2785 commented 9 years ago
The dancing :dancer: width of the columns makes me mad.

Fixed in the commit above.

drdhaval2785 commented 9 years ago
Links in the 2nd column are broken

Fixed in the commit above. Typical output now looks like the screenshot below.

See http://drdhaval2785.github.io/o_vs_O/output1/AP.html for testing of numbering, fixing of table width issue and fixing of last column links.

drdhaval2785 commented 9 years ago
interlinking to all the other 33 dictionary o_vs_O files

Too much to ask. In my opinion linking to the Cologne server is better than local linking. In toto there are not more than 5000 such suspect errors found. It is only because of the dictionary-wise sorting and handling that the numbers seem large. It is a 10-day job for a dedicated proofreader.

funderburkjim commented 9 years ago

@drdhaval2785 There is now a development version of a Cologne server program to bring up a scanned image for any dictionary (dict=DICT) and any SLP1 headword (key=KEY):

http://www.sanskrit-lexicon.uni-koeln.de/scans/awork/apidev/servepdf.php?dict=DICT&key=KEY

You can rewrite your HTML output program so that it generates links using this RESTful interface, e.g.

<a href="AS ABOVE" target="_BLAH">LINK-TEXT</a>
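For instance, with illustrative values (dict=MW and the headword rAma), the generated anchor might look like:

<a href="http://www.sanskrit-lexicon.uni-koeln.de/scans/awork/apidev/servepdf.php?dict=MW&key=rAma" target="_blank">MW</a>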

I've tested this and think it works for all the dictionaries (even the English-Sanskrit ones, like AE, MWE, BOR).

I think this is what you requested from me. Let me know if otherwise.

This same program can alternatively take a page=PAGE parameter (instead of a key=KEY parameter). In fact, the key=KEY parameter operates as follows:

This program (with the key=KEY parameter) is NOT appropriate if the dictionary has homonyms of KEY on widely separated pages. But that is unlikely to be an issue in the o-O use case.

funderburkjim commented 9 years ago

Regarding 'dancing widths': wouldn't this be solved by specifying a width for the table in CSS?

<table style='width:1000px;'>

Maybe each column would also need a fixed width :

<td class='col1'>...
<td class='col2'>...
etc.
in <head>
<style>
 .col1 {width: 15%;}  
 .col2 {width: 15%;}
 .col5 {width: 50%;}
</style>

Maybe you've already solved this issue. If so, please ignore this comment.

funderburkjim commented 9 years ago

@drdhaval2785 Here is a temporary link to the code of servepdf. It is my first experiment with using classes in PHP.

gasyoun commented 9 years ago

@drdhaval2785 This way the file has got worse. I hope I'll be able to propose my version based on http://www.listjs.com. @funderburkjim Fixed width is not always great, but anyway it needs testing. Because of some long string on the page everything can get bad, so we need to see the extreme cases; it is because of them that we get the ugly tables.

funderburkjim commented 9 years ago

@gasyoun One question: why all the concern about 'ugly tables'? Isn't this just a tool for research? If it is just for research, why does an aesthetic detail like table column width matter?

drdhaval2785 commented 9 years ago

In the first entries the first column of dictionary links may seem too large, but for the entries at the end there are sometimes 15 dicts, hence the wide column. I am not going to spend further time on aesthetics.

drdhaval2785 commented 9 years ago

Here is the magic function for future reference:

function pdflink($dict,$word)
{
    return '<a href="http://www.sanskrit-lexicon.uni-koeln.de/scans/awork/apidev/servepdf.php?dict='.$dict.'&key='.$word.'" target="_blank">'.$dict."</a>";
}

This gives the link based on the dictionary name and word. If need be, the word instead of the dict can be shown in the browser. http://drdhaval2785.github.io/o_vs_O/output1/AP.html and the others now have dictionary links.
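For example, with illustrative values taken from the AP list above ('AP' and 'apazwu'), a call would produce:

// Example use of pdflink(); 'AP' and 'apazwu' are illustrative values.
echo pdflink('AP', 'apazwu');
// prints: <a href="http://www.sanskrit-lexicon.uni-koeln.de/scans/awork/apidev/servepdf.php?dict=AP&key=apazwu" target="_blank">AP</a>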

gasyoun commented 9 years ago

@drdhaval2785 wow! Do you never feel the need to open the digital text copy as well? I mean the PDF is great, 1 click less, but now there is no way to get to the digital text in 1 click. For some words I myself need to see the article, and it's quicker (for MW and AP for sure) to have the digital text before I start staring at the printed page.

drdhaval2785 commented 9 years ago

@gasyoun Never ever have I wanted to see the digitized data. It could always have two links, one to the digitized data and the second to the scan page, but that would mean too many links and too much confusion (and also twice the HTML size).

gasyoun commented 9 years ago

@Shalu411 ever needed the digitized data?

funderburkjim commented 9 years ago

One comment on the program that permits URLs like:

http://www.sanskrit-lexicon.uni-koeln.de/scans/awork/apidev/servepdf.php?dict=DICT&key=KEY

I hope to write a multi-dictionary display lookup function at some point. It might look something like:

http://www.sanskrit-lexicon.uni-koeln.de/scans/awork/apidev/lookup.php?dict=MW&key=rAma&input=hk&output=deva&display=BASIC

This will be much more complicated than servepdf, but should be doable.