sanskrit-lexicon / COLOGNE

Development of http://www.sanskrit-lexicon.uni-koeln.de/

arI|a #377

drdhaval2785 closed this issue 2 years ago

drdhaval2785 commented 2 years ago

At https://www.sanskrit-lexicon.uni-koeln.de/scans/AP90Scan/2020/web/webtc2/index.php, when I search for 'a' in exact mode, I get both 'a' and 'arI|a'.

Andhrabharati commented 2 years ago

I reported this error long ago (1 Jan 2021) [A query on MW search & display, issue 88 in MWS], and even reminded @funderburkjim about it some time later; still no action.

funderburkjim commented 2 years ago

Advanced Search has several problems caused by the slp1 letter '|' (representing ळ्ह). In MW, an exact search for 'a' returns not only 'a' but also all the words ending in slp1 '|a': 2 अजमीळ्ह, 3 अरीळ्ह, 4 आजमीळ्ह, etc.

In addition, a substring search for '|' finds no matches, and a suffix search for '|a' finds all headwords ending in 'a'!

These problems are due to the way regular expressions are constructed in querymodel.php. First, '|' has to be considered to be a valid slp1 character. Second, since '|' has special meaning in a regular expression ('OR'), when used without the special meaning (i.e., as part of the search string), it has to be escaped with a backslash.

The above commit e853ec2 in csl-websanlexicon fixes the code.
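
The two points above can be illustrated with a small shell sketch. This is not the code from querymodel.php or from the commit; the sample headwords, the allow-list filter and all function names are hypothetical, chosen only to show how a missing '|' in the accepted character set and an unescaped '|' in a pattern could each distort a search.

```shell
#!/usr/bin/env bash
# Illustrative only: a hypothetical headword sample, not data from any dictionary.
headwords=$'a\nka\narI|a\najamI|a'

# Assumption: the search key is first reduced to an allow-list of characters.
# If '|' is missing from the list, the key '|a' silently becomes 'a'.
sanitize_without_bar() { printf '%s\n' "$1" | tr -cd 'A-Za-z\n'; }
sanitize_with_bar()    { printf '%s\n' "$1" | tr -cd 'A-Za-z|\n'; }

# Put a backslash before every extended-regex metacharacter,
# so that the key is matched literally ('|a' becomes '\|a').
escape_ere() { printf '%s\n' "$1" | sed 's/[][\\.|$(){}?+*^]/\\&/g'; }

# Suffix search: headwords ending in the given (already escaped) key.
suffix_search() { printf '%s\n' "$headwords" | grep -E -- "$1\$"; }

# Substring search: headwords containing the given pattern.
substring_search() { printf '%s\n' "$headwords" | grep -E -- "$1"; }

echo "suffix '|a', '|' not accepted as a key character:"
suffix_search "$(escape_ere "$(sanitize_without_bar '|a')")"

echo "suffix '|a', '|' accepted and escaped:"
suffix_search "$(escape_ere "$(sanitize_with_bar '|a')")"

echo "substring 'I|a', unescaped ('|' acts as OR):"
substring_search 'I|a'

echo "substring 'I|a', escaped:"
substring_search "$(escape_ere 'I|a')"
```

In this sketch the first and third searches return all four sample headwords, while the second and fourth return only 'arI|a' and 'ajamI|a'.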

funderburkjim commented 2 years ago

Dictionary displays were regenerated for all dictionaries with a headword whose slp1 spelling contains a '|' character.

dictionary=ap90 1 
dictionary=mw 42 
dictionary=ben 1 
dictionary=cae 4 
dictionary=ccs 1 
dictionary=lan 2
dictionary=md 1 
dictionary=mw72 7 
dictionary=pw 4 
dictionary=pwg 5 
dictionary=sch 1 
dictionary=vei 2 

This list was generated by the following script:

for dictlo in acc ae ap90 ben bhs bop bor bur cae \
 ccs gra gst ieg inm krm mci md mw mw72 \
 mwe pe pgn pui pw pwg sch shs skd \
 snp stc vcp vei wil yat lan armh
do
 # Headword file for this dictionary, relative to the current directory.
 hwfile="../../${dictlo}/pywork/${dictlo}hw2.txt"
 # Count the lines containing a literal '|' (-F: fixed string, -c: count).
 count=$(grep -cF '|' "$hwfile")
 echo "dictionary=$dictlo $count"
done