Andhrabharati closed this issue 1 year ago.
I am not sure whether this error occurs in other works as well.
@Andhrabharati Please provide a specific example so I can reproduce the error.
One quick example of the missing letter after the 'accent marker':
<L>9201<pc>1-0689<k1>AreaGa<k2>Are/aGa
{#Are/aGa#}¦ ({#Are + aGa#}) <lex>adj.</lex> {%wovon Uebel fern ist%}: {#A\rea^GA a\sme Ba\drA sO^Srava\sAni^ santu#}
<ls>ṚV. 6,1,12.</ls> {#sva\stim#}
<ls n="ṚV. 6,">56,6.</ls>
<LEND>
[As indicated earlier, I have often seen that the results of the regex (as given above) needed textual corrections in the files.]
This problem is peculiar to the PW dictionaries. Here is a little test:
slp1 = Are/agra, deva = आरे॑अग्र, deva1 = आरे꣫ग्र
slp1 = ita/Uti, deva = इत॑ऊति, deva1 = इत꣫ूति
slp1 = go/agra, deva = गो॑अग्र, deva1 = गो꣫ग्र
slp1 = go/fjIka, deva = गो॑ऋजीक, deva1 = गो꣫ृजीक
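The deva column gives the expected Devanagari. In deva1, अ disappears and ऊ, ऋ turn into the dependent signs ू, ृ, i.e. the vowel after the accent is written as if it followed a consonant. The sketch below is a hypothetical, minimal SLP1-to-Devanagari transcoder in Python (a tiny letter subset; it is not the project's slp1_deva1.xml) that illustrates the rule a fix has to preserve: after an accent mark, a vowel starts a new syllable and takes its independent form. It emits the standard udatta sign U+0951, as in the deva column, not the deva1 accent glyph.

```python
# Hypothetical minimal SLP1 -> Devanagari sketch (small letter subset only);
# NOT the slp1_deva1.xml transcoder used by the project.
VIRAMA = "\u094d"
UDATTA = "\u0951"  # DEVANAGARI STRESS SIGN UDATTA

CONSONANTS = {"r": "र", "g": "ग", "t": "त", "j": "ज", "k": "क"}
# SLP1 vowel -> (independent form, dependent sign); inherent 'a' has no sign
VOWELS = {
    "a": ("अ", ""),
    "A": ("आ", "\u093e"),
    "i": ("इ", "\u093f"),
    "I": ("ई", "\u0940"),
    "U": ("ऊ", "\u0942"),
    "f": ("ऋ", "\u0943"),
    "e": ("ए", "\u0947"),
    "o": ("ओ", "\u094b"),
}


def slp1_to_deva(text: str) -> str:
    out = []
    after_consonant = False  # True while a consonant still carries its inherent vowel
    for ch in text:
        if ch in CONSONANTS:
            if after_consonant:
                out.append(VIRAMA)  # consonant cluster: suppress the inherent vowel
            out.append(CONSONANTS[ch])
            after_consonant = True
        elif ch in VOWELS:
            independent, sign = VOWELS[ch]
            # Only a vowel directly after a consonant is written as a sign.
            out.append(sign if after_consonant else independent)
            after_consonant = False
        elif ch == "/":
            out.append(UDATTA)
            # In SLP1 the accent follows its vowel, so the flag is already False;
            # keeping it False makes the next vowel independent (अ, ऊ, ऋ),
            # which is exactly what the deva1 column loses.
            after_consonant = False
        else:
            raise ValueError(f"unsupported character {ch!r}")
    if after_consonant:
        out.append(VIRAMA)
    return "".join(out)
```

For example, `slp1_to_deva("ita/Uti")` returns इत॑ऊति, matching the deva column.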
Based on the small test:
As I recall, the PWx transcoding was developed to display the accents (notably udAtta) in the manner of Böhtlingk.
@Andhrabharati Can you find a link to the repository and issue where this PWG devanagari was discussed?
The task now is to correct slp1_deva1.xml.
Is this (https://github.com/sanskrit-lexicon/PWG/issues/5#issuecomment-900759930) the one you wanted, @funderburkjim?
Or this one (https://github.com/sanskrit-lexicon/PWG/issues/5#issuecomment-895404247)?
BTW, my post above is not just about the missing letters; it is also about the typo/print errors and the timing issue, which I had seen in the many works listed above.
See for example,
and
[The errors are in either the metaline or the HW entry of the following header line, and sometimes in the body matter as well.]
After this change, the little test looks correct for deva1:
slp1 = Are/agra, deva = आरे॑अग्र, deva1 = आरे꣫अग्र
slp1 = ita/Uti, deva = इत॑ऊति, deva1 = इत꣫ऊति
slp1 = go/agra, deva = गो॑अग्र, deva1 = गो꣫अग्र
slp1 = go/fjIka, deva = गो॑ऋजीक, deva1 = गो꣫ऋजीक
The pwg, pw, and pwkvn displays have been changed, including in simple-search.
I will consider this part of the n-fold problem of this issue finished.
[^<][/\\\^][fxaiueo]
Corrections in MW: 27 such lines were found and corrected as needed (see the csl-orig commit above).
2 of these 27 required no correction.
@Andhrabharati thanks for the two PWG accent Devanagari references. These were what I was looking for.
count.txt counts the instances matching the regex [^<][/\\^][fxaiueoFXAIUEO]
in the 37 dictionaries of csl-orig.
There are no instances in 19 of the dictionaries. The next step would be to look for errors (and non-error patterns) in the other 18.
The count can be reduced further by adding a negated caret class ([^\^]) after the vowel class, to exclude the superscript notation (^X^). Also add the numerals to the initial negated class:
[^<0-9][/\\^][fxaiueoFXAIUEO][^\^]
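A small Python sketch for producing per-file counts with the refined regex. The file list, the function names (`count_matches`, `main`) and the choice to count matches rather than lines are assumptions for illustration; this is not the script that produced count.txt. Note that the trailing `[^\^]` needs one more character after the vowel, so an accent plus vowel at the very end of a line is not counted.

```python
import re
import sys
from pathlib import Path

# Refined pattern from this thread: an accent character ('/', '\' or '^')
# followed by a vowel, where the character before the accent is neither '<'
# (closing tags such as </i>) nor a digit, and the vowel is not followed
# by '^' (the ^X^ superscript notation).
ACCENT_VOWEL = re.compile(r"[^<0-9][/\\^][fxaiueoFXAIUEO][^\^]")


def count_matches(lines):
    """Return (total number of matches, list of 1-based matching line numbers)."""
    total = 0
    hits = []
    for lineno, line in enumerate(lines, start=1):
        n = len(ACCENT_VOWEL.findall(line))
        if n:
            total += n
            hits.append(lineno)
    return total, hits


def main(paths):
    # Hypothetical usage: pass the dictionary text files to scan,
    # e.g. the per-dictionary files in a local csl-orig checkout.
    for p in paths:
        with open(p, encoding="utf-8") as f:
            total, _ = count_matches(f)
        print(f"{Path(p).stem}\t{total}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as `python count_accents.py file1.txt file2.txt ...` (script name hypothetical); it prints one tab-separated count per file.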
BTW, I think the GRA instances are mostly print errors, not typos: the accent mark seems to precede the vowel rather than follow it (the regular order).
(b) taking huge time etc.
@Andhrabharati please provide a specific example (or a couple of examples) so I can reproduce the problem.
@funderburkjim Now I cannot reproduce the error that I had noticed for the words found with the earlier regex.
I did notice it for two days when I posted the issue, while searches for any other word(s) at CDSL were instantaneous and every other site behaved normally. So I am sure it was not a network/connection issue at that time.
We can probably ignore the timing issue for now. [If it ever recurs, it can be looked into then.]
Changes made are in the various files in the directory issues/407/changes/.
I landed on an MW entry purely by chance, which led me to identify this error!!
Accent marks followed by vowels (in the text data) cause a few types of display errors, such as (a) missing letters, (b) taking huge time, etc., apart from cases that need textual corrections [whether typing or printing errors], in the following works: CAE, CCS, GRA, INM, MCI, MW, PW, PWG, PWKVN, SCH and STC.
In INM, MCI, etc., the '/' character is not an accent mark, but the timing issue is still noticed.
Incidentally, the first search result display for these takes a "HELL of a lot of time" (sorry for the bad wording!!): a few tens of seconds to a few minutes in some cases [apparently going into an 'eternal' loop].
KRM, BOP, etc., which use ^X^ as the superscript notation, seem to have no such error.
------------------------------------
This case-insensitive regex (in SLP1) may be used to get the list: [^<][/\\\^][fxaiueo]
[I recall reporting one similar error [involving the '|' (SLP1) character] in my early days of looking at MW at CDSL; it was corrected by @funderburkjim a few months after I posted the issue, once a reminder addressed directly to him was posted.]