drdhaval2785 commented 8 years ago

@funderburkjim This is just to understand how the verb anubandhas were handled while deriving headwords from VCP. The question arose because of the following entry: u(o)laqa

Here, the verb is laqa and both u and o are anubandhas.

Whereas the program seems to be linking to ulaqa. If this is the case, then olaqa should also link here.

drdhaval2785 commented 8 years ago

Print error 0:kizkinDyA(nDyA)Dipa:kizkinDyADipa:kizkinDyAnDyA:153424:153425

should be kizkinDyA(DA)Dipa


drdhaval2785 commented 8 years ago

Checked till 300. Rest some other day.

gasyoun commented 8 years ago

u and o are anubandhas

Great, was unaware there can be variations of anubandhas as well in such a place.

funderburkjim commented 8 years ago

Here, the verb is laqa and both u and o are anubandhas.

I think that 'u' and 'o' are NOT anubandha here.

There are two roots, ulaqa and laqa. Both are in VCP as well as in MW.

And there is also root olaRq in both. olaRq and ulaq are shown as having similar meanings in MW (throw out).

The conjecture would be that in VCP, the author is saying that root ulaqa may have the alternate spelling olaqa. [Probably, the ending 'a' is anubandha in both].

Madhaviya shows that one form of laq DOES have anubandha 'o', so this is an argument in favor of @drdhaval2785 's assertion that 'u' and 'o' are anubandhas.

Resolution of these competing claims is too hard for me!

drdhaval2785 commented 8 years ago

Kept in deep freeze till we find ways and means to venture into it. Can we at least scrape a list of similarly situated verbs please? If there is some markup to separate verbs?

gasyoun commented 8 years ago

If there is some markup to separate verbs?

None yet. I was scraping using word endings.

funderburkjim commented 8 years ago

Is there some markup to separate verbs?

Answer: no, EXCEPT for MW, where there is such markup. The MWVlex repository has a good summary of the verb markup in MW.

For other dictionaries, grammatical category markup has not been developed.

However, it is feasible to develop a program to identify, for each Sanskrit headword dictionary, a list of the headwords which are roots. This could be based on hand-crafted regexes, with different dictionaries typically requiring different regexes.

Then, markup can be added to the dictionaries to reflect the records chosen by the regex, so that future work would not have to know the particulars of the regex.

This markup could later be applied to other records of a dictionary that the verb-regexes missed.
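A minimal sketch of that regex-based approach, in Python. The dictionary codes `DICT_A` and `DICT_B`, both patterns, the sample bodies, and the `(headword, body)` record shape are hypothetical placeholders; the real regexes would have to be hand-crafted per dictionary after looking at its entries.

```python
import re

# Hypothetical patterns, one per dictionary. These are NOT taken from any
# real Cologne dictionary; they only illustrate the per-dictionary lookup.
ROOT_PATTERNS = {
    "DICT_A": re.compile(r"\bcl\.\s*\d"),  # body mentions a verb class, e.g. "cl. 1"
    "DICT_B": re.compile(r"\bDAtu\b"),     # body contains the word "DAtu"
}


def find_roots(dict_code, records):
    """Return the headwords whose entry body matches the dictionary's regex."""
    pattern = ROOT_PATTERNS[dict_code]
    return [headword for headword, body in records if pattern.search(body)]


# Made-up sample records, for illustration only.
sample = [("laq", "cl. 1 ..."), ("deva", "m. ...")]
roots = find_roots("DICT_A", sample)
```

The list returned by `find_roots` is what would then be written back as markup, so that later work does not depend on the regex itself.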

For VCP, we should consult the Tirupati edition, since that edition has already added at least some grammar markup.

gasyoun commented 8 years ago

For other dictionaries, grammatical category markup has not been developed.

I think it's even more important than cleaning out non-headwords of dictionaries. I'm ready to read the Prefaces and give notes about the dhatus. For example, Grassmann's Wörterbuch zum Rig-Veda writes about them in detail, and there are reviews of dictionaries that contain a lot of interesting metadata for us.

feasible to develop a program to identify, for each Sanskrit headword dictionary, a list of the headwords which are roots

Some dreams will have to remain dreams for now.

funderburkjim commented 7 years ago

@gasyoun Since there are about 1700+ cases of alternate headwords in VCP, completion of this work that Dhaval has begun might be a good task for Radha to help with. Agree?

gasyoun commented 7 years ago

Agree, but she'll need hints.

funderburkjim commented 7 years ago

Ok Good. I'll work on that. Am beginning by reviewing what Dhaval has done.

Dhaval has written programs that generate plausible suggestions as to how to resolve the alternates.

Probably many (most) of these are right. It would be helpful to devise some filters that would give a second opinion on the correctness of the alternates. One such filter would be whether the generated alternates are present in other dictionaries. @drdhaval2785 Have you already done this?

funderburkjim commented 7 years ago

@drdhaval2785 couple of technical questions:

1. Where does the 'ahw3' file come from (e.g. vcpahw3.txt)?
2. There is a note 'checked through 300' for this file. So can we assume the first 300 lines of vcpahw3.txt are right?
3. Is the data/VCP directory the one to work with (upper case VCP)?
   - What is the 'data/vcp' (lower case) directory?
   - Oddly, the lower-case 'data/vcp' directory shows only when I view the repository on GitHub. When viewed in the local copy of the repository, there is no lower-case 'data/vcp'. Maybe this is due to case-insensitivity in the Windows file system?

funderburkjim commented 7 years ago

@drdhaval2785 Found answers to the first two questions (i.e. the meaning of vcpahw3) in the readme.

funderburkjim commented 7 years ago

Analysis of vcpahw1 against hwnorm1c.txt.

Results are in vcpahw1_hwnorm.txt.

Details of dictionary coverage are in vcpahw1_hwnorm_detail.txt.

Summary:

1060 of the alternate spellings are found in at least one other dictionary.
- This group doesn't need much further work -- we can be pretty sure these derived alternate spellings are as they should be.
In 30 cases, the alternate spelling and the non-alternate spelling are the same --- this is odd.
- These should be re-examined against the print for likely spelling error
In 637 cases, the alternate spelling is not found in any other dictionary.
- These need to be examined by other means. Probably most are right, even though the alternates have not been found.

This approach seems promising, as it reduces by 61% the number of cases needing further attention.
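A minimal sketch of this filter, in Python. The function name and the toy data are assumptions for illustration; in practice the set of other-dictionary headwords would be loaded from hwnorm1c.txt, whose layout is not assumed here.

```python
def classify_alternates(pairs, other_headwords):
    """Split (key1, alternate) pairs into the three groups of the summary."""
    groups = {"found": [], "same": [], "not_found": []}
    for key1, alt in pairs:
        if alt == key1:
            # Alternate identical to the main spelling: re-check the print.
            groups["same"].append((key1, alt))
        elif alt in other_headwords:
            # Alternate attested in at least one other dictionary.
            groups["found"].append((key1, alt))
        else:
            groups["not_found"].append((key1, alt))
    return groups


# Toy data only; membership here says nothing about the real dictionaries.
pairs = [("ulaqa", "olaqa"), ("pUrva", "pUrva"), ("Balluka", "BallUka")]
other_headwords = {"BallUka"}
groups = classify_alternates(pairs, other_headwords)

# Share of cases settled by the lookup, using the counts from the summary.
share = 1060 / (1060 + 30 + 637)
```

The "same" test runs first on purpose: a headword such as pUrva is certainly present elsewhere, so it would otherwise be counted as confirmed.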

drdhaval2785 commented 7 years ago

Some stats and ideas for further refactoring. When this was developed, hwnorm1c was not that powerful. Now a combination of my force-matching logic and hwnorm1c provides a reasonable number of cases to examine.

NF : 0: - 34 - Maximum need for manual examination.
NF : 1: - 0 - Not possible.
NF : 2: - 0 - Not possible.
NF : 3: - 527 - This is something where I am doing some refactoring to bring it within reasonable numbers.
NF : 4: - 31 - Bracket opening to right side is preferred. Need manual examination.
NF : 5: - 4 - Can be merged automatically. a(A)rdDapAdika:ArdDapAdika, A(a)rdDakaMsika, u(U)rdra:Urdra, u(U)zmAgama:UzmAgama.
NF : 6: - 6 - Can be merged automatically. duH(du)strI:dustrI, duH(du)sPowa:dusPowa, pacatitarA(mA)m:pacatitamAm, prAtastarA(mA)m:prAtastamAm, sruc(cA):srucA. There is only one member, ba(va)hupAda(d), which gives rise to four alternate forms. The two bracket items need to be segregated and handled separately.
NF : 7 - 17 - Need manual examination.
NF : 8 - 3 - Can be merged automatically.
NF : 9 - 4 - All resolutions are wrong. Need manual examination.
NF : 10 - 9 - All resolutions are wrong. Need manual examination.
NF : 11 - 3 - Can be merged automatically.

drdhaval2785 commented 7 years ago

Changes made

NF : 3: - Reduced to 130 now. Need manual examination.
NF : 4: - Reduced to 19. Need manual examination.
NF : 7: - 416. Can be auto merged.
NF : 8: - 13. Can be auto merged.

drdhaval2785 commented 7 years ago

@funderburkjim and @gasyoun I have applied some corrections and filterings. The items needing manual examination (201 entries) are as follows. Please note the last 5 entries having code 7. These need manual correction.

https://github.com/sanskrit-lexicon/alternateheadwords/blob/master/data/VCP/test.txt

drdhaval2785 commented 7 years ago

OK=OK entries also need close examination. 30 such entries.

Case 0055: OK=OK : 0:abBra(Bra):abBra:abBraBra:18104:18107
Case 0056: OK=OK : 0:abBraM(BraM)liha:abBraMliha:abBraMBraM:18108:18110
Case 0057: OK=OK : 0:abBra(Bra)ka:abBraka:abBraBra:18111:18112
Case 0058: OK=OK : 0:abBra(Bra)Nkaza:abBraNkaza:abBraBraza:18113:18114
Case 0059: OK=OK : 0:abBra(Bra)puzpa:abBrapuzpa:abBraBrapa:18115:18117
Case 0060: OK=OK : 0:abBra(Bra)mAtaNga:abBramAtaNga:abBraBraaNga:18118:18118
Case 0061: OK=OK : 0:abBra(Bra)mu:abBramu:abBraBra:18119:18120
Case 0062: OK=OK : 0:abBra(Bra)muvallaBa:abBramuvallaBa:abBraBraallaBa:18121:18122
Case 0063: OK=OK : 0:abBra(Bra)roham:abBraroham:abBraBraam:18123:18125
Case 0064: OK=OK : 0:abBro(Bro)tTa:abBrotTa:abBroBro:18126:18128
Case 0065: OK=OK : 0:abBri(Bri):abBri:abBriBri:18129:18129
Case 0066: OK=OK : 0:abBri(Bri)ya:abBriya:abBriBri:18130:18130
Case 0090: OK=OK : 7:amudryaYc(c):amudryaYc:amudryaYcc:21863:21865
Case 0091: OK=OK : 7:amumuyaYc(c):amumuyaYc:amumuyaYcc:21866:21868
Case 0115: OK=OK : 0:arbu (rbu)da:arbuda:arburbu:26112:26141
Case 0142: OK=OK : 0:AkarzA(zA)di:AkarzAdi:AkarzAzA:42035:42039
Case 0298: OK=OK : 0:kizkinDyA(nDyA)Dipa:kizkinDyADipa:kizkinDyAnDyA:153424:153425
Case 0408: OK=OK : 0:gajacirBa(rBa)wA:gajacirBawA:gajacirBarBa:186464:186465
Case 0427: OK=OK : 8:gi(r)rA:girA:grrA:193588:193597
Case 0551: OK=OK : 0:tattva(tva):tattva:tattvatva:240600:240644
Case 0814: OK=OK : 0:pattra(tra):pattra:pattratra:317285:317389
Case 0911: OK=OK : 0:pUrva(rva):pUrva:pUrvarva:331420:331428
Case 0913: OK=OK : 0:pUrva(rva)kAya:pUrvakAya:pUrvarvaa:331431:331431
Case 0914: OK=OK : 0:pUrva(rva)kAlika:pUrvakAlika:pUrvarvaika:331432:331435
Case 0915: OK=OK : 0:pUrva(rva)kft:pUrvakft:pUrvarva:331436:331438
Case 0955: OK=OK : 0:pUrva(rva)vAda:pUrvavAda:pUrvarvaa:331816:331819
Case 0980: OK=OK : 0:pUrvya(rvya):pUrvya:pUrvyarvya:331890:331891
Case 1399: OK=OK : 0:Ballu(llu)ka:Balluka:Ballullu:349621:349625
Case 1428: OK=OK : 0:maRWa(Wa):maRWa:maRWaWa:355126:355133
Case 1476: OK=OK : 0:mEttra(tra):mEttra:mEttratra:358694:358700

drdhaval2785 commented 7 years ago

@funderburkjim, Now onwards use DICTahw2.txt as base.

The only change is that suggestions containing impossible bigrams or trigrams are converted to code 0, i.e. needing manual examination.
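A minimal sketch of such an n-gram check, in Python. It assumes that "impossible" means "never seen in a reference list of headwords"; the actual criterion used in the programs is not stated here, and the reference list below is a toy one.

```python
def ngrams(word, n):
    """Return the set of all substrings of length n."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def attested_ngrams(headwords, n):
    """Collect every n-gram that occurs in a reference list of headwords."""
    seen = set()
    for headword in headwords:
        seen |= ngrams(headword, n)
    return seen


def has_impossible_ngram(suggestion, attested, n):
    """True if the suggestion contains an n-gram missing from the reference."""
    return not ngrams(suggestion, n) <= attested


# Toy reference list in SLP1; a real run would use a full headword list.
known = ["abBra", "mAtaNga", "pUrva", "kAya"]
bigrams = attested_ngrams(known, 2)

# 'abBraBraaNga' contains 'aa', which never occurs in the reference list.
flagged = has_impossible_ngram("abBraBraaNga", bigrams, 2)
```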

gasyoun commented 7 years ago

@drdhaval2785 If instead of Case 1476: OK=OK : 0:mEttra(tra):mEttra:mEttratra:358694:358700

I'll write

Case 1476: OK=OK : 0:mEttra(tra):mEttra:mEtra:358694:358700

will it do?

drdhaval2785 commented 7 years ago

On 31 Mar 2017 3:48 pm, "Mārcis Gasūns" notifications@github.com wrote:

@drdhaval2785 If instead of Case 1476: OK=OK : 0:mEttra(tra):mEttra:mEttratra:358694:358700

I'll write

Case 1476: OK=OK : 0:mEttra(tra):mEttra:mEtra:358694:358700

will it do?

As envisaged,

Case 1476: OK=OK : 99:mEttra(tra):mEtra:mEttratra:358694:358700

should be the correct way of making corrections. 99 means manually matched. The next field is the item under consideration, and the one after that is the suggested reading. Everything after that can be ignored.
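For reference, a minimal Python sketch of reading one such record. The field names `key2`, `line1` and `line2` follow the names used elsewhere in this thread; `AhwRecord`, `parse_record`, `suggestion` and `raw` are names made up for the sketch, and it assumes every record has all six colon-separated fields.

```python
from typing import NamedTuple


class AhwRecord(NamedTuple):
    code: int        # classification code, 99 = manually matched
    key2: str        # item under consideration, e.g. "mEttra(tra)"
    suggestion: str  # suggested reading of the alternate
    raw: str         # machine-generated leftover, can be ignored
    line1: int
    line2: int


def parse_record(line):
    """Parse a record, with or without the 'Case NNNN: OK=OK : ' prefix."""
    if line.startswith("Case "):
        # The prefix ends at the first ' : ' (space, colon, space).
        line = line.split(" : ", 1)[1]
    code, key2, suggestion, raw, line1, line2 = line.strip().split(":")
    return AhwRecord(int(code), key2, suggestion, raw, int(line1), int(line2))


rec = parse_record(
    "Case 1476: OK=OK : 99:mEttra(tra):mEtra:mEttratra:358694:358700"
)
```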


gasyoun commented 7 years ago

Case 0911: OK=OK : 99:pUrva(rva):pUrba:pUrva:331420:331428


All have same issue, it's b that's default and v in the brackets.

Case 0911: OK=OK : 0:pUrva(rva):pUrva:pUrvarva:331420:331428
Case 0913: OK=OK : 0:pUrva(rva)kAya:pUrvakAya:pUrvarvaa:331431:331431
Case 0914: OK=OK : 0:pUrva(rva)kAlika:pUrvakAlika:pUrvarvaika:331432:331435
Case 0915: OK=OK : 0:pUrva(rva)kft:pUrvakft:pUrvarva:331436:331438
Case 0955: OK=OK : 0:pUrva(rva)vAda:pUrvavAda:pUrvarvaa:331816:331819
Case 0980: OK=OK : 0:pUrvya(rvya):pUrvya:pUrvyarvya:331890:331891

gasyoun commented 7 years ago

http://www.sanskrit-lexicon.uni-koeln.de/scans/awork/apidev/servepdf.php?dict=VCP&page=0273

Case 0055: OK=OK : 0:abBra(Bra):abBra:abBraBra:18104:18107
Case 0056: OK=OK : 0:abBraM(BraM)liha:abBraMliha:abBraMBraM:18108:18110
Case 0057: OK=OK : 0:abBra(Bra)ka:abBraka:abBraBra:18111:18112
Case 0058: OK=OK : 0:abBra(Bra)Nkaza:abBraNkaza:abBraBraza:18113:18114
Case 0059: OK=OK : 0:abBra(Bra)puzpa:abBrapuzpa:abBraBrapa:18115:18117
Case 0060: OK=OK : 0:abBra(Bra)mAtaNga:abBramAtaNga:abBraBraaNga:18118:18118
Case 0061: OK=OK : 0:abBra(Bra)mu:abBramu:abBraBra:18119:18120
Case 0062: OK=OK : 0:abBra(Bra)muvallaBa:abBramuvallaBa:abBraBraallaBa:18121:18122
Case 0063: OK=OK : 0:abBra(Bra)roham:abBraroham:abBraBraam:18123:18125
Case 0064: OK=OK : 0:abBro(Bro)tTa:abBrotTa:abBroBro:18126:18128
Case 0065: OK=OK : 0:abBri(Bri):abBri:abBriBri:18129:18129
Case 0066: OK=OK : 0:abBri(Bri)ya:abBriya:abBriBri:18130:18130


Can't understand it, @drdhaval2785

gasyoun commented 7 years ago

Case 1428: OK=OK : 99:maRWa(wa):maRWa:maRwa:355126:355133


drdhaval2785 commented 7 years ago

Marcis, you are not getting the point.

The problem is that the letters inside and outside the bracket are the same.

It can not be. Digitization should be

pUrba(rva)... whereas it has pUrva(rva)...
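A minimal sketch of how such entries could be spotted mechanically, in Python. The function name is made up, and the test (bracket content merely repeats the end of what precedes it) is an assumed heuristic, not necessarily the exact rule behind the OK=OK list.

```python
import re

# prefix, optional space, (bracket content), suffix
BRACKET = re.compile(r"^(.*?)\s*\((.*?)\)(.*)$")


def bracket_repeats_prefix(key2):
    """True if the bracketed letters just repeat the letters before them."""
    match = BRACKET.match(key2)
    if not match:
        return False
    prefix, inner, _suffix = match.groups()
    return bool(inner) and prefix.endswith(inner)


suspect = bracket_repeats_prefix("pUrva(rva)kAya")  # digitized form
expected = bracket_repeats_prefix("pUrba(rva)kAya")  # form in the print
```

An entry that passes this test cannot encode a real alternate, so it points either to a digitization change or to a print error.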

gasyoun commented 7 years ago

The problem is that the letters inside and outside the bracket are the same.

It's not that way.

Digitization is wrong, because print is bad.

It's pUrba(rva), there is no pUrva(rva) and can not be.

drdhaval2785 commented 7 years ago

On 31 Mar 2017 6:00 pm, "Mārcis Gasūns" notifications@github.com wrote:

The problem is that the letters inside and outside the bracket are the same.

It's not that way.

Digitization is wrong, because print is bad.

Oh Marcis. That is what is to be corrected.

It's pUrba(rva), there is no pUrva(rva) and can not be.


gasyoun commented 7 years ago

Oh Marcis. That is what is to be corrected.

I corrected it, but did not understand your comment. I said:

It's pUrba(rva), there is no pUrva(rva) and can not be. And added All have same issue, it's b that's default and v in the brackets.

Do you agree?

With Case 0055: OK=OK : 0:abBra(Bra):abBra:abBraBra:18104:18107 I do not get it, print seems identical for me.

drdhaval2785 commented 7 years ago

Leave it. It becomes clumsy to explain to you. Leave those 30. I will submit them on my own to Jim. Can you or Radha handle the rest of the items mentioned in test.txt?

funderburkjim commented 7 years ago

The 'pUrba' problem was actually caused by a change in Nov 19, 2014:

; Nov 19, 2014 typo faultfinder pUrbakAya
331431 old <HI>{@pUrba(rva)kAya@}¦ pu0 pUrbaM (rvaM) kAyasya ekadeSita0 . kAyasya
331431 new <HI>{@pUrva(rva)kAya@}¦ pu0 pUrvaM (rvaM) kAyasya ekadeSita0 . kAyasya

Since we didn't have the ability at that time to deal with alternate headwords, and since pUrba was so non-standard, we decided to change it to pUrva. The change could have been made as pUrva(rba)kAya to preserve both forms, but since it wasn't, we are now discovering the issue noted above.

To fix it now, we need to make the correction to vcp.txt so both forms will be present in vcphw0

331431 old <HI>{@pUrva(rva)kAya@}¦ pu0 pUrvaM (rvaM) kAyasya ekadeSita0 . kAyasya
331431 new <HI>{@pUrva(rba)kAya@}¦ pu0 pUrvaM (rvaM) kAyasya ekadeSita0 . kAyasya

and similarly for the others. Then reload vcphw0.txt and rerun the analyses. This will fix this problem, and we will have key1 = pUrvakAya, key1alt = pUrbakAya.
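A minimal Python sketch of how key1 and key1alt could be derived from such a bracketed headword. The function name `expand_key2` is made up, and the rule it applies (the bracket replaces an equally long tail of the letters before it) is an assumption that happens to fit this case; it is not the logic of the actual programs.

```python
import re

# prefix, optional space, (bracket content), suffix
BRACKET = re.compile(r"^(.*?)\s*\((.*?)\)(.*)$")


def expand_key2(key2):
    """Return (key1, key1alt) for a headword such as 'pUrva(rba)kAya'."""
    match = BRACKET.match(key2)
    if not match:
        return key2, None  # no bracket, so no alternate
    prefix, inner, suffix = match.groups()
    key1 = prefix + suffix
    # Assumed rule: the bracket replaces an equally long tail of the prefix.
    cut = max(0, len(prefix) - len(inner))
    key1alt = prefix[:cut] + inner + suffix
    return key1, key1alt


key1, key1alt = expand_key2("pUrva(rba)kAya")
```

This equal-length rule does not cover brackets that replace a longer or shorter stretch, such as kizkinDyA(nDA)Dipa, which is one reason many cases still need manual examination.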

funderburkjim commented 7 years ago

Case 0055: OK=OK : 0:abBra(Bra):abBra:abBraBra:18104:18107

The alternate should drop the 'b', resulting in 'aBra'

from MW:
aBra [p= 79] : n. (sometimes spelt abBra, according to the derivation ab-Bra, " water-bearer " ; 
cf. Comm. on ChUp.  ii, 15, 1) (rarely m. AV.  ix, 6, 47 and TS. ) cloud, thunder-cloud, rainy weather RV.  &c [L=13688]

No doubt this should apply to all the compounds also.

gasyoun commented 7 years ago

The 'pUrba' problem was actually caused by a change in Nov 19, 2014:

Oh, great.

I have so many questions because if I do not introduce all the steps to Radha, we will always be left alone. In

Case 0031: OK,NF : 0:aDaH(DaSSa)SayyA:aDaHDaSSa:DaSSaSayyA:6455:6457

none is correct as per me. It should have been

Case 0031: OK,NF : 0:aDaH(DaSSa)SayyA:aDaHSayyA:aDaSSaSayyA:6455:6457

If I change 7 to 99 metadata is lost, but so be it:

Case 1400: OK,NF : 99:BavAdfkza(Sa)S:BavAdfkSaS:BavAdfkSaS:349670:349672

Right?

funderburkjim commented 7 years ago

@drdhaval2785 As I understand it, you reduced the cases needing special (manual) examination to two categories:

Those 200+ appearing in test.txt, which are for Radha
The 30 ok=ok, which you and I will handle.
The correct derived file to use for all the others is vcpahw2.txt.

Agree?

funderburkjim commented 7 years ago

There are currently about 420+ cases that (a) are marked as OK,NF and (b) are not in test.txt

So, presumably Dhaval asserts that these need no further examination -- the alternates are confirmed.

@drdhaval2785 Can you classify these 400 or so cases so that we may understand the reason for your confidence in the alternates? Maybe put this classification into a file named something like nf_confirmed1.txt

funderburkjim commented 7 years ago

@gasyoun You need to tell me what we need to provide to Radha to facilitate her examination of the cases in test.txt.

gasyoun commented 7 years ago

Case 0035: OK,NF : 99:anAloci(qi)ta:anAlocita:anAloqita:9514:9516
Case 0044: OK,NF : 99:anta(nte)sTA:antesTA:antasTA:12948:12954
Case 0050: OK,NF : 99:apigf(grA)hya:apgrAhya:apigfhya:16059:16059
Case 0067: OK,NF : 99:aBigUrtta(rRRa):aBigUrtta:aBigUrRRa:19012:19015
Case 0068: OK,NF : 99:aBidi(Di)psu:aBiDipsu:aBidipsu:19287:19289
Case 0073: OK,NF : 99:aBimarSana(rzaR):aBimarSana:aBimarzaR:19841:19848
Case 0096: OK,NF : 99:amBaH(mBassA)sAra:amBassA:amBaHsAra:22639:22639
Case 0097: OK,NF : 99:amBaH(mBassU)sU:amBaHsU:amBassU:22640:22640
Case 0104: OK,NF : 99:araRya(Rye)cara:araRyacara:araRyecara:23997:23998

@funderburkjim she does not know a thing. She has never worked with GitHub or SLP1. I can explain it to her if I get all the details myself and make a video. That's why it's crucial that my cases get approved, so I'm sure I'm doing it as it should be done.

funderburkjim commented 7 years ago

@gasyoun I think the cases of test.txt are what Radha should work with, but that the format of test.txt is not useful.

Let's think of a UI. I'll open a new issue just for us to work on what this UI should be. Let's discuss it there (#17)

gasyoun commented 7 years ago

Let's think of a UI.

Hope previous code can be reused. And a version for PWG and PWK sopasarga dhatus as well, later.

funderburkjim commented 7 years ago

Agree that it's worthwhile spending some time developing the UI.

funderburkjim commented 7 years ago

comments on vcpahw3

added line1,line2 fields to several records where they were missing. This makes the format of vcpahw3 the same as that of vcpahw2.
Dhaval noticed a digitization error, and made a correction to the key2 for one case.
- Here is the correction to vcp.txt
```
24058 old <HI>{@araRya(RyAnAm)@}¦ pati pu0 araRyAnAM tatrasTAnAM cOrARAM patiH
24058 new <HI>{@araRya(RyAnAm)pati@}¦ pu0 araRyAnAM tatrasTAnAM cOrARAM patiH
```
- Here is correction to vcpahw2:
  - vcpahw2: 0:araRya(RyAnAm):RyAnAm:araRyaRyAnAm:24058:24069
  - vcpahw3: 99:araRya(RyAnAm)pati:araRyAnAmpati:araRyaRyAnAm:24058:24069

funderburkjim commented 7 years ago

vcpahw4

The 'pUrva' changes need to be introduced into the final alternate headwords list. I thought it would be clean to apply the update principles that we use for the dictionary digitizations to this situation.

The 'old' file is 'vcpahw3.txt'. The 'new' file is 'vcpahw4.txt'. The 'change' file is 'manualByLine.txt'.

The script 'update.sh' does the transform from vcpahw3 to vcpahw4, based on manualByLine.

For now at least, I'm considering vcpahw4 as the file which will have the finalized alternate headword information. It should not be altered manually, but altered only by adding changes to manualByLine and rerunning update.sh.
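A minimal Python sketch of this kind of update step, not the contents of update.sh itself. It assumes the change file holds only pairs of lines of the form `N old <text>` and `N new <text>` (N being a 1-based line number) plus optional `;` comment lines; the function name is made up.

```python
def apply_changes(lines, change_text):
    """Apply 'N old X' / 'N new Y' pairs to a list of lines (N is 1-based)."""
    result = list(lines)
    pending = {}  # line number -> expected old content
    for raw in change_text.splitlines():
        raw = raw.strip()
        if not raw or raw.startswith(";"):
            continue  # skip blank lines and ';' comments
        # Split only twice: the record itself may contain spaces.
        num, kind, content = raw.split(" ", 2)
        n = int(num)
        if kind == "old":
            pending[n] = content
        elif kind == "new":
            # Refuse to change a line that is not what the change expects.
            if result[n - 1] != pending.pop(n):
                raise ValueError(f"line {n}: 'old' text does not match")
            result[n - 1] = content
        else:
            raise ValueError(f"unexpected keyword {kind!r}")
    return result


old_lines = ["first record", "0:pUrva(rva):pUrva:pUrvarva:331420:331428"]
changes = """
; toy change file, using line 2 instead of 911
2 old 0:pUrva(rva):pUrva:pUrvarva:331420:331428
2 new 20:pUrva(rba):pUrba:pUrvarva:331420:331428
"""
new_lines = apply_changes(old_lines, changes)
```

Checking the 'old' text before replacing is what makes the change file safe to rerun against a regenerated vcpahw3.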

choice of code classifier.

The first correction from manualByLine is:

911 old 0:pUrva(rva):pUrva:pUrvarva:331420:331428
911 new 20:pUrva(rba):pUrba:pUrvarva:331420:331428

This is a change to line 911. In addition to the two 'rv -> rb' changes, the first field was changed from '0' to '20'. This first field is a classification code which Dhaval devised based on his analysis. The current codes appearing in vcpahw3 are : 0-12 and 99.

I chose a new code '20', which will be used just for the 6 pUrva changes. As long as Dhaval doesn't also use code '20' in vcpahw3, we'll be fine to use this new '20' code in vcpahw4.
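A quick way to confirm that a new code is still unused, sketched in Python. The function name is made up, and the two sample lines only stand in for the full contents of vcpahw3.txt.

```python
def used_codes(lines):
    """Collect the classification codes (the first ':'-separated field)."""
    return {int(line.split(":", 1)[0]) for line in lines if line.strip()}


# Sample records in the vcpahw format; a real check would read the whole file.
sample = [
    "0:pUrva(rva):pUrva:pUrvarva:331420:331428",
    "99:UrdDa(rdDva):UrdDva:UrdDardDva:102792:102800",
]
NEW_CODE = 20
is_free = NEW_CODE not in used_codes(sample)
```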

Just as a reminder, there are corrections pending to vcp.txt corresponding to these vcpahw3/4 changes.

funderburkjim commented 7 years ago

abBr -> aBr

Dhaval has already made these changes in vcpahw3.
I added corrections in manualByLine, to change the code from 99 to 21 (so code 21 will be reserved just for these) in vcpahw4.

funderburkjim commented 7 years ago

I hadn't realized it, but Dhaval apparently has made corrections in vcpahw3 for all the ok=ok cases.

I'm going to work through the rest of these, and assign various new codes (instead of 99), and mention where there need to be corresponding corrections to the digitization.

First two:

0090 old 99:amudryaYc(c):amudryac:amudryaYcc:21863:21865
0090 new 22:amudryaYc(c):amudryac:amudryaYcc:21863:21865
0091 old 99:amumuyaYc(c):amumuyac:amumuyaYcc:21866:21868
0091 new 22:amumuyaYc(c):amumuyac:amumuyaYcc:21866:21868

There are only 10 more cases.

funderburkjim commented 7 years ago

rest of ok=ok cases.

Here are the changes made via manualByLine and, where needed, to vcp.txt:

; code=23.
115 old 99:arbu (rvu)da:arvuda:arburbu:26112:26141
115 new 23:arbu(rvu)da:arvuda:arburbu:26112:26141
Requires change to vcp.txt also
26112 old <HI>{@arbu (rbu)da@}¦ na0 arba (rva) vic tasmE udeti ud + iR--qa .
26112 new <HI>{@arbu(rvu)da@}¦ na0 arba (rva) vic tasmE udeti ud + iR--qa .

; code=24
0142 old 99:AkarzA(zA)di:AkazAdi:AkarzAzA:42035:42039
0142 new 24:AkarzA(zA)di:AkazAdi:AkarzAzA:42035:42039

; code=25
0298 old 99:kizkinDyA(nDA)Dipa:kizkinDADipa:kizkinDyAnDyA:153424:153425
0298 new 25:kizkinDyA(nDA)Dipa:kizkinDADipa:kizkinDyAnDyA:153424:153425
change to vcp.txt
153424 old <HI>{@kizkinDyA(nDyA)Dipa@}¦ pu0 6 ta0 . vAlinAmake vAnararAje
153424 new <HI>{@kizkinDyA(nDA)Dipa@}¦ pu0 6 ta0 . vAlinAmake vAnararAje

; code=26  Dhaval - please check.
0408 old 0:gajacirBa(rBa)wA:gajacirBawA:gajacirBarBa:186464:186465
0408 new 26:gajacirBa(rBi)wA:gajacirBiwA:gajacirBarBa:186464:186465
Requires (type=p) change to vcp.txt:
186464 old <HI>{@gajacirBa(rBa)wA@}¦ strI gajapriyA cirBa(rBi)wA SA0 ta0 . indra-
186464 new <HI>{@gajacirBa(rBi)wA@}¦ strI gajapriyA cirBa(rBi)wA SA0 ta0 . indra-

; code=27.  Print has a virama with the parenthetical (r).
0427 old 4:gi(r)rA:girA:grrA:193588:193597
0427 new 27:gi(r)rA:gir:grrA:193588:193597

; code=28  tattva or tatva, cf skd
0551 old 0:tattva(tva):tattva:tattvatva:240600:240644
0551 new 28:tattva(tva):tatva:tattvatva:240600:240644

; code=28  pattra or patra, cf skd, mw
0814 old 0:pattra(tra):pattra:pattratra:317285:317389
0814 new 28:pattra(tra):patra:pattratra:317285:317389

; code=28  tt/t alternates seems like previous 2 cases,
;  Many dictionaries have mEtra, VCP only one with mEttra.
1476 old 0:mEttra(tra):mEttra:mEttratra:358694:358700
1476 new 31:mEttra(tra):mEtra:mEttratra:358694:358700

; code=29
1399 old 0:Ballu(llu)ka:Balluka:Ballullu:349621:349625
1399 new 29:Ballu(llU)ka:BallUka:Ballullu:349621:349625
also, correction to vcp.txt
349621 old <HI>{@Ballu(llu)ka@}¦ pu0 strI Balla--u(U) ka . (BAlUka) 1 jantuBede
349621 new <HI>{@Ballu(llU)ka@}¦ pu0 strI Balla--u(U) ka . (BAlUka) 1 jantuBede

; code=30 ?  Could not find good confirmation of maWa as alternate.
1428 old 0:maRWa(Wa):maRWa:maRWaWa:355126:355133
1428 new 30:maRWa(Wa):maWa:maRWaWa:355126:355133

That should take care of all the ok=ok cases.

drdhaval2785 commented 7 years ago

gajacirBawA gajacirBiwA suggestion is OK.

funderburkjim commented 7 years ago

UrdDva confusions

In beginning to work towards a UI to investigate the cases in 'test.txt', I came upon some oddities with regard to UrdDa/Dva.
Here are the cases, acc. to vcpahw4.txt:

99:UrdDa(rdDva):UrdDva:UrdDardDva:102792:102800
99:UrdDa(dDva)ka:UrdDvaka:UrdDadDva:102801:102806
99:UrdDa(rdDva)kaca:UrdDvakaca:UrdDardDva:102807:102808
99:UrdDa(rdDva)kaRWa:UrdDvakaRWa:UrdDardDva:102809:102810
99:UrdDa(rdDva)karmman:UrdDvakarmman:UrdDardDvaan:102811:102814
99:UrdDa(rdDva)manTin:UrdDvamanTin:UrdDardDvan:103057:103061
99:UrdDa(rdDva)mAna:UrdDvamAna:UrdDardDva:103062:103075

Part of the confusion is that the 'key1' spellings currently use UrdDva.

I'm not sure about how to handle these at the moment.

For the purpose of the test.txt UI, I'm removing these from consideration.

sanskrit-lexicon / alternateheadwords

VCP verb anubandha handling #11
