sanskrit-lexicon / mw-dev

Development version of MW dictionary, to collaborate with Andhrabharati

xml corrections #18

Open funderburkjim opened 1 year ago

funderburkjim commented 1 year ago

While attempting to reconstruct the xml file from mw_AB, I encountered a handful of xml errors that need to be corrected.

funderburkjim commented 1 year ago

See change_ab_2.txt.

Requesting that AB include these corrections in the next release of mw_AB.

Andhrabharati commented 1 year ago

@funderburkjim

Are these alright?

<pg n="p.">36</pg> instead of <pg>p.">36</pg> at 169303
<pg n="p.">518</pg> instead of <pg>p.">518</pg> at 236196
<pg n="p.">509</pg> instead of <pg>p.">509</pg> at 276955
<pg n="p.">33</pg> instead of <pg>p.">33</pg> at 278088
<pg n="p.">409</pg> instead of <pg>p.">409</pg> at 282152
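A minimal sketch of how the five <pg> corrections listed above could be applied mechanically. It assumes the broken form is always exactly `<pg>p.">` followed by digits, as in the quoted lines; the function name `fix_pg` is hypothetical and not part of any mw-dev script.

```python
import re

# Broken form quoted in the comment: the attribute name and opening quote are missing.
# Assumption: the page number is always plain digits, as in the five listed cases.
BROKEN_PG = re.compile(r'<pg>p\.">(\d+)</pg>')

def fix_pg(line: str) -> str:
    """Rewrite <pg>p.">36</pg> as <pg n="p.">36</pg>; all other text is left unchanged."""
    return BROKEN_PG.sub(r'<pg n="p.">\1</pg>', line)

if __name__ == "__main__":
    sample = 'see <pg>p.">36</pg> and <pg n="p.">518</pg>'
    print(fix_pg(sample))
```

Already-correct tags do not match the pattern, so running the function twice on the same line gives the same result as running it once.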

Andhrabharati commented 1 year ago

BTW, I've changed all the HW entries that have the root symbol, so that the dhAtu word under the symbol (whether with a homonym number or not) is enclosed in an s-tag, and moved the hyphen to precede the symbol in all those comp. words.

28962 old <s>antāya-√ <s>kṛ</s>, ¦ to fight obstinately, <ls>MBh.</ls>
28962 new <s>antāya</s>-√ <s>kṛ</s>, ¦ to fight obstinately, <ls>MBh.</ls>

Reason: The dhAtu, when it stands alone, has the s-tag under the symbol, but I myself had suggested earlier to remove the s-tag around the symbol under the comp. words, and that was implemented in the cdsl text in those days. Now I see that this affected the consistency of dhAtu marking throughout the text.

Hope this is alright.

Andhrabharati commented 1 year ago

Pushed the latest file with above corrections made.

funderburkjim commented 1 year ago

The coding of the prefixed root (L=28962) is fine.

The suggested RTL coding is not valid xml. One suggested change, with valid xml, is provided in change_ab_3.txt.

Note there were also 3 <LEND> changes needed; also in change_ab_3.

Andhrabharati commented 1 year ago

Pushed an update with the corrections.

funderburkjim commented 1 year ago

@Andhrabharati, a few additional changes are needed. Please see change_ab_3a.txt.

Note the <sX> errors can (currently) be identified by regex search <s[^> 1ch].
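A rough sketch of running that regex search over a file's lines in Python. The pattern is copied from the comment above; the function name `find_sx_errors` and the sample lines are made up for illustration, and reading from an actual mw_AB file is left to the caller.

```python
import re

# Pattern from the comment: "<s" followed by any character other than
# ">", a space, "1", "c" or "h" is treated as a suspect tag.
SX_ERROR = re.compile(r'<s[^> 1ch]')

def find_sx_errors(lines):
    """Yield (line_number, text) for every line where the pattern matches."""
    for n, line in enumerate(lines, start=1):
        if SX_ERROR.search(line):
            yield n, line.rstrip("\n")

if __name__ == "__main__":
    # Hypothetical sample lines, not taken from mw_AB.
    sample = [
        '<s>antāya</s>-√ <s>kṛ</s>, ¦ to fight obstinately, <ls>MBh.</ls>',
        '<sX>kṛ</s>',
        '<s1>name</s1>',
    ]
    for n, text in find_sx_errors(sample):
        print(n, text)
```

Closing tags such as </s> and other tags such as <ls> are not flagged, because the pattern requires the literal sequence "<s" at the start of the tag.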

Andhrabharati commented 1 year ago

Looks like these two errors (<LEND> and <s) will keep occurring until my corrections are fully done!

Pushed the update now.