sanskrit-lexicon / mw-dev

Development version of MW dictionary, to collaborate with Andhrabharati

MW full-review-010: <ls> orphan numbers lookout & new tags added around the numbers #20

Open Andhrabharati opened 1 year ago

Andhrabharati commented 1 year ago

While checking for ls-orphan numbers, it seemed that creating new tags to 'enclose' the various kinds of numbers, and so separate out the "pure" numbers, would make the ls-orphans quicker to find. This has worked well, and the identified orphans have been filled in appropriately. No more orphans exist (hopefully!) in mw_AB.text now.

The new tags introduced in this lookout (with some explanation) are:

<cl>...</cl> | these denote the Skt. verbal roots (dhAtus) in 10 classes, and these numbers are marked as circled numbers [➀-➉]
<col>...</col> | these denote the column numbers (in the same page) in MW print, that are used in cross-referencing various entries
<ln>...</ln> | these denote the line numbers at the cited reference
<nt>...</nt> | these denote the Notes in the cited reference
<pcol>...</pcol> | these denote the page & column numbers in MW print, that are used in cross-referencing various entries
<pe>...</pe> | these denote the 3 persons [First, Second and Third] used in declension forms
<pg>...</pg> | these denote the page numbers in MW print, that are used in cross-referencing various entries
<sch>...</sch> | this is mostly (a) an associated tag denoting commentarial work(s) when occurring at a cited text (the ls-tag), or (b) a commentator where such a person's name is cited

Two more changes have also been made with respect to numbers, as below:

  1. The <hom> numbers are marked as numerals with a full stop [⒈-⒏], and this also covered many previously untagged <hom> numbers!
  2. The fractional numbers are changed to superscript digits, a fraction slash [⁄] and subscript digits. I think this notation is more appealing now.
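The two number conversions above can be sketched in Python as follows. This is only an illustration of the notation, not the script actually used on mw_AB.text; the function names `hom_numeral` and `format_fraction` are hypothetical, and the code assumes the homonym marks are the Unicode "digit full stop" characters starting at U+2488.

```python
# Digit translation tables for superscript and subscript forms.
SUPERSCRIPT = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")
SUBSCRIPT = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")
FRACTION_SLASH = "\u2044"  # U+2044 FRACTION SLASH


def hom_numeral(n: int) -> str:
    """Return the 'numeral with full stop' character for n (1..20).

    U+2488 is DIGIT ONE FULL STOP; the block runs up to U+249B (twenty).
    """
    if not 1 <= n <= 20:
        raise ValueError("only 1..20 have a single 'full stop' character")
    return chr(0x2487 + n)


def format_fraction(numerator: int, denominator: int) -> str:
    """Render a fraction as superscript digits, fraction slash, subscript digits."""
    top = str(numerator).translate(SUPERSCRIPT)
    bottom = str(denominator).translate(SUBSCRIPT)
    return top + FRACTION_SLASH + bottom
```

For example, `format_fraction(1, 2)` gives the three characters superscript one, fraction slash, subscript two, and `hom_numeral(8)` gives the last character of the [⒈-⒏] range.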

After the exercise, the remaining numbers are either (a) pure numbers or (b) numbers with a dot as "list" items.
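A check of this kind could be sketched as below: drop every span enclosed by one of the number-bearing tags, then list whatever digits are left. This is a rough sketch under assumptions, not the procedure actually followed; it assumes each tag opens and closes on the same line without nesting, and the names `ENCLOSING_TAGS` and `remaining_numbers` are made up for the illustration.

```python
import re

# Tags that are taken to enclose "non-pure" numbers (extend as needed).
ENCLOSING_TAGS = ("ls", "cl", "col", "ln", "nt", "pcol", "pe", "pg", "sch", "hom")

# Matches <tag ...>...</tag> for any listed tag (same line, no nesting assumed).
TAGGED_SPAN = re.compile(r"<(%s)\b[^>]*>.*?</\1>" % "|".join(ENCLOSING_TAGS))

# A run of digits, optionally followed by a dot.
NUMBER = re.compile(r"\d+\.?")


def remaining_numbers(line):
    """Return (token, kind) pairs for numbers left outside the enclosing tags.

    kind is "list-item" when the digits are followed by a dot, else "pure".
    A number that merely ends a sentence would also be reported as "list-item".
    """
    stripped = TAGGED_SPAN.sub(" ", line)
    result = []
    for match in NUMBER.finditer(stripped):
        token = match.group()
        kind = "list-item" if token.endswith(".") else "pure"
        result.append((token, kind))
    return result
```

Any "pure" number reported next to a cited text would then be a candidate ls-orphan to inspect by hand.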

Andhrabharati commented 1 year ago

One additional marking (<cse>...</cse>) has also been done; it has no corresponding information in any other lexical work.

However, on a quick look, since they fall only in the range [2-7], they appear to denote the various तत्पुरुष समास types that are found in VCP and SKD [२ त॰ to ७ त॰]. Alternatively, they could perhaps be taken as the 6 cases (विभक्ति) in the Skt. language.

There are 90 of them in the whole text now, and they are all inside round brackets (individually or comma-separated) between the HW and the corresponding lexical info.
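A tally of these markings by value might help with the decision. The sketch below counts the numbers found inside <cse>...</cse>; it is an assumption that a tag may hold either a single number or a comma-separated group, so both cases are handled, and `tally_cse` is a hypothetical helper name.

```python
import re
from collections import Counter

# Content of each <cse>...</cse> span on a line.
CSE = re.compile(r"<cse>(.*?)</cse>")


def tally_cse(lines):
    """Count how often each value occurs inside <cse> tags.

    Handles both one number per tag and comma-separated numbers in one tag.
    """
    counts = Counter()
    for line in lines:
        for content in CSE.findall(line):
            for part in content.split(","):
                part = part.strip()
                if part:
                    counts[part] += 1
    return counts
```

Run over the lines of mw_AB.text, `sum(tally_cse(lines).values())` should come to the total quoted above if the tagging matches these assumptions, and the per-value counts would show how the markings are distributed over [2-7].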

Looking further at these might give a decisive clue. @drdhaval2785, would you mind taking a look at these?

Andhrabharati commented 1 year ago

Just saw this in my working-notes file:

Note: Still 2 lines (730522 & 798130) contain a digit followed by a dot. What could they be denoting?
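To look at those two lines, something like the following could be used. It is a minimal sketch that assumes the line numbers are 1-based and that the file is UTF-8; `fetch_lines` and `digit_dot_spans` are hypothetical helper names.

```python
import re

# A single digit immediately followed by a dot.
DIGIT_DOT = re.compile(r"\d\.")


def fetch_lines(lines, wanted):
    """Return {line_number: text} for the requested 1-based line numbers."""
    wanted = set(wanted)
    found = {}
    for number, text in enumerate(lines, start=1):
        if number in wanted:
            found[number] = text.rstrip("\n")
            if len(found) == len(wanted):
                break  # stop reading once every requested line is seen
    return found


def digit_dot_spans(text):
    """Return every 'digit + dot' occurrence in the text."""
    return DIGIT_DOT.findall(text)


# Possible usage (file name as mentioned in this issue, encoding assumed):
# with open("mw_AB.text", encoding="utf-8") as handle:
#     for number, text in fetch_lines(handle, [730522, 798130]).items():
#         print(number, digit_dot_spans(text), text)
```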