sanskrit-lexicon / ACC

ACC specific issues

Request to review acc6.txt #17

Open drdhaval2785 opened 7 years ago

drdhaval2785 commented 7 years ago

I have finished adding features to acc.txt on the dev server for now. It is time to review the progress so far.

I request @funderburkjim to review the current acc.txt (equivalent to acc6.txt) and check whether any information is missing.

An explanation of each file follows.

acc1.txt and acc2.txt

Generated from acc_orig_utf8_slp1.txt. See ../pywork/update.sh.

acc3.txt

Generated from acc2.txt by adding meta-line details. See ../pywork/correctionwork/issue-cologne-130/redo.sh.

acc4.txt

Generated from acc3.txt by adding subject and person tags. See ../pywork/correctionwork/issue-cologne-142/subject_tag_addition.py.

acc5.txt

Generated from acc4.txt by adding literary source tags (ls). See ../pywork/correctionwork/issue-cologne-148/catalogue_tag_addition.py.

acc6.txt

Generated from acc5.txt by adding internal references. See ../pywork/correctionwork/issue-acc-2/hw_int_ref.py.

acc.txt

The most feature-rich version goes here. Currently acc.txt is identical to acc6.txt. In the future there may be acc7.txt, acc8.txt, and so on; the highest-numbered file should always be copied to acc.txt.
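The "highest number goes to acc.txt" rule could be automated along the following lines. This is only a sketch: the function names `latest_stage` and `promote` are hypothetical, and it assumes all accN.txt files sit in one directory, which the thread does not state.

```python
import re
import shutil
from pathlib import Path


def latest_stage(directory):
    """Return the accN.txt path with the largest N in `directory`, or None."""
    best = None
    for path in Path(directory).glob("acc*.txt"):
        # Only names of the form acc<digits>.txt count as stages;
        # acc.txt and acc_orig_utf8_slp1.txt are skipped.
        match = re.fullmatch(r"acc(\d+)\.txt", path.name)
        if match and (best is None or int(match.group(1)) > best[0]):
            best = (int(match.group(1)), path)
    return best[1] if best else None


def promote(directory):
    """Copy the highest-numbered stage over acc.txt and return the source path."""
    source = latest_stage(directory)
    if source is None:
        raise FileNotFoundError("no accN.txt files found")
    shutil.copyfile(source, Path(directory) / "acc.txt")
    return source
```

The stage number is compared as an integer, so a future acc10.txt would rank above acc9.txt, which a plain string sort would get wrong.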

drdhaval2785 commented 7 years ago

If acc.txt is found to be OK, it may be taken as the base file on the live server.

funderburkjim commented 7 years ago

I'll review in a few days. You've done a lot of markup, and I want to understand.

Here are some preliminary questions:

gasyoun commented 7 years ago

Have you adjusted the display (web/webtc/disp.php) to properly handle the extra xml markup?

Guess not yet.

drdhaval2785 commented 7 years ago

What is the objective of the markup ?

  1. Subject tag addition
     1.1. <ab type="subj"> - Identify subjects such as lex, gr, vaid, kāvya etc.
     1.2. <ab type="pers"> - Identify relations such as son, father, disciple etc.
  2. Catalogue tag addition
     Identify the literary sources of a particular work, e.g. Stein, Kāvyamālā, Oudh etc.
  3. Internal references
     A dictionary entry may refer to another known headword. For example, the AnandarAya entry contains "son of Nṛsiṃharāya", which we want to capture as <ab type="pers">son</ab> of <ab type="hw" value="nfsiMharAya">Nṛsiṃharāya</ab>, because it refers to the headword nfsiMharAya.
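One way to sanity-check the internal references is to collect every <ab type="hw"> tag in an entry and confirm that its value is a real headword. The sketch below assumes the attribute order shown in the example above (type first, then value) and that a set of SLP1 headwords is available; `HW_REF` and `dangling_refs` are hypothetical names, not part of hw_int_ref.py.

```python
import re

# Matches the internal-reference markup in the form used in the example:
# <ab type="hw" value="KEY">displayed text</ab>
HW_REF = re.compile(r'<ab type="hw" value="([^"]*)">([^<]*)</ab>')


def dangling_refs(entry_text, headwords):
    """Return (key, displayed_text) pairs whose key is not a known headword."""
    return [
        (key, shown)
        for key, shown in HW_REF.findall(entry_text)
        if key not in headwords
    ]


entry = ('<ab type="pers">son</ab> of '
         '<ab type="hw" value="nfsiMharAya">Nṛsiṃharāya</ab>')
# The headword set here is illustrative only.
print(dangling_refs(entry, {"AnandarAya", "nfsiMharAya"}))
```

An empty result means every reference in the entry points at an existing headword. Any pair that is returned identifies a value attribute that needs correcting.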

Have you remade acc.xml?

No

Have you adjusted the display (web/webtc/disp.php) to properly handle the extra xml markup?

No.

I wanted you to look at acc6.txt and tell me whether it is OK to add markup like this. If yes, the downstream modifications can be made. If not, there is no point making these changes now only to redo them later.

funderburkjim commented 7 years ago

whether it is OK to add markup like this

As a general principle, it is OK.

From the examples, I conclude that you are adding only markup, i.e., if we removed all the <ab...>, </ab>, <ls>, </ls> tags from acc6.txt, then we would get exactly acc3.txt. Right?

Adding only markup is good.
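The "only markup was added" claim can be checked mechanically by stripping the added tags from each line of acc6.txt and comparing the result with the matching line of acc3.txt. A minimal sketch, assuming the two files correspond line for line and that acc3.txt itself contains no <ab> or <ls> tags (neither is confirmed in this thread); the helper names are hypothetical.

```python
import re
from itertools import zip_longest

# Opening and closing <ab ...> and <ls ...> tags, with any attributes.
MARKUP = re.compile(r"</?ab\b[^>]*>|</?ls\b[^>]*>")


def strip_added_markup(text):
    """Remove the <ab> and <ls> tags, keeping the text they enclose."""
    return MARKUP.sub("", text)


def first_difference(marked_lines, plain_lines):
    """Return the 1-based number of the first mismatching line, or None."""
    pairs = zip_longest(marked_lines, plain_lines)
    for number, (marked, plain) in enumerate(pairs, start=1):
        # A None on either side means one input ran out of lines first.
        if marked is None or plain is None:
            return number
        if strip_added_markup(marked) != plain:
            return number
    return None
```

With real files, the two arguments would be the open file objects for acc6.txt and acc3.txt. A result of None means the tagged file reduces exactly to the untagged one.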

funderburkjim commented 7 years ago

acc.xml and acc.dtd needed.

We can't fully evaluate the xml markup additions of acc6.txt (= acc.txt currently), until we

@drdhaval2785 You need to do this step next.