sanskrit-lexicon / COLOGNE

Development of http://www.sanskrit-lexicon.uni-koeln.de/

xmltags in csl-orig #366

Open funderburkjim opened 2 years ago

funderburkjim commented 2 years ago

The digitizations in csl-orig/v02 contain various xml-type markup. This markup, along with non-xml markup (such as {#X#} for slp1-transcoded Sanskrit, used in many dictionaries), is converted to valid xml (xxx.xml) by make_xml.py, which is based on make_xml.py in the csl-pywork repository.

Here the xml-type tags in the xxx.txt files are identified and counted. This is occasionally useful, for example for finding markup that is rarely (or never) used. This is a first such summary; other summaries may be developed as needed.
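A minimal sketch of how such a count might be done in Python. It is not the script that produced the summary: the regular expression, the function name `count_tags`, and the choice to count each element once via its start (or empty) tag are all assumptions. Attributes are ignored, so differently attributed `<lang>` tags are counted together.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical pattern: matches a start, end, or empty-element tag and
# captures its name. Everything after the name (attributes) is ignored.
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9_]*)[^<>]*?/?>")

def count_tags(lines: Iterable[str]) -> Counter:
    """Count xml-type tags by name in an iterable of text lines."""
    counts: Counter = Counter()
    for line in lines:
        for m in TAG_RE.finditer(line):
            if m.group(0).startswith("</"):
                continue  # count each element once, via its start tag
            counts[m.group(1)] += 1
    return counts
```

Non-xml markup such as `{#X#}` contains no `<`, so it is never matched.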

funderburkjim commented 2 years ago

all_xmltags.txt shows the tag counts by dictionary.

Remember: these are just the tags used in the digitizations (xxx.txt) of csl-orig. Also, tag attributes are not considered; e.g. <lang n="greek"> and <lang script="Arabic" n="Persian"> are both counted simply as the <lang> tag.

Another summary including some attributes could be developed.
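One possible shape for such a summary, sketched in Python under assumptions: each start tag is keyed by its name plus its attributes (sorted by attribute name), so `<lang n="greek">` and `<lang script="Arabic" n="Persian">` are counted separately. The function name and the key format are hypothetical, and only double-quoted attribute values are handled.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical patterns: a start (or empty) tag with double-quoted attributes,
# and a single name="value" attribute pair.
START_TAG_RE = re.compile(r'<([A-Za-z][A-Za-z0-9_]*)((?:\s+[A-Za-z_][\w.-]*\s*=\s*"[^"]*")*)\s*/?>')
ATTR_RE = re.compile(r'([A-Za-z_][\w.-]*)\s*=\s*"([^"]*)"')

def count_tags_with_attrs(lines: Iterable[str]) -> Counter:
    """Count start tags keyed by tag name plus sorted attribute/value pairs."""
    counts: Counter = Counter()
    for line in lines:
        for m in START_TAG_RE.finditer(line):
            name, attr_text = m.group(1), m.group(2)
            attrs = sorted(ATTR_RE.findall(attr_text))
            key = name + "".join(f' {k}="{v}"' for k, v in attrs)
            counts[key] += 1
    return counts
```

Keying on attribute values can produce many distinct keys; a variant keyed on attribute names only would give a coarser summary.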

The xmltag directory contains the work done so far.

Andhrabharati commented 2 years ago

Good to see my use of "embedded picture" unearthing some interesting points that had so far only been lingering in @funderburkjim's mind.

funderburkjim commented 2 years ago

revision to meta2 files

The xxx-meta2.txt files in csl-orig contain SOME documentation of the tags used in each dictionary.

For example, ben-meta2.txt for the Benfey dictionary.

It would be useful to update these meta2 files based on all_xmltags.txt -- for example, compare ben-meta2 notes versus the 'tags for ben' section in all_xmltags.
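A rough sketch of such a comparison in Python. It assumes, hypothetically, that a meta2 file mentions tags in the form `<name>`, and that the tag counts for one dictionary are already available as a mapping from tag name to count; `compare_meta2` is an illustrative name, not an existing tool.

```python
import re
from typing import Dict, List, Mapping

# Assumption: documented tags appear in the meta2 text as <name> (end tags are skipped).
TAG_NAME_RE = re.compile(r"<([A-Za-z][A-Za-z0-9_]*)")

def compare_meta2(meta2_text: str, used_counts: Mapping[str, int]) -> Dict[str, List[str]]:
    """Report tags used but not documented, and tags documented but not used."""
    documented = set(TAG_NAME_RE.findall(meta2_text))
    used = set(used_counts)
    return {
        "undocumented": sorted(used - documented),
        "documented_but_unused": sorted(documented - used),
    }
```

If a meta2 file also documents metaline tags such as `<L>` or `<k1>`, they would appear under `documented_but_unused` when the counts cover only entry bodies.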

One small detail if anyone undertakes such a revision of the meta2 file(s): all_xmltags only includes tags appearing within an entry in xxx.txt (between the metaline <L>... and the metaend <LEND>).
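A minimal Python sketch of restricting attention to entry bodies. It assumes each entry starts with a metaline beginning `<L>` and ends with a line beginning `<LEND>`, and that the metaline itself is excluded (its pseudo-tags are not entry markup); `entry_bodies` is a hypothetical helper.

```python
from typing import Iterable, Iterator, List

def entry_bodies(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the lines strictly between each '<L>' metaline and its '<LEND>' line."""
    inside = False
    body: List[str] = []
    for line in lines:
        if line.startswith("<L>"):
            inside = True       # metaline: start a new entry, but do not keep it
            body = []
        elif line.startswith("<LEND>"):
            if inside:
                yield body      # entry complete
            inside = False
        elif inside:
            body.append(line)   # lines outside entries are skipped
```

Lines before the first entry or between entries (headers, comments) never reach the tag count.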

funderburkjim commented 2 years ago

recompute xmltag

This was done because it was noticed that <> tags were still present in all_xmltags.txt, even though these had been systematically removed last year.

One use of all_xmltags.txt is to answer questions such as: which dictionaries use the <etym> tag?
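Such a query can be sketched in Python, assuming the per-dictionary counts have already been loaded into a mapping from dictionary code to {tag: count}. Parsing all_xmltags.txt itself is not shown, since its exact layout is not described here, and `dictionaries_using` is a hypothetical name.

```python
from typing import List, Mapping, Tuple

def dictionaries_using(tag: str, counts_by_dict: Mapping[str, Mapping[str, int]]) -> List[Tuple[str, int]]:
    """Return (dictionary code, count) for every dictionary that uses `tag`."""
    hits = [(code, counts[tag]) for code, counts in counts_by_dict.items()
            if counts.get(tag, 0) > 0]
    # Most frequent users first; ties broken alphabetically by dictionary code.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))
```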