sanskrit-lexicon / CORRECTIONS

Correction history for Cologne Sanskrit Lexicon
`,#}` in acc.txt #356

Open drdhaval2785 opened 7 years ago

drdhaval2785 commented 7 years ago

Ideally the comma or other markup should be outside the Sanskrit markup. See the following lines.

    Line 150662: {#nfsiMha Bawwa,#}¦ son of Siddha Bhaṭṭa:
    Line 151422: <>{#padArTatattvavivecana,#} a criticism of the Vaiśeṣika

They can be better treated like the following.

    Line 150662: {#nfsiMha Bawwa#}¦, son of Siddha Bhaṭṭa:
    Line 151422: <>{#padArTatattvavivecana#}, a criticism of the Vaiśeṣika

or

    Line 150662: {#nfsiMha Bawwa#},¦ son of Siddha Bhaṭṭa:

In the case of headwords, whether to move the comma outside the broken bar or to place it between the curly brace and the broken bar is something we can think about and decide. The second case, by contrast, is relatively uncomplicated.
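A minimal sketch of how this change could be scripted, assuming the only pattern of interest is a comma immediately before the closing `#}`. The function name `move_comma_out` and the flag `comma_after_bar` are hypothetical, not part of any existing script in this repository; the flag only selects between the two headword options shown above.

```python
import re

# A comma immediately before the closing #} of a Sanskrit span,
# optionally followed by the broken bar that marks a headword.
TRAILING_COMMA = re.compile(r'\{#([^#{}]*?),#\}(¦?)')

def move_comma_out(line, comma_after_bar=True):
    """Move a trailing comma from inside {#...#} to outside it.

    comma_after_bar is a hypothetical switch: True gives {#X#}¦, and
    False gives {#X#},¦ for headwords. Non-headwords always get {#X#},
    """
    def fix(match):
        text, bar = match.group(1), match.group(2)
        if bar and comma_after_bar:
            return '{#' + text + '#}' + bar + ','
        return '{#' + text + '#},' + bar
    return TRAILING_COMMA.sub(fix, line)
```

Lines without `,#}` pass through unchanged, so the function could be applied to every line of acc.txt and the result diffed against the original before committing.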

What are your takes @gasyoun and @funderburkjim ?

gasyoun commented 7 years ago

> Ideally the comma or other markup should be outside the Sanskrit markup.

Sure. That has other pluses as well, for example with references: commas usually separate different references, which are currently treated as one.

> remove the comma outside broken

I do not see any downsides.

funderburkjim commented 7 years ago

The problem of word-hyphenation at line breaks is a similar concern.

For instance, in acc.xml:

    <H1><h><key1>akzapAda</key1><key2>akzapAda</key2></h>
      <body><s>akzapAda</s>  or <s>akzacaraRa,</s> a name of Gautama, the philo- <br/>sopher, Hall p.
      20.</body><tail><L>12</L><pc>1-001,1</pc></tail></H1>

The acc.txt digitization honors the line breaks of the printed text. This is good for correction. However, honoring those line breaks in acc.xml seems suboptimal. Better would be

    a name of Gautama, the philosopher,

The correction here is to remove `- <br/>`. There are 5000+ such cases.
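As a rough sketch of that correction, assuming every end-of-line hyphen followed by `<br/>` marks a word split across lines. That assumption would need checking, because a hyphen that genuinely belongs to a compound would be lost too. The function name `join_hyphenated` is hypothetical and is not taken from make_xml.py.

```python
import re

# A hyphen at the end of a printed line, followed by the <br/> line-break code.
HYPHEN_BREAK = re.compile(r'-\s*<br/>\s*')

def join_hyphenated(body):
    """Rejoin words that were hyphenated across a printed line break."""
    return HYPHEN_BREAK.sub('', body)
```

Applied to the body above, `the philo- <br/>sopher,` becomes `the philosopher,`. A `<br/>` that is not preceded by a hyphen is left alone.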

Such a change would mean an information loss in acc.xml compared to acc.txt. Maybe such information loss is irrelevant.

It is also not clear whether any <br/> markup should be present in acc.xml. What difference does it make to the digital display (which depends on acc.xml) whether the original text line breaks are represented? Maybe it makes no difference.

In the current system, it is not particularly costly to try experiments such as those just mentioned for handling line-break codes: just a change to make_xml.py, then update_sync.sh to remake downstream assets, and maybe a corresponding change to disp.php.

We could also make changes related to commas at the level of acc.xml (rather than at the level of acc.txt).
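For illustration only, a comma fix at the acc.xml level might look like the following sketch. It assumes Sanskrit text sits in `<s>...</s>` elements with no nested tags, as in the akzapAda record above. The function name `move_comma_out_of_s` is hypothetical.

```python
import re

# A comma immediately before the closing </s> of a Sanskrit element.
S_TRAILING_COMMA = re.compile(r'<s>([^<]*?),</s>')

def move_comma_out_of_s(xml_line):
    """Move a trailing comma from inside <s>...</s> to just after it."""
    return S_TRAILING_COMMA.sub(r'<s>\1</s>,', xml_line)
```

On the record above this turns `<s>akzacaraRa,</s>` into `<s>akzacaraRa</s>,` and leaves `<s>akzapAda</s>` untouched, while acc.txt stays exactly as digitized.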

gasyoun commented 6 years ago

> We could also make changes related to commas at the level of acc.xml (rather than at the level of acc.txt).

So which way do we go? If nothing is lost in the .txt, I think it's good to make the .xml display and search in a friendlier way.

funderburkjim commented 6 years ago

Before approaching a modification of xml, we need to get the current dev version of ACC (with all the markup added by Dhaval) fully installed at Cologne -- the missing piece involves the generation of acc.xml and the corresponding changes to the webtc/disp.php program which renders the xml into html.

gasyoun commented 6 years ago

> current dev version of ACC

Is it a question of Jim installing it, or of Jim agreeing that it's ready to be installed?

drdhaval2785 commented 6 years ago

It is pending on my side. I will be able to resolve it after a month or so.

funderburkjim commented 6 years ago

@drdhaval2785 If it's ok with you, I think I'll go ahead and prepare make_xml.py and disp.php for acc.

gasyoun commented 6 years ago

> after a month

Ok, so we will wait until December.

drdhaval2785 commented 6 years ago

It is absolutely OK with me. Please go ahead and create make_xml.py and disp.php, Jim.