Open drdhaval2785 opened 3 years ago
9716 matches in 9352 lines for "{#[^#]*[.]" in buffer: bor.txt
I don't know why this number differs from the tally entry (u'.', 7697).
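One way to narrow down the discrepancy might be to count the periods in several ways at once: spans containing a period, total periods inside spans, and lines containing such a span. The sketch below is only an illustration; it assumes that {#...#} spans do not nest and do not cross line boundaries, and the function name `count_periods` is made up here.

```python
import re

# One Sanskrit span {#...#}; assumes spans neither nest nor cross lines.
SPAN = re.compile(r"\{#(.*?)#\}")

def count_periods(lines):
    """Return (spans_with_period, total_periods, lines_with_period)."""
    spans = total = lines_hit = 0
    for line in lines:
        hit = False
        for m in SPAN.finditer(line):
            n = m.group(1).count(".")
            if n:
                spans += 1
                total += n
                hit = True
        lines_hit += hit  # True counts as 1, False as 0
    return spans, total, lines_hit

sample = [
    '<div n="lb"/>{#aSoko vfkzaviSezaH.#} </div>',
    "{#a. b.#} and {#c#}",
]
print(count_periods(sample))
```

Running this over bor.txt (for example `count_periods(open("bor.txt", encoding="utf-8"))`) would give three numbers to compare against the two counts quoted above.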
In slp1, the period character '.' represents danda (and two periods '..' represent double-danda, which in Unicode Devanagari is a separate code point).
In headword 'a' in bor, the first period in {#X#} is at line 21:
<div n="lb"/>{#aSoko vfkzaviSezaH.#} </div><div n="I">IV When used distribu-
A Devanagari display shows the period transformed to a danda, but in the scan the period is just a period (English punctuation).
In this case, the period should be moved outside of the {#X#}:
<div n="lb"/>{#aSoko vfkzaviSezaH#}. </div><div n="I">IV When used distribu-
If there is Devanagari text in BOR which really does have a danda, then the corresponding period character in bor.txt should be retained.
So the general answer has to be No, don't move all periods outside of {#X#} in bor.
However, AFAIK, dandas generally appear in Sanskrit verses. And in BOR, the Sanskrit text seems to be 'short' (this generalization is based on random browsing of the 9000+ instances of Sanskrit text with periods in bor.txt).
Thus, for bor, I suspect it is safe to globally move the periods.
9559 matches in 9200 lines for "[.]#}" in buffer: bor.txt
Almost all the periods in bor Sanskrit text occur just before the ending markup #}, i.e. as .#}
And these can easily be changed to #}.
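The simple case, a period sitting just before the closing #}, could be handled with a one-line substitution. This is a minimal sketch, not a tested conversion script; the function name is hypothetical, and the lookbehind that leaves '..' (double-danda) alone is an extra precaution added here, not something the thread decided.

```python
import re

# A single period immediately before '#}', not preceded by another period,
# so that a possible double-danda '..#}' is left untouched.
TRAILING_PERIOD = re.compile(r"(?<!\.)\.#\}")

def move_trailing_period(text):
    """Return (new_text, number_of_changes)."""
    return TRAILING_PERIOD.subn("#}.", text)

line = '<div n="lb"/>{#aSoko vfkzaviSezaH.#} </div><div n="I">IV When used distribu-'
new_line, changed = move_trailing_period(line)
print(new_line)
print(changed)
```

Returning the change count makes it easy to compare against the 9559 matches reported for "[.]#}".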
Changing the rest would be a bit trickier, but likely doable by a regex replacement.
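For the remaining periods, those in the middle of a span, one possible approach is to close the span before the period and reopen it afterwards, so that {#a. b#} becomes {#a#}. {#b#}. The sketch below assumes that this split is the desired result (the thread does not say so), only touches a single period followed by whitespace and more text, and uses hypothetical names throughout.

```python
import re

SPAN = re.compile(r"\{#([^#]*)#\}")
# A lone period (not part of '..'), followed by whitespace and further text.
INNER_PERIOD = re.compile(r"(?<!\.)\.(?!\.)(\s+)(?=\S)")

def split_inner_periods(text):
    """Close and reopen a span around each internal period."""
    def fix(m):
        body = INNER_PERIOD.sub(r"#}.\1{#", m.group(1))
        return "{#" + body + "#}"
    return SPAN.sub(fix, text)

print(split_inner_periods("{#a. b#}"))
```

A period with no following whitespace, or a '..' pair, is left unchanged, so those cases would still need manual review.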
The apostrophe also has significance in slp1 as avagraha. It should NOT be moved outside of {#X#}
Similarly for the characters \ / ^ used for accents, but bor.txt probably doesn't have these.
Also '|' and '~' have significance in SLP1.
Certainly semicolon and comma have no significance in slp1. Almost all of these in bor.txt Sanskrit text occur at the end of the {#X#}. So simple replacements of ;#} with #}; and similarly for comma, would be slight improvements to the coding of Sanskrit.
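The semicolon and comma replacements can be sketched in the same way, with a guard so that characters mentioned above as meaningful in slp1 are never moved by accident. The names `SLP1_SIGNIFICANT` and `make_mover` are made up for this illustration, and the set only lists the characters named in this thread, so it may be incomplete.

```python
import re

# Characters this thread names as meaningful in slp1; never move these.
SLP1_SIGNIFICANT = set(".'\\/^|~")

def make_mover(chars):
    """Build a function that moves any of `chars` from before '#}' to after it."""
    bad = set(chars) & SLP1_SIGNIFICANT
    if bad:
        raise ValueError("refusing to move slp1 characters: %r" % sorted(bad))
    pattern = re.compile("([" + re.escape(chars) + r"])#\}")
    return lambda text: pattern.sub(r"#}\1", text)

move_punct = make_mover(";,")
print(move_punct("{#rAmaH;#} x {#vanam,#} {#so'ham#}"))
```

The period is kept in the guarded set on purpose, since moving it depends on the danda question discussed earlier and is better done as a separate, deliberate step.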
Below are the statistics of the various intermediate items found by the following regex:
"({#[^#]*)([^a-zA-Z0-9 ]+)([^#]*#})"
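A per-character tally can also be sketched as follows. Note that in the regex above the first group is greedy, so the middle group captures only the last run of such characters in each span; the version below counts every non-alphanumeric, non-space character inside each span instead, which is a different (assumed) definition and may give different totals. The function name `tally` is hypothetical.

```python
import re
from collections import Counter

SPAN = re.compile(r"\{#([^#]*)#\}")
OTHER = re.compile(r"[^a-zA-Z0-9 ]")  # same character class as the regex above

def tally(lines):
    """Count each non-alphanumeric, non-space character inside {#...#} spans."""
    counts = Counter()
    for line in lines:
        for m in SPAN.finditer(line):
            counts.update(OTHER.findall(m.group(1)))
    return counts

counts = tally(["{#a. b'c.#} {#d;#}"])
for item in counts.most_common():
    print(item)
```

Printing `most_common()` gives (character, count) pairs in the same shape as the (u'.', 7697) entry quoted earlier.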