PWG meta-line/IAST conversion

funderburkjim commented 7 years ago

This is a placeholder for questions which arise in the course of this conversion, which will begin in a few weeks.

I'm starting this issue now to have a place for this link to a related question.

gasyoun commented 7 years ago

begin in a few weeks

PWG and PW was my most wanted.

funderburkjim commented 6 years ago

Adjust page breaks within `<ls>`.

In the digitization, page breaks are indicated accurately, and sometimes such a page break occurs in the middle of a literary source, for instance under hw agnIzomIya (slp1). Proper recognition of such a literary source is much more convenient if the page break is moved out, and this is what I've done as part of the meta-line conversion. Using the original markup, here's the agnIzomIya example.

OLD
{¤AV.[Page01.0038]9, 6, 6.¤}
NEW
{¤AV.9, 6, 6.¤} [Page01.0038]   
FULL LS ITEM then Page break.

The set of all instances (1751 of them) is in this file:

pwg_ls_page_adj.txt

The first two numbers in the file are

case number
line-number within the pre-meta-line version of pwg.txt.
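
The adjustment described above can be sketched in a few lines of Python. This is only an illustration, assuming the old `{¤...¤}` markup and a `[PageNN.NNNN]` marker as in the agnIzomIya example; the function name `move_page_break_out` is hypothetical and this is not the program that was actually used.

```python
import re

# An old-style literary source {¤...¤} that contains a [Page...] marker.
LS_WITH_PAGE = re.compile(r"\{¤([^¤]*?)(\[Page[^\]]*\])([^¤]*?)¤\}")

def move_page_break_out(line: str) -> str:
    """Move a page-break marker from inside {¤...¤} to just after it."""
    return LS_WITH_PAGE.sub(r"{¤\1\3¤} \2", line)

old = "{¤AV.[Page01.0038]9, 6, 6.¤}"
print(move_page_break_out(old))
```

Literary sources without an embedded page break are left unchanged.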

gasyoun commented 6 years ago

So 1751 times the source was broken = unrecognized. I guess we can recognize, add markup and get back, or will the line breaks be left everywhere except literary sources?

funderburkjim commented 6 years ago

All the line breaks are present; they are just offset slightly so they don't occur in the middle of a literary source.

gasyoun commented 6 years ago

offset slightly

Oh, understood.

funderburkjim commented 6 years ago

LS separation

Here is an example where there are several distinct literary sources that are 'run together'. First the scan:

Current coding

FIRST 
<ls>R2V. 1, 46, 10. 91, 17. 125, 3. 7, 98, 1. 8, 61, 2. 9, 62, 4. 67, 28. 68, 4. 74, 5. VS. 5, 7. 20, 27. AV. 6, 49, 2. 11, 1, 9.</ls>
SECOND
 <ls>AK. 1, 1, 2, 34. H. 99. MED. c2. 1. R. 1, 7, 17.</ls>

suggested revised coding

FIRST
<ls>R2V. 1, 46, 10. 91, 17. 125, 3. 7, 98, 1. 8, 61, 2. 9, 62, 4. 67, 28. 68, 4. 74, 5.</ls> <ls>VS. 5, 7. 20, 27. </ls> <ls>AV. 6, 49, 2. 11, 1, 9.</ls>
SECOND
 <ls>AK. 1, 1, 2, 34.</ls> <ls>H. 99.</ls> <ls>MED. c2. 1.</ls> <ls>R. 1, 7, 17.</ls>

Questions:

Is the revised coding better, because it separates distinct sources?
Is it worthwhile to spend some time now to attempt to devise some programmatic way to generate the revised coding?
- This is probably non-trivial.
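
As a starting point for the second question, here is a rough heuristic sketch. It assumes that a new source begins after a period, at a token that starts with an uppercase letter and ends with a period (such as `VS.` or `AV.`). That assumption is mine, not an established rule of the digitization, and the name `split_ls` is hypothetical.

```python
import re

# Assumed boundary: whitespace preceded by a period and followed by
# an abbreviation-like token (uppercase initial, ending in a period).
NEW_SOURCE = re.compile(r"(?<=\.)\s+(?=[A-Z][A-Za-z0-9]*\.)")
LS = re.compile(r"<ls>(.*?)</ls>")

def split_ls(text: str) -> str:
    """Split each run-together <ls> element into one <ls> per source."""
    def repl(match):
        parts = NEW_SOURCE.split(match.group(1).strip())
        return " ".join(f"<ls>{part}</ls>" for part in parts)
    return LS.sub(repl, text)

second = "<ls>AK. 1, 1, 2, 34. H. 99. MED. c2. 1. R. 1, 7, 17.</ls>"
print(split_ls(second))
```

On the two samples above this reproduces the suggested revised coding, but it would wrongly split a two-part abbreviation such as `WEST. Dhātup.`, which is one reason the general problem is non-trivial.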

drdhaval2785 commented 6 years ago

Seems worthwhile.

gasyoun commented 6 years ago

Is the revised coding better, because it separates distinct sources?

You know it is. Based on it one day we can add hyperlinks.

Is it worthwhile to spend some time now to attempt to devise some programmatic way to generate the revised coding?

If it's weeks - yes. If months - no. I would divide the trivial and leave the non-trivial if a solution can't be found in a week.

funderburkjim commented 6 years ago

Wide text in pwg

The digitization uses a special coding for text which is printed in a typographically distinct form which might be described as 'wide' format (extra space between letters). There are 51000+ such instances, with 9000+ distinct instances. Here are a few samples from print: hw = a; hw = a, col. 2; hw = a, col. 2, near bottom.

Significance?

One question is, what is the author's intent in using this special typography? Maybe there are multiple purposes. Maybe it's just a print-setting phenomenon with no semantic content.

Such a typographic feature is noted in the digitization of some other dictionaries, notably PW(K).

In the meta-line conversion of PW, such text was tagged as <is>X</is>. I guess I'll use the same tag here in PWG.

gasyoun commented 6 years ago

Maybe it's just a print-setting phenomenon with no semantic content.

Does not seem so. @SergeA any clue?

SergeA commented 6 years ago

One question is, what is the author's intent in using this special typography? Maybe it's just a print-setting phenomenon with no semantic content.

All these examples show transliterated words, as terms (gaṇa, avj.) or names (Viṣṇu, Śaṁkar.) given in their full form, or abbreviated, and which are neither German words, nor Sanskrit headwords, nor Sanskrit quotations. So they are printed in a distinctive way, to separate them easily from the other text.

In this connection I want to mention the output I've seen in MW. There, transliterated terms such as "Vedas" are rendered as "वेदs" when Devanagari output is selected. That's not good at all. Transliterated terms and names must be treated separately from real Sanskrit words (stems and quotations). The term "Vedas" should be rendered in Latin letters, no matter which output is selected. A separate markup for these words in the digitization also makes it possible to change the outdated transliteration scheme etc.

funderburkjim commented 6 years ago

the output I've seen in MW ...

You make an interesting observation. I've opened an MWS issue as a placeholder for responding ... don't want to divert right now from thinking about PWG.

avj.

Many of the words seem to be Sanskrit words. But what kind of abbreviation is avj. ? If not Sanskrit, maybe this is miscoded.

Reference of all instances

For reference, this file has a listing of the current instances, with frequency. temp_filter_wide.txt

In this file, there are many letter-number codings. As a later step in this conversion, I'll transcode these to modern IAST.

I started using the <is> tag back in the IAST conversion of Burnouf, when such words were shown in italic script, so the acronym is 'italicized Sanskrit'.

SergeA commented 6 years ago

But what kind of abbreviation is avj. ?

I suppose avj. can be for avjaja (avyaya) = indeclinable.

funderburkjim commented 6 years ago

Nax. question

Nax. is coded as 'wide' text 250 or so times. A random sample of these indicates that they are all part of some literary source reference related to WEBER. E.g.

It will have a chance to get transcoded to IAST by virtue of being part of a literary source.

I think this should not be coded as 'wide'.

Any objections, @SergeA ?

funderburkjim commented 6 years ago

More wide text with literary sources

Nax. and avj., mentioned above, have the common feature that they are coded as 'wide' text that occurs within the scope of a literary reference. So they are recognizable as having the form <ls>...<is>X</is>...</ls>. A search for all such X identifies approx. 140 distinct X, occurring 2900+ times. These X values are in temp_filter_isls.txt. Given all these instances, which I find hard to understand, maybe I should just leave the coding alone for now. (This is a change of opinion from the comment "I think this should not be coded as 'wide'." under Nax. above.)

How to read Agni ?

Here's an instance with Agni. How should this literary source be read?

SergeA commented 6 years ago

By comparison with the corresponding RV text, the reading is: Ṛv. 1, 13, 5. Agni 5, 4, 3. 37, 1. 7, 2, 4.

Ṛv. 1, 13, 5. = see ṚgVeda, mandala 1, hymn 13, verse 5
Agni 5, 4, 3. = see the same RV, verse 5.4.3, where the headword ghṛtapṛṣṭha is in relation with agni
37, 1. = see the same RV, the same 5th mandala, hymn 37, verse 1
7, 2, 4. = see the same RV, verse 7.2.4

(The word agni is related only to the verse RV 5.4.3, and not to the next referenced 5.37.1; 7.2.4 etc.)

And also: dessen Rosse Ṛv. 1, 14, 6. ऊर्मि 10, 30, 8.

dessen Rosse Ṛv. 1, 14, 6 = see RV 1.14.6 ... (here I'm not sure about "dessen Rosse")
ऊर्मि 10, 30, 8. = see the same RV., verse 10.30.8, where the headword ghṛtapṛṣṭha is in relation with ūrmi

As I noticed, the numbers visually differ, according to the level of text divisions. The number for the big section is the highest and also black, the number for the verse is smallest. This representation makes the references more readable.

SergeA commented 6 years ago

Nax. is explained in the sources, 4.018: WEBER, Nax. = WEBER, Die vedischen Nachrichten von den Naxatra (Mondstationen). Berlin, 1860. 1862. So Nax. is the term Naxatra (nakṣatra), used within a source name. Perhaps there is some system in combining small caps with wide font in PWG sources. I do not know.

gasyoun commented 6 years ago

The term "Vedas" should be rendered in Latin letters, no matter which output is selected.

Indian users would disagree.

As I noticed, the numbers visually differ, according to the level of text divisions. The number for the big section is the highest and also black, the number for the verse is smallest. This representation makes the references more readable.

Yeah, in the past Jim was able to represent the levels with font sizes at a REGEX level.

Perhaps there is some system in combining small caps with wide font in PWG sources.

Small caps is reserved for sources only, right?

funderburkjim commented 6 years ago

Just a note to let others know that progress is being made in the <ls> refactoring for pwg. I hope a graceful stop point will be reached next week. Please be patient.

gasyoun commented 6 years ago

Just a note

Good to know.

funderburkjim commented 6 years ago

This round of work on pwg is primarily over. The current status is that the Basic, List, Adv. Search, and mobile1 displays are all based on the new form of the data. The data used in the list-0.2, list-0.2s displays is based on the prior form of the data.
Others should examine the current displays.

My next task will be to document what has been done, and some things that remain to be done some other time.

I'll also work to get caught up with the comments others have made while I've been on this pwg excursion.

I'll pull list-0.2(s) to use the current data when a bit of time has passed and we don't need to look at the old form for comparison.

gasyoun commented 6 years ago

I'll pull list-0.2(s) to use the current data when a bit of time has passed and we don't need to look at the old form for comparison.

So be it. Was missing you on this trip around (or rather inside) the world.

funderburkjim commented 6 years ago

Summary of changes to pwg

The main changes were to the base digitization, pwg.txt. These changes flowed through to similar changes in pwg.xml. In addition, a few differences in the base display were introduced.

pwg-meta2

Most of the changes to pwg.txt are quite technical in nature. One way to understand these changes is to compare the meta files before and after. The pwg-meta.txt file describes salient features of the digitization before recent changes; the pwg-meta2.txt file pertains to the current form of the digitization. Copies of both these files are in this gist.

Sample comparison

An intuitive understanding of the changes is given by a close reading of comparable entries in the previous and current versions of the digitization. Since it is short, the first entry is a good place to start.

PREVIOUS (pwg8.txt)

[This is just one line -in pwg8.txt- I've introduced line breaks so this comment will be easier to read]

<H1>000{a}1{a}^1¦ Interj. {#a apehi#} (die beiden Vocale fliessen nicht in einander) 
¯{¤P. 1, 1, 14, Sch.¤}; vgl. †{gan2a} {#cAdi#} und ¯{¤VOP. 2, 19.¤} 
Drückt Mitleid aus ({#anukampAyAm#}) ¯{¤MED. †{avj.} 2.¤}

CURRENT (pwg.txt) [There are several shorter lines in pwg.txt]

<L>1<pc>1-0001<k1>a<k2>a<h>1
1. {#a#}¦ Interj. {#a apehi#} (die beiden Vocale fliessen nicht in einander) 
<ls>P. 1, 1, 14,</ls> 
<ls>Sch.</ls>; vgl. <is>gaṇa</is> {#cAdi#} und 
<ls>VOP. 2, 19.</ls> Drückt Mitleid aus ({#anukampAyAm#}) 
<ls>MED. <is>avy</is>. 2.</ls>
<LEND>

PRINTED TEXT

funderburkjim commented 6 years ago

Guided tour of the comparison

Header

The old header is <H1>000{a}1{a}^1¦. In the new form, this contributes to:

The meta-line: <L>1<pc>1-0001<k1>a<k2>a<h>1. But note that this meta line also has
- L the cologne id of this entry.
- pc the identification of the printed page of the entry.
The first part of the 'body' of the entry: 1. {#a#}¦. Note that this format now corresponds closely to the beginning of the printed text.
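
A meta-line of this shape is easy to take apart mechanically. The following is a small sketch, assuming only the five tags seen in this thread (`L`, `pc`, `k1`, `k2`, `h`); the function name `parse_meta_line` is hypothetical.

```python
import re

# Each field is an opening tag followed by text up to the next "<".
META_FIELD = re.compile(r"<(L|pc|k1|k2|h)>([^<]*)")

def parse_meta_line(line: str) -> dict:
    """Return the meta-line fields as a dict; <h> is absent when there is no homonym."""
    return {tag: value for tag, value in META_FIELD.findall(line.strip())}

meta = parse_meta_line("<L>1<pc>1-0001<k1>a<k2>a<h>1")
print(meta["L"], meta["pc"], meta["k1"], meta.get("h"))
```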

Sanskrit text and italic text

These are identified in the same way in both forms: {#X#} and {%X%}. [No italic text in this example].

Literary source

OLD : ¯{¤P. 1, 1, 14, Sch.¤}
NEW : <ls>P. 1, 1, 14,</ls> <ls>Sch.</ls>

The first difference is simply a change of notation: from ¯{¤X¤} to <ls>X</ls>. The second difference is that the Scholiast abbreviation has been separated into a separate tag in the new form. In this case, this should be considered a minor flaw of the new form, since the preceding <ls> ends with a comma; this ls-scope problem is quite thorny, and I'll discuss it more fully in a separate issue.

iast text

Words appearing in the original digitization with coding †{X} are transformed to <is>X</is> in the new form. The feature of the printed form is wide letter spacing. There are two instances in this example.

In the old form, X is coded in the AS (letter-number) system (e.g. gan2a).

Examination of the instances throughout the text led me to believe that X is always a Sanskrit word appearing in Roman alphabet with diacritics. The text author uses his own system of diacritics. With this assumption, X in the new coding is transformed to modern IAST. So, gan2a becomes gaṇa, and avj becomes avy.

The distinct occurrences of these are relatively rare. I'll discuss this more fully in a separate issue.

Incidentally, note closely the position of the period in the second example: †{avj.} and <is>avy</is>. Since (as @SergeA pointed out) this is probably an abbreviation for avyaya, it might be better for the new form to have the period within the scope of the tag here : <is>avy.</is>.
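
The AS to IAST step described above can be illustrated with a short sketch. The table is deliberately partial: it contains only the pairs that can be read off examples in this thread (gan2a / gaṇa, Dha10tup / Dhātup, C2KDR / ŚKDR), and the name `as_to_iast` is hypothetical. Unknown letter-number codes are left untouched rather than guessed.

```python
import re

# Partial table, inferred only from examples quoted in this thread.
AS_TO_IAST = {"n2": "ṇ", "a10": "ā", "C2": "Ś"}

# An AS code is a single letter followed by one or more digits.
AS_CODE = re.compile(r"[A-Za-z]\d+")

def as_to_iast(text: str) -> str:
    """Replace known letter-number codes; leave unknown codes as they are."""
    return AS_CODE.sub(lambda m: AS_TO_IAST.get(m.group(0), m.group(0)), text)

for word in ("gan2a", "Dha10tup", "C2KDR"):
    print(word, "->", as_to_iast(word))
```

The change from avj to avy is of a different kind: it respells the author's own diacritic convention and is not covered by this letter-number table.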

funderburkjim commented 6 years ago

Divisions

The other major difference in the two forms regards coding of subdivisions within an entry. We need to look at other entries to see this.

letter divisions

The second entry shows letter divisions:

OLD

<H1>000{a}1{a}^2¦ Pronominalstamm: ²a) der 1sten Person, enthalten in 
{#aha/m, AvA/m, AvA/ByAm, Ava/yos, asmA/n, asmA/Bis, asma/Byam, asma/t, asmA/kam, asmA/su#}
 und im ved. {#asme/#} . -- ²b) der 3ten Person; •f. {#A#} .   ............... etc.

NEW

<L>2<pc>1-0001<k1>a<k2>a<h>2
2. {#a#}¦ Pronominalstamm: 
<div n="2"> a) der 1sten Person, enthalten in {#aha/m, AvA/m, AvA/ByAm, Ava/yos, asmA/n, asmA/Bis,
 asma/Byam, asma/t, asmA/kam, asmA/su#} und im ved. {#asme/#} . 
<div n="2">— b) der 3ten Person; <lex>f.</lex> {#A#} .

Compare ²a) in the old form to <div n="2"> a) in the new form. Also, note the line break at the division in the new form. This helps to break up the very long lines of the original digitization into much more manageable lines (easier to handle in corrections) in the new form.

In comparing -- ²b) to <div n="2">— b), note that

the double-hyphen -- is changed to an em-dash — and
that em-dash is within the scope of the <div> tag.

It seems to be a feature of the print that the first division of a sequence has no em-dash.

number divisions

Number divisions are similar. Compare ³4) to <div n="1"> 4)

Greek alphabet divisions

Compare ¹a) to <div n="3"> α). Note that the old form uses a system of Latin letters to represent Greek letters, while the new form uses Unicode Greek letters directly.
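
The three kinds of division marker follow one pattern, which can be sketched as below. This is an illustration only, with a hypothetical function name `convert_divisions`. The Greek table holds just the a / α pair attested in this thread; any further pairs would have to be taken from the actual coding scheme.

```python
import re

# Superscript marker in the old form -> n attribute in the new form.
LEVEL = {"³": "1", "²": "2", "¹": "3"}
# Latin stand-ins for Greek labels; only a -> α is attested in this thread.
GREEK = {"a": "α"}

DIVISION = re.compile(r"(-- )?([³²¹])([A-Za-z0-9]+)\)")

def convert_divisions(text: str) -> str:
    """Rewrite old division markers as open <div> tags, each on a new line."""
    def repl(match):
        dash, marker, label = match.groups()
        n = LEVEL[marker]
        if n == "3":
            label = "".join(GREEK.get(ch, ch) for ch in label)
        lead = "—" if dash else ""  # the dash moves inside the div scope
        return f'\n<div n="{n}">{lead} {label})'
    return DIVISION.sub(repl, text)

print(convert_divisions("Pronominalstamm: ²a) der 1sten Person . -- ²b) der 3ten Person"))
```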

Number, Letter, Greek hierarchy

I think the prevailing hierarchy principle is : Numbers > Letters > Greek. However, there are certainly exceptions. For instance, look at the entry a above, where there are two letter divisions, but no number division. Further study might provide more insight into this aspect of the author's organization.

Prefix verb forms

For verb entries, the author uses a generally consistent system of presenting prefix forms. Consider the first verb aMsay:

OLD

<H1>000{aMsay}1{aMsay},¦ {#aMsa/yati#} ³1) {%theilen%}, ¯{¤KAVIKALPADR. im C2KDR.¤}; vgl. 
{#aMSay#} . -- ³2) {%schlagen, kämpfen%} ({#samAGAte#}) ¯{¤WEST. Dha10tup. §35, 64.¤}

-<P>- {#vi#} {%theilen, brechen, unschädlich machen, abwehren%}: {#SaktiM vyaMsitAM mADavena#} 
¯{¤MBH. 1, 197.¤} {#vyaMsayAmAsa taM tasya prahAram#} ¯{¤3, 11728.¤}

NEW

<L>41<pc>1-0006<k1>aMsay<k2>aMsay
{#aMsay#}¦, {#aMsa/yati#} 
<div n="1"> 1) {%theilen%}, 
<ls>KAVIKALPADR.</ls> im <ls>ŚKDR.</ls>; vgl. {#aMSay#} . 
<div n="1">— 2) {%schlagen, kämpfen%} ({#samAGAte#}) 
<ls>WEST. Dhātup. § 35, 64.</ls>

<div n="p">— {#vi#} {%theilen, brechen, unschädlich machen, abwehren%}: {#SaktiM vyaMsitAM mADavena#} 
<ls>MBH. 1, 197.</ls> {#vyaMsayAmAsa taM tasya prahAram#} 
<ls>3, 11728.</ls>
<LEND>

So the general conversion is -<P>- to <div n="p">—.

Note that the prefix in question generally appears just after the division markup (space + {#X#}), so that it should be easy to pull out the prefixes for a given verb as a first step in generating extra prefixed verb headwords (such as vi + aMsay -->[Sandhi] vyaMsay).
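
A first step of that kind might look like the sketch below. It assumes that a prefix division line starts with `<div n="p">— ` followed directly by the prefix in `{#...#}`, as in the aMsay entry; the function name `prefixes_of_entry` is hypothetical, and the sandhi step for building the prefixed headword is not attempted here.

```python
import re

# A prefix division: the markup, a space, then the prefix in {#...#}.
PREFIX_DIV = re.compile(r'<div n="p">— \{#([^#]*)#\}')

def prefixes_of_entry(entry_lines):
    """Collect the prefix from each <div n="p"> line of one entry."""
    found = []
    for line in entry_lines:
        match = PREFIX_DIV.match(line)
        if match:
            found.append(match.group(1))
    return found

entry = [
    "{#aMsay#}¦, {#aMsa/yati#} ",
    '<div n="1"> 1) {%theilen%}, ',
    '<div n="p">— {#vi#} {%theilen, brechen%}: {#SaktiM vyaMsitAM mADavena#} ',
]
print(prefixes_of_entry(entry))
```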

Vgl. divisions

It seemed to me that there is a common pattern which should be marked as a division, although the original coding did not provide this.

<L>31636<pc>3-0487<k1>dakziRasTa<k2>dakziRasTa
{#dakziRasTa#}¦ ({#da° + sTa#}) <lex>adj.</lex> {%zur Rechten stehend%}; <lex>m.</lex> {%Wagenlenker%} 
<ls>AK. 2, 8, 2, 28.</ls> 
<ls>H. 760.</ls> 
<div n="v">— Vgl. {#savyezWa#} .
<LEND>

Vgl. is an abbreviation:

Yes, vgl. (with an L) is a common abbreviation for vergleiche (compare). I believe the English equivalent 
is cf. (abbreviation of Latin confer, sometimes also used in German).

funderburkjim commented 6 years ago

Other conversion details

lex tag

•f. becomes <lex>f.</lex>. Also for m. , n. and adj..

lang tag

Various language tags are all changed to the <lang> tag used in recent conversions of other dictionaries. Here's a summary

<g>X</g> -> <lang n="greek">X</lang>
 <R>X</R> -> <lang n="russian">X</lang>
  <A>X</A> -> <lang n="arabic">X</lang>
  <OH>X</OH> -> <lang n="oldhebrew">X</lang>
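
The four replacements above share one shape, so they can be driven from a single table. A minimal sketch with a hypothetical function name `convert_lang_tags`:

```python
import re

# Old tag name -> value of the n attribute on <lang>.
LANG = {"g": "greek", "R": "russian", "A": "arabic", "OH": "oldhebrew"}
OLD_LANG = re.compile(r"<(g|R|A|OH)>(.*?)</\1>")

def convert_lang_tags(text: str) -> str:
    """Rewrite <g>, <R>, <A> and <OH> elements as <lang n="...">."""
    return OLD_LANG.sub(
        lambda m: f'<lang n="{LANG[m.group(1)]}">{m.group(2)}</lang>', text
    )

print(convert_lang_tags("<g>X</g> and <OH>Y</OH>"))
```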

More special cases

  Replace ellipsis character … with space
  Replace -- with em-dash
  <sic>  with blank  (1 time: Klätscherei<sic>, L = 45429, hw = piSunatA)

`<ab>` tag

There are many, many abbreviations used in pwg. Although the new version of the digitization marks almost NONE of these, provision has been made for this. I'll make a separate 'enhancement' issue to discuss this further.

funderburkjim commented 6 years ago

xml markup

With the new form of the pwg.txt digitization, the xml form pwg.xml is only modestly different than pwg.txt. Here are the main differences

Meta-line

The meta line elements get converted to xml elements as follows:

<L>X -> <L>X</L>
<pc>X -> <pc>X</pc>
<k1>X -> <key1>X</key1>
<k2>X -> <key2>X</key2>
<h>X -> <hom>X</hom> (if homonym present)
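
The element renaming in this list can be sketched as follows. The function name `meta_to_xml` is hypothetical, and returning the elements as one flat string is an assumption made for illustration: how pwg.xml actually groups them inside an entry is not described here.

```python
import re

# Meta-line tag -> xml element name, as listed above.
XML_NAME = {"L": "L", "pc": "pc", "k1": "key1", "k2": "key2", "h": "hom"}
META_FIELD = re.compile(r"<(L|pc|k1|k2|h)>([^<]*)")

def meta_to_xml(line: str) -> str:
    """Turn each open meta-line field into a closed xml element."""
    return "".join(
        f"<{XML_NAME[tag]}>{value}</{XML_NAME[tag]}>"
        for tag, value in META_FIELD.findall(line.strip())
    )

print(meta_to_xml("<L>1<pc>1-0001<k1>a<k2>a<h>1"))
```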

body

The non-meta lines are put into the <body> element of the xml. The only conversions are:

{#X#} -> <s>X</s> for Devanagari text, in SLP1 transliteration
{%X%} -> <i>X</i> for italic text.
<div> tags are open in pwg.txt; appropriately placed closing tags </div> are generated for pwg.xml.
& -> &amp; this is a requirement of the XML coding protocol.
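
These body conversions can be sketched in Python. The placement of `</div>` below rests on an assumption of mine, namely that a division runs until the next `<div` line or the end of the entry; the real placement rule is not spelled out here, and this simple version does not nest divisions of different levels. The name `body_to_xml` is hypothetical.

```python
import re

def body_to_xml(lines):
    """Convert the body lines of one entry (meta-line and <LEND> excluded)."""
    out = []
    open_div = False
    for line in lines:
        line = line.replace("&", "&amp;")                  # escape first
        line = re.sub(r"\{#(.*?)#\}", r"<s>\1</s>", line)  # Sanskrit (SLP1)
        line = re.sub(r"\{%(.*?)%\}", r"<i>\1</i>", line)  # italic
        if line.startswith("<div"):
            if open_div:
                out[-1] += "</div>"  # assumed: close at the next division
            open_div = True
        out.append(line)
    if open_div:
        out[-1] += "</div>"          # assumed: close at the end of the entry
    return out

for xml_line in body_to_xml(["{#aMsay#}¦, {#aMsa/yati#}",
                             '<div n="1"> 1) {%theilen%},',
                             '<div n="1">— 2) {%schlagen%}']):
    print(xml_line)
```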

n attribute of `<ls>`

This is the biggest difference between pwg.txt and pwg.xml. Where possible, we convert <ls>X</ls> to <ls n="Y">X</ls> ; Y is the Cologne id for the literary source, as currently determined by a particular master file pwgbib.txt. This assignment simplifies the generation of tool tips for literary source elements in the displays of pwg. We may at some time choose to have these Cologne id's (Y) as part of the pwg.txt markup, but I think it is premature to do so now.
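
In outline, the assignment is a lookup on the abbreviation at the start of the element. The sketch below is hedged in several ways: the layout of pwgbib.txt is not shown here, so the table is passed in as a plain dict; the id `Y1` is a placeholder, not a real Cologne id; and the function name `add_ls_ids` is hypothetical.

```python
import re

def add_ls_ids(text: str, source_ids: dict) -> str:
    """Add n="Y" to each <ls> whose content starts with a known abbreviation."""
    # Try longer abbreviations first, so that "H. an." would win over "H.".
    abbreviations = sorted(source_ids, key=len, reverse=True)

    def repl(match):
        content = match.group(1)
        for abbreviation in abbreviations:
            if content.startswith(abbreviation):
                return f'<ls n="{source_ids[abbreviation]}">{content}</ls>'
        return match.group(0)  # no known source: leave unchanged

    return re.sub(r"<ls>(.*?)</ls>", repl, text)

placeholder_ids = {"MBH.": "Y1"}  # placeholder id, for illustration only
print(add_ls_ids("<ls>MBH. 1, 197.</ls> ... <ls>3, 11728.</ls>", placeholder_ids))
```

A continuation such as `<ls>3, 11728.</ls>` carries no abbreviation of its own, so this lookup leaves it without an `n` attribute, which fits the "where possible" above.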

funderburkjim commented 6 years ago

Display features

Many of the changes to pwg.txt do not show up as differences in the html displays for pwg, since these changes were formerly generated by (a) the conversion of the former digitization to xml and (b) the logic that constructed html from the former xml.

However, there are a couple of display-visible differences:

`<ls>` tooltip

It was recently suggested by Marcis and others that there were some browser problems in the use of links into popup windows as a technique for showing the user the expansion of the literary source abbreviations (this was relevant to the MW, PW and former PWG displays, which are the dictionaries where literary source markup is present).

Thus, as an experiment, I changed the display system for PWG to show the literary source expansions as tooltips. The system seems to work fairly well, although there are some funky details of the tooltip display that may need attention before we apply this technique to MW and PW. In addition, there are some details regarding the content of the tooltips that need attention. I'll discuss this more fully in the separate issue regarding enhancements to PWG ls system.

It is quite tricky to convert the capitalized text of the literary source into a form that shows larger and smaller capital letters in the display. Generally, the current display does this adequately, but there are a few variations from the printed text; for instance in the ls abbreviation H. an., the display shows an. in small caps, rather than in lower case. It is probably more trouble than it is worth to alter this detail of the display.

Also, in the previous version of PWG display, an attempt was made to mimic the size differences in the number sequences of an ls entry. I judged this attempt to have too many flaws, and to be too difficult to do properly; and thus omit this flourish in the current display.

`<lex>` tag tooltip

Tooltips are displayed for the elements (m,f,n,adj) marked with the <lex> tag.

`<ab>` tag tooltip

There is provision in the displays for tooltips for <ab> markup; but, as mentioned, there is almost none of this markup currently present.

IAST 'wide' text

Elements of form <is>X</is> are displayed in a way similar to the printed text, by using the letter-spacing CSS feature. Specifically: <is>X</is> -> <span style='letter-spacing:2px;'>X</span>. To my eye, this 2px spacing is quite close to the printed text.
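
For illustration, the same substitution written in Python. This is only a sketch of the mapping stated above, not the display code itself, and the function name `render_wide` is hypothetical.

```python
import re

def render_wide(text: str, spacing: str = "2px") -> str:
    """Replace <is>X</is> with a letter-spaced html span."""
    return re.sub(
        r"<is>(.*?)</is>",
        rf"<span style='letter-spacing:{spacing};'>\1</span>",
        text,
    )

print(render_wide("vgl. <is>gaṇa</is> {#cAdi#}"))
```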

divisions

Divisions (<div n="1">X</div>) are indented, much as before. The indentation increases as n goes from 1 to 3; when n=v or p, there is no indentation.

funderburkjim commented 6 years ago

This ends the comments that come to mind on the general features of the conversion. Comments definitely welcome, as usual.

Additional issues will provide more details regarding <ls>, <is> and <ab> markup, and areas where further work can improve this markup.

gasyoun commented 6 years ago

it might be better for the new form to have the period within the scope of the tag here

makes sense. What a tremendous work!

I think the prevailing hierarchy principle is : Numbers > Letters > Greek

Agree.

should be easy to pull out the prefixes for a given verb as a first step in generating extra prefixed verb headwords (such as vi + aMsay -->[Sandhi] vyaMsay).

This is the only big feature I still badly miss.

It seemed to me that there is a common pattern

Agree on Vgl. = compare.

We may at some time choose to have these Cologne id's (Y) as part of the pwg.txt markup, but I think it is premature to do so now.

Agree.

an. in small caps., rather than in lower case. It is probably more trouble than it is worth to alter this detail of the display.

Agree, not worth the trouble in 2018.

previous version of PWG display, an attempt was made to mimic the size differences in the number sequences of an ls entry. I judged this attempt to have too many flaws, and to be too difficult to do properly; and thus omit this flourish in the current display.

I disagree. It did help a lot not getting lost. I would want to see it as it is still possible, Jim. It's brilliant even as it was.

To my eye, this 2px spacing is quite close to the printed text.

Indeed, but I would go for a CSS class and not just hard coding. But let it be, it's just the puritan in me. Because hard coding was old school even in 1999, the year I launched my 1st website. And by the way, I'm in St. Petersburg right now, just a few hundred metres away from where the Dictionary was printed.

funderburkjim commented 6 years ago

@gasyoun Thanks for feedback, Marcis.

I'll take a look at mimicking the size differences in the number sequences of an ls entry again sometime. Bug me about it in a few months if it still comes up.

Is there some memorial in St. Pet. that identifies the spot where PWG was printed?

Right now, it's more convenient to embed styles in disp.php, since there are different CSS files for the different displays (Basic, List, etc.). So by putting them in disp.php, all displays get the benefit. Otherwise I'd have to change multiple css files. Not bragging about this arrangement for sure, but that's the way it is now.

Curious of your opinion on use of tooltips for LS references, rather than link to popup.

funderburkjim commented 6 years ago

The list-0.2s display now also is based on the new form of data for pwg.

gasyoun commented 6 years ago

list-0.2s display

Time to make it public?

funderburkjim commented 6 years ago

Added a link to list-0.2s on home page.

Needs documentation. Hint Hint!

funderburkjim commented 6 years ago

User comment re display details:

User Odile Caujolle made a comment regarding the popup LS references in MW, and I asked her to review the tooltip version in PWG. While she apparently liked the tooltip aspect, she made these suggestions regarding other details of the display:

yes, but i must say that i find the display very illegible ...
the gray in between the black is hardly legible, and
 there is no underlining to inform that we can get some information for it. 
I would suggest to keep the flashing blue,   [the bright blue underlining of LS sources in MW]
restore the underlining, 
but keep the little capitals and font

MW coloration:

PWG coloration:

What do others think?

gasyoun commented 6 years ago

I would suggest to keep the flashing blue, [the bright blue underlining of LS sources in MW] restore the underlining, but keep the little capitals and font

Can only agree.

funderburkjim commented 6 years ago

Is this ok? (example from basic display for pwg)

gasyoun commented 6 years ago

Is this ok?

Blue is blue, but the reference sizes are gone and we sure want to see them back, as bad as they are - they give a visual hint.

SergeA commented 6 years ago

Tool-tips are good for abbreviations, and are not so good for the sources. For me the great benefit of the pop-up window for sources is the possibility of easy copying of the text. With the tool-tips this copying becomes impossible. Is it possible to make them tool-tipped but also keep clickable for pop-up window? So the copying functionality will not be lost.

The pop-up works fine in my FireFox, but in Chrome the line position is sometimes slightly misplaced.

funderburkjim commented 6 years ago

copying tooltip text

The pwg display example uses the default tooltip, so behavior is governed by the browser's internal (and not modifiable) behavior.

The jQueryUI Tooltip widget provides for customization. After half an hour of research, I found no immediate customization that permits copy-pasting from the tooltip text, but I suspect that this could be done.

Also, Bootstrap has tooltip functionality that might be customizable in this way.

An example from wikipedia

Look at this example (from Vedic Sanskrit article).

If you hover over one of the superscript numbers, you get a nicely formatted little 'tooltip' and you can move the mouse into it and copy/paste from it.

This looks like a very nice solution to me. What do you think?

SergeA commented 6 years ago

This looks like a very nice solution to me. What do you think?

Yes, this works.

gasyoun commented 6 years ago

This looks like a very nice solution to me.

Indeed. A copy-pastable tooltip would be a solution.

funderburkjim commented 6 years ago

OK. I'll put this on todo list.

It remains to know how to extract this particular piece of web technology so that it can be applied in the context of Cologne display functions.

@gasyoun
Do you have any contacts who could learn how the wikipedia tooltip technique works? Is it all done in some Javascript library? or is it done via a PHP extension to the mediawiki software that runs wikipedia? What we need is a small self-contained example.

gasyoun commented 6 years ago

Jim, let me explore.

gasyoun commented 5 years ago

@artforlife any idea?

drdhaval2785 commented 3 years ago

After sending the only remaining item to #321, this issue is safe to close.

sanskrit-lexicon / COLOGNE