sanskrit-lexicon / MD

Research re Macdonell Sanskrit-English Dictionary

Abbreviation tooltips #11

Open funderburkjim opened 1 year ago

funderburkjim commented 1 year ago

This issue is devoted to the initial creation of a file from the 'LIST OF ABBREVIATIONS' provided in the front matter. This was discussed here.

[Image: the 'LIST OF ABBREVIATIONS' from the MD front matter]

funderburkjim commented 1 year ago

@AnnaRybakovaT

Hi!

Instructions for first step prepared for you here,

Let me know when you see this note.

Andhrabharati commented 1 year ago

@funderburkjim

You guys also need to keep this in mind, to be added at the end of the abbr. markup.

Andhrabharati commented 1 year ago

And then, if required, I could finally step in to identify the unlisted abbr.s present in the text, as I did in all other works.

funderburkjim commented 1 year ago

The particular task of this issue is to prepare an abbreviation tooltip file.

A next step would be to add markup <ab>X</ab> in md.txt.

Within that next step, (or possibly as a second next step), we can consider the 'asterisk'-related markup.
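The next step mentioned above, adding <ab>X</ab> markup in md.txt from the abbreviation list, could be sketched roughly as follows. This is a minimal Python sketch and not the script actually used in the working directory. The function name add_ab_markup and the sample inputs are hypothetical, and the sketch assumes the line carries no <ab> markup yet.

```python
import re

def add_ab_markup(line: str, abbrevs: list[str]) -> str:
    """Wrap each listed abbreviation in <ab>...</ab> (hypothetical helper).

    Assumes the line has no <ab> markup yet. Longer abbreviations are tried
    first, so 'vbl.' is preferred over 'vb.'; a match must not be preceded by
    a word character, so the 'N.' inside 'CN.' is left alone.
    """
    ordered = sorted(set(abbrevs), key=len, reverse=True)
    if not ordered:
        return line  # an empty alternation would match everywhere
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(a) for a in ordered) + r")"
    )
    return pattern.sub(r"<ab>\1</ab>", line)


print(add_ab_markup("vbl. px. and vb. form", ["vb.", "vbl.", "px."]))
# <ab>vbl.</ab> <ab>px.</ab> and <ab>vb.</ab> form
```

Trying the longest abbreviation first matters here, because the list contains prefix pairs such as 'st.' and 'sts.'.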

Andhrabharati commented 1 year ago

@funderburkjim

Here are the listed abbr.s in the MD print-- md-abbr.txt

As you might be somewhat free from pwk abbr. work now (till Thomas comes back to you), I am posting this file for your perusal and further action. [I would just like to state here that I found many (~60) other abbr.s in the MD text.]

Andhrabharati commented 1 year ago

It may be interesting to note that MD has employed both regular cap. N. and the small cap. ɴ. as abbr.s (the ɴ. being used 400+ times in the text, while the N. is present 4200+ times).
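Because N. and ɴ. are different code points, any count of these abbreviations has to keep them apart. The Python sketch below is hypothetical and not the tool that produced the 400+/4200+ figures above. It counts each one as a standalone abbreviation and confirms the Unicode identity of the small capital.

```python
import re
import unicodedata

SMALL_CAP_N = "\u0274"  # the small capital form used by MD alongside plain N

def count_n_abbrevs(text: str) -> dict[str, int]:
    """Count 'N.' and 'ɴ.' separately, only where not preceded by a word character."""
    counts = {}
    for letter in ("N", SMALL_CAP_N):
        counts[letter + "."] = len(re.findall(r"(?<!\w)" + letter + r"\.", text))
    return counts


print(unicodedata.name(SMALL_CAP_N))  # LATIN LETTER SMALL CAPITAL N
print(count_n_abbrevs("N. of a river; ɴ. pr. of a man; CN."))  # {'N.': 1, 'ɴ.': 1}
```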

funderburkjim commented 1 year ago

@Andhrabharati acknowledged. You are right - I'll be holding off changes to pwk until Thomas finishes. I guess Anna is not available now. Agree that abbreviation markup for MD is needed.

Andhrabharati commented 1 year ago

If you're willing to use it, I can post my MD file, with many corrections incorporated (I just don't want to list them individually).

It is something like the GRA file (from me) that you've used recently.

funderburkjim commented 11 months ago

abbrev1

<ab> markup applied, based on the list of abbreviations shown in the first comment of this issue. Working directory: https://github.com/sanskrit-lexicon/MD/tree/master/mdissues/issue11

@Andhrabharati - What do you think should be done next? Are there differences between temp_md_1 and your version that you think the cdsl version should implement?

Andhrabharati commented 11 months ago

Are there differences between temp_md_1 and your version that you think the cdsl version should implement?

Yes @funderburkjim, there are hundreds of types of changes (corrections), ranging from Sanskrit spellings (and/or accents) [sometimes even in headwords], English spellings, Greek spellings, wrong tags (italic, bold and sanskrit), hyphens, brace matching, … … …

And most important of them all is "decoding" the numeral marking ¤X¤ into various types, that I had mentioned earlier in a response to your query.
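This thread does not spell out the types into which ¤X¤ should be decoded, so what follows is only a hedged first step. The Python sketch below inventories every ¤X¤ marking so that the cases can then be classified by hand. The function name and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Hypothetical first pass: every ¤X¤ marking, keyed by its content X.
NUMERAL_MARK = re.compile(r"¤([^¤]*)¤")

def inventory_numeral_marks(lines: list[str]) -> Counter:
    """Count how often each distinct ¤X¤ content occurs across the lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(NUMERAL_MARK.findall(line))
    return counts


for content, n in inventory_numeral_marks(["a ¤1¤ b ¤2¤", "c ¤1¤"]).most_common():
    print(content, n)
```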

Next comes the 'relocation' of the homonym numbers that you had inserted recently, to their 'proper' position as per MD print and intention!!

You may look at various addl. tags that I had used-- <ab></ab> <bot></bot> <cl></cl> <fr></fr> <gk></gk> <hom></hom> <lang></lang> <lat></lat> <lex></lex> <ls></ls> <pe></pe> <zoo></zoo>

Even if you would like to limit yourself to abbr. markings, there are many yet to do; see, e.g., my extracted lists-- MD ab_local.txt MD ab_global.txt [And of course, there are many count differences as well between your version and the AB version.]

funderburkjim commented 11 months ago

@Andhrabharati Would you upload your version?

Andhrabharati commented 11 months ago

Here is the file for your study/reference, @funderburkjim-- md_AB_v1.zip

---------------------------------------

And here is the file, with a kind of semantic line-breaks for the sub-HWs inside the entries [they are not always the composite words formed from the main HW, but most of the time 'siblings' containing the first portion of the main entry!!]-- md_AB_v2.zip [This is just done as a trial, not as a full (complete) work.]

gasyoun commented 11 months ago

Sanskrit spellings (and/or accents) [sometimes even in headwords]

Headwords are what I value the most, @Andhrabharati.

AnnaRybakovaT commented 8 months ago

Instructions for first step prepared for you here,

Let me know when you see this note

Dear Jim and dear all, glad to see you after a break of some months.

During the summer you mentioned me in some topics (BHS Issue 4 and PWK Issue 95). Please let me know where I should start now.

Regarding this current issue, I can't clone the directory (I am a bit confused: is it github.com/sanskrit-lexicon/MD/mdissues/issue11 or github.com/sanskrit-lexicon/MD/tree/master/mdissues/issue11?)

In either case I got messages like:

fatal: repository 'https://github.com/sanskrit-lexicon/MD/tree/master/mdissues/issue11/' not found
---
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

drdhaval2785 commented 8 months ago

I think you will have to clone the whole repository:

git clone https://github.com/sanskrit-lexicon/MD.git
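The clone fails because .../tree/master/mdissues/issue11 is GitHub's browser view of a folder inside the MD repository, not a repository of its own. git clone needs the repository URL, and after cloning the files sit under MD/mdissues/issue11. The small Python sketch below shows how such a folder URL breaks down. The helper split_tree_url is hypothetical and assumes a branch name without '/'.

```python
from urllib.parse import urlparse

def split_tree_url(url: str) -> tuple[str, str, str]:
    """Split a GitHub folder URL into (clone URL, branch, subdirectory).

    Assumes the form https://github.com/<owner>/<repo>/tree/<branch>/<path>
    and a branch name that contains no '/'.
    """
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 4 or parts[2] != "tree":
        raise ValueError("not a GitHub tree URL: " + url)
    owner, repo, _, branch = parts[:4]
    return f"https://github.com/{owner}/{repo}.git", branch, "/".join(parts[4:])


print(split_tree_url("https://github.com/sanskrit-lexicon/MD/tree/master/mdissues/issue11/"))
# ('https://github.com/sanskrit-lexicon/MD.git', 'master', 'mdissues/issue11')
```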

Andhrabharati commented 8 months ago

@AnnaRybakovaT,

Jim has already done what he wanted you to do (as a first step) reg. the MD abbr.s; so you need not bother about the same now.

However, @funderburkjim is yet to take up further changes based on my posted file.

And there is nothing more to do in pwk-95; it has undergone many major changes since then.

As such, I think, you may now see if BHS-4 issue interests you, as Jim has suggested. But probably Jim might wish to put you on some other task; so, let's wait for his response.

AnnaRybakovaT commented 8 months ago

As such, I think, you may now see if BHS-4 issue interests you, as Jim has suggested. But probably Jim might wish to put you on some other task; so, let's wait for his response.

Great! Of course I would like to work with BHS. So I am waiting for the decision.

funderburkjim commented 8 months ago

@Andhrabharati -- thank you for taking an interest here. I would prefer for you to work with @AnnaRybakovaT on either this MD topic or the BHS topic (or both) (https://github.com/sanskrit-lexicon/BHS/issues/4) according to what your mutual interest suggests.

My part would then be limited to helping integrate your work into the active displays for MD, BHS.

If these tasks are completed while Anna is available, then we might consider involving Anna in an AP90 task -- developing an MW-style display so that the AP90 nominal compounds would be accessible directly.

Andhrabharati commented 8 months ago

@funderburkjim

I think there is nothing more that Anna or I could do in this MD issue; it is ONLY you who can take up further work based on my file(s) posted above.

@AnnaRybakovaT

Do you think you can proceed based on what Jim had suggested at BHS-4 (https://github.com/sanskrit-lexicon/BHS/issues/4), or need anything more?

Andhrabharati commented 8 months ago

On second thought [re-looking at Jim's posting], apart from AP90, I think my md_AB_v2.zip file can be worked upon, to make MD another work (after MW) whose sub-HWs are 'accessible' to online search queries.

And, my BEN file (already posted long ago) also can be a similar candidate.

@AnnaRybakovaT do you think you could take up this piece of work, to make the 'full' HWs from the 'partial' sub-HWs, by appropriately filling up the (presumed) beginnings? [Just browsing through my above file (look for <div/> tags) might give you some ideas!!]
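Turning 'partial' sub-HWs into 'full' HWs starts with locating them. The Python sketch below is hypothetical. The exact <div/> markup in md_AB_v2.zip may differ, and the sample entry is invented. Filling in the presumed beginnings is deliberately left to the editor, since the sub-HWs are not always compounds of the main HW.

```python
import re

# Assumption: sub-headword boundaries are marked like <div/> or <div n="..."/>.
DIV_TAG = re.compile(r"<div\b[^>]*/>")

def split_subentries(entry_text: str) -> list[str]:
    """Split an entry body at <div/> tags, dropping empty pieces."""
    return [seg.strip() for seg in DIV_TAG.split(entry_text) if seg.strip()]

def partial_subheadwords(entry_text: str) -> list[str]:
    """First token of each segment after the first: a 'partial' sub-HW candidate."""
    segments = split_subentries(entry_text)
    return [seg.split()[0].rstrip(",;") for seg in segments[1:]]


sample = 'main sense.<div/>-kara, a. making.<div n="2"/>-tā, f. state.'
print(partial_subheadwords(sample))  # ['-kara', '-tā']
```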

AnnaRybakovaT commented 8 months ago

Do you think you can proceed based on what Jim had suggested at BHS-4 (sanskrit-lexicon/BHS#4), or need anything more?

As I see it, I have to work only with the file tagcount_ls.txt. In general, I understood the task. Shall I make a copy of this file to make my changes in this copy?

AnnaRybakovaT commented 8 months ago

'full' HWs from the 'partial' sub-HWs, by appropriately filling up the (presumed) beginnings?

I am so sorry, but I need more explanation regarding this task. First of all, could you kindly show me examples in the file of what is meant by:

* 'full' HWs

* 'partial' sub-HWs

Andhrabharati commented 8 months ago

As I see it, I have to work only with the file tagcount_ls.txt. In general, I understood the task. Shall I make a copy of this file to make my changes in this copy?

That's correct; pl. go ahead.

Andhrabharati commented 8 months ago

'full' HWs from the 'partial' sub-HWs, by appropriately filling up the (presumed) beginnings?

I am so sorry, but I need more explanation regarding this task. First of all, could you kindly show me examples in the file of what is meant by:

* 'full' HWs

* 'partial' sub-HWs

@AnnaRybakovaT We shall come back to this MD task, after the above BHS work is done.

Andhrabharati commented 8 months ago

@funderburkjim

Is it OK if we take up the MD sub-HWs work before the AP90 (that you suggested)?

funderburkjim commented 8 months ago

Is it OK if we take-up the MD sub-HWs

Yes - I will put work on your versions of MD on my (nearby) TODO list.

funderburkjim commented 8 months ago

comments on version ab.v1

Here is a table summarizing the differences in tag markup

tag AB_v1 count cdsl count
\ 42 43
\ 32 179
\ 46925 100344
\ 155 0
\ 993 0
\ 1 0
\ 10 0
\ 1363 920
\ 102 0
\ 0 9
\ 0 12342
\ 3 0
\ 56122 0
\ 58 0
\ 314 0
\ 15 0
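A tag-count comparison like the table above can be produced by counting opening tags in each file. The Python sketch below is hedged and is not necessarily how the table was generated. count_open_tags and tag_count_table are hypothetical names. In practice, text_a and text_b would be the full contents of the AB_v1 and cdsl versions of md.txt.

```python
import re
from collections import Counter

# An opening or self-closing tag such as <ab>, <ls n="x"> or <div/>; closing tags are skipped.
OPEN_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^>]*>")

def count_open_tags(text: str) -> Counter:
    return Counter(m.group(1) for m in OPEN_TAG.finditer(text))

def tag_count_table(text_a: str, text_b: str) -> list[tuple[str, int, int]]:
    """Rows (tag, count in a, count in b) for tags whose counts differ."""
    a, b = count_open_tags(text_a), count_open_tags(text_b)
    return [(tag, a[tag], b[tag])
            for tag in sorted(a.keys() | b.keys())
            if a[tag] != b[tag]]


for row in tag_count_table("<ab>N.</ab> <ls>x</ls>", "<ab>N.</ab> <ab>V.</ab>"):
    print(*row)
```

Only tags whose counts differ are listed. Metaline tags such as <L> and <pc>, which should match in both versions, therefore drop out of the table.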

@Andhrabharati I think these differences account for most of the differences between the prior (cdsl) MD and the v1 version.

Andhrabharati commented 8 months ago

I don't think there are other tags etc.; but there would surely be spelling changes in my file wrt the cdsl version (as I had mentioned previously). And, there are no alt. HWs, as I recall in MD.

If you're going to use my file as-is, then there is nothing more to 'study' in my file!! [BTW, I thought you'd be taking up MD after we (Anna and I) finish the sub-HWs task as well.]

I remember using the <pe> tag in PWG and <cl> tag in MW in my recent working. [However, both these are not yet posted out!]

funderburkjim commented 8 months ago

I'll go ahead and install v1, making changes in csl-pywork as mentioned. Thanks for the comment reminding me about the spelling corrections you made.

The 'v2' (sub-hws) task that you are working on needs to be in another issue, which I'll open soon. Our target should be like 'mw' -- Since we don't have any other 'sub-hw' dictionaries, we need to think carefully how to proceed. I'll elaborate on this when I open that other issue.

funderburkjim commented 8 months ago

corr. tooltip

I happened to notice <ab>corr.</ab> in v1 (30 instances). We need a tooltip: 'correlative'? (MD's published list has 'cor.' = correlative.) Do you recall any other items like corr., marked as <ab> (global) but with no tooltip in MD's list?
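Finding items like corr., which are marked <ab> in the text but have no tooltip, amounts to a set difference between the abbreviations used in the text and those defined in the tooltip file. In the Python sketch below, the function names are hypothetical. The tooltip-file layout (abbreviation first, then a tab) is an assumption, not the documented format of mdab_input.txt.

```python
import re
from collections import Counter

AB_TAG = re.compile(r"<ab>(.*?)</ab>")

def tooltip_keys(tooltip_lines: list[str]) -> set[str]:
    """Assumed layout: each non-blank line starts with the abbreviation and a tab."""
    return {line.split("\t", 1)[0] for line in tooltip_lines if line.strip()}

def missing_tooltips(dict_lines: list[str], tooltip_lines: list[str]) -> dict[str, int]:
    """Abbreviations marked <ab> in the text but absent from the tooltip file."""
    used: Counter = Counter()
    for line in dict_lines:
        used.update(AB_TAG.findall(line))
    known = tooltip_keys(tooltip_lines)
    return {ab: n for ab, n in used.items() if ab not in known}


print(missing_tooltips(["<ab>corr.</ab> <ab>cor.</ab>", "<ab>corr.</ab>"],
                       ["cor.\t<disp>correlative</disp>"]))  # {'corr.': 2}
```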

funderburkjim commented 8 months ago

AB.v1 (with a few minor changes) now installed as cdsl version. temp_md_ab_1pe.zip has the changes, which I think you should incorporate in further versions.

Work done in abv1 directory.

Many new items added to the tooltips. See this commit (0469541). There are still 3 <disp>??</disp> with completely unresolved tooltips, and several others with a ? in the tooltip where I was uncertain.

Andhrabharati commented 8 months ago

@funderburkjim

<cl> tag -- class of verb. <cl>X</cl> X is a roman-numeral

<cl>V.</cl> -> <cl>ᴠ.</cl>: class 5 (33 instances). To avoid conflict with <ab>V.</ab> (Vedic, 2470 instances): <cl>V.</cl> is a class 5 root; change to use Unicode U+1D20 LATIN LETTER SMALL CAPITAL V: <cl>ᴠ.</cl>

I would suggest using Unicode Roman numerals (U+216x) throughout for the dhAtu class numbers; as only 1-10 such cases [Ⅰ, Ⅱ, Ⅲ, Ⅳ, Ⅴ, Ⅵ, Ⅶ, Ⅷ, Ⅸ, Ⅹ] are required, we'd have no issues. [Using a small capital letter ᴠ (U+1D20), as above, is not a proper choice, as all other class numbers are in normal-size capital letters.]

Here is the updated file (hoping that you'd have no issue in agreeing to my proposal)-- md_AB_v1.zip
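The proposal amounts to a fixed mapping of the ten class numbers onto Unicode's Roman numeral code points, where U+2160 ROMAN NUMERAL ONE through U+2169 ROMAN NUMERAL TEN are consecutive. The Python sketch below assumes that <cl> content is a plain ASCII numeral (or the interim ᴠ) with an optional period. The names are hypothetical. Because only <cl> content is touched, <ab>V.</ab> (Vedic) stays as it is, which is what removes the original conflict.

```python
import re

# ASCII Roman numerals I..X mapped to U+2160..U+2169 (ROMAN NUMERAL ONE..TEN).
ASCII_ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X"]
TO_UNICODE = {r: chr(0x2160 + i) for i, r in enumerate(ASCII_ROMAN)}
# The interim small capital V (U+1D20) for class 5 also becomes U+2164.
TO_UNICODE["\u1d20"] = "\u2164"

# Content of <cl>...</cl>, with an optional trailing period kept as-is.
CL_TAG = re.compile(r"<cl>([^<.]+)(\.?)</cl>")

def convert_class_numbers(line: str) -> str:
    def repl(m: re.Match) -> str:
        numeral, period = m.group(1), m.group(2)
        return "<cl>" + TO_UNICODE.get(numeral, numeral) + period + "</cl>"
    return CL_TAG.sub(repl, line)


print(convert_class_numbers("<cl>V.</cl> <ab>V.</ab> <cl>\u1d20.</cl> <cl>X.</cl>"))
```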

Andhrabharati commented 8 months ago

There are still 3 <disp>??</disp> with completely unresolved tooltips.

Are these the ones having the superscript numbers ¹ and ² ?

They stand for 'rare' (or singular) occurrences in the whole 'text' being referred to.

See what MD says in his Preface (p. ⅸ),--

[Image: excerpt from the MD Preface, p. ⅸ]

Also this snippet reminds you of the pending work that I had mentioned above, to which you had also responded that it would be taken up next.

As I had already pointed this out (as above) to you, I deliberately did not mark it thus in my later working.

funderburkjim commented 8 months ago

roman numeral revision.

@Andhrabharati accepted your revision re roman numerals. It is now installed. One correction:

At <L>19659<pc>358-2<k1>sfj
<ab>A.</ab> -> <lex>Ā.</lex>
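A correction like the one above is located by the entry's metaline. In the cdsl source, <L> appears to be the record number, <pc> the page-column (358-2) and <k1> the headword key (sfj, i.e. sṛj, in SLP1 transliteration). The Python sketch below parses such a metaline and applies a single replacement to one entry. The (metaline, body) entry structure and both function names are hypothetical.

```python
import re

META_FIELD = re.compile(r"<([A-Za-z0-9]+)>([^<]*)")

def parse_metaline(meta: str) -> dict[str, str]:
    """'<L>19659<pc>358-2<k1>sfj' -> {'L': '19659', 'pc': '358-2', 'k1': 'sfj'}."""
    return dict(META_FIELD.findall(meta))

def apply_correction(entries: list[tuple[str, str]], l_number: str,
                     old: str, new: str) -> int:
    """Replace old with new in the body of entry <L>l_number; return the count."""
    made = 0
    for i, (meta, body) in enumerate(entries):
        if parse_metaline(meta).get("L") == l_number:
            made += body.count(old)
            entries[i] = (meta, body.replace(old, new))
    return made


entries = [("<L>19659<pc>358-2<k1>sfj", "... <ab>A.</ab> ...")]
print(apply_correction(entries, "19659", "<ab>A.</ab>", "<lex>Ā.</lex>"))  # 1
```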

Related revisions to mdab_input.txt in csl-pywork and md-meta2 in csl-orig. For details, see the commits above or the mdissues/issue11/abv1 readme.txt.

Please note the ¹ and ² tooltips in mdab_input.txt. I translated these as 'one instance' or 'two instances' in the tooltips.

The three unknown abbrevs show as <disp>??</disp> in mdab_input.txt.

funderburkjim commented 8 months ago

Also this snippet reminds you of the pending work that I had mentioned https://github.com/sanskrit-lexicon/MD/issues/11#issuecomment-1673624909, to which you had also https://github.com/sanskrit-lexicon/MD/issues/11#issuecomment-1673660186 that it would be taken up next. As I had already pointed this out (as above) to you, I deliberately did not mark it thus in my later working.

@Andhrabharati I prefer to omit further work on this (e.g. marking '*' as abbreviation, with associated tooltip). Feel free to add such markup in a future version (and DOCUMENT what you do).

Andhrabharati commented 8 months ago

The three unknown abbrevs show as <disp>??</disp> in mdab_input.txt.

Filled these, and also corrected a few other abbr.s-- mdab_input_AB.txt

Here is my updated file-- md_AB_v1.zip

And, I just noticed that I did not change the 𝑃. (Purāṇa) occurrences inside the text file [it is rendered as a normal letter in the print, being within the italic string(s)!], though the abbr. list has it thus. This will be done during the v.2 (sub-HWs) work.

funderburkjim commented 8 months ago

@Andhrabharati -- In your file [md_AB_v1.zip] https://github.com/sanskrit-lexicon/MD/files/13768757/md_AB_v1.zip from prior comment.

I see only 1 difference (under #upa at line 18596). Is this what you intended?

Andhrabharati commented 8 months ago

Yes; while looking for nl., I noticed the vb. px. here and thought it should have been vbl. px. (the upasarga)!

funderburkjim commented 8 months ago

revisions installed.

Yes, I agree with the 'vbl.' change under 'upa', and added a print change note in csl-corrections. Thanks for the changes to mdab_input. I made a couple of changes to your changes, as noted in abv1/readme.txt at 12-26-2023 AB rev to mdab_input.txt; or you can see them via the commits above. I'm fairly sure that, in mdab_input.txt, 'sts.' is 'sometimes' (even though 'st.' is 'stem'!). I also used both 'absolute' (according to the MD print abbreviations) and 'absolutive' for 'abs.'

Andhrabharati commented 8 months ago

I'm fairly sure that, in mdab_input.txt, 'sts.' is 'sometimes' (even though 'st.' is 'stem'!).

Yes-- you're correct, @funderburkjim ; I did not pay proper attention to the file content.

----------------------------------

  • In particular, is there any 'alternate headword' markup that remains to be done in v1 ?

And, there are no alt. HWs, as I recall in MD.

I was wrong here as well!! I see now that the text has many alt. HW candidates, but these all still need to be 'marked' [as in GRA and pwk]. Do you think this part could be done now, or along with the sub-HWs task sometime later?

[I also noticed that I had missed many nuances in the text earlier. Too bad of me that I did not give the MD work proper attention.]

gasyoun commented 8 months ago

quite many alt. HW candidates

It would be interesting to know if there are any unique ones, as compared to other dictionaries, @Andhrabharati.

Andhrabharati commented 8 months ago

I recall MD being helpful in resolving an issue or two while on MW two years back; no other work had those words!!

And yes @gasyoun, I think we might get a few interesting entries from MD, if it is fully worked upon.

And I have noticed many entries in VCP, which are not anywhere else!!

gasyoun commented 8 months ago

noticed many entries in VCP, which are not anywhere else

Such remarks of yours are of utmost interest. I would love to call you tomorrow and talk about your vision of the future of Sanskrit dictionaries: what is done, what is important and what will remain to be done for generations ahead. @drdhaval2785 @funderburkjim @AnnaRybakovaT how about a call on 4th of January 20:00 Moscow time?

drdhaval2785 commented 8 months ago

I will be able to spend time from 20:00 to 22:00 Indian Standard Time. 20:00 Moscow time will be too late in night for me.

gasyoun commented 8 months ago

I will be able to spend time from 20:00 to 22:00 Indian Standard Time. 20:00 Moscow time will be too late in night for me.

I'm ready for a call at 20:00 Indian Standard Time as well. I wrote in our sanskrit-lexicon Skype group.