sanskrit-lexicon / SKD

Discussion of corrections and other issues pertaining to Sabdakalpadruma dictionary at Sanskrit-Lexicon

Metaline <pc> not containing the column data #15

Closed Andhrabharati closed 2 years ago

Andhrabharati commented 2 years ago

It is seen that SKD metalines do not have the 'c' data in 'pc' field!

As @drdhaval2785 says he regularly consults SKD, he might be interested to incorporate the values with a small script; it is somewhat time-consuming to look for the entry word in the page (esp. at the online scans), without the column indication.

Or @funderburkjim might do this himself.

Here is the corresponding necessary data generated from the text file itself-- SKD pc values in metalines.txt

funderburkjim commented 2 years ago

This is similar to the pc (page-column) errors in md.txt (refer: https://github.com/sanskrit-lexicon/MD/issues/7).

@AnnaRybakovaT This would be a good next project for you, if time permits before you leave for the season. What do you say?

Andhrabharati commented 2 years ago

@funderburkjim you might wish to 'make' the posted file to be in uniform form throughout (I did not do it myself deliberately!), before using it for replacement by @AnnaRybakovaT

funderburkjim commented 2 years ago

@Andhrabharati Looks uniform to me. Where is it not uniform?

Andhrabharati commented 2 years ago

The 2nd column is not having the <pc> tag except for the initial 200+ lines (out of 42K+ lines).

Either it should be present throughout, or absent everywhere.

funderburkjim commented 2 years ago

ok. We can work around that difference

Andhrabharati commented 2 years ago

Here is the updated uniformly 'constructed' version of the above file-- SKD pc values in metalines.txt

AnnaRybakovaT commented 2 years ago

This would be a good next project for you, if time permits before you leave for the season. What do you say?

I agree))

funderburkjim commented 2 years ago

@AnnaRybakovaT I'll get instructions for you on this soon. In the meantime, do some reading on Python dictionaries, since this will be useful in the solution of this improvement that @Andhrabharati has set for us.

An introduction to Python dictionaries: https://www.w3schools.com/python/python_dictionaries.asp.

As usual, our programs use only some of the features of dictionaries. You can try the following in an interactive session (python -i) or in a test program (temp.py)

d = {}  # initialize an empty dictionary
d['a'] = 0   # set 'a' to be a *key* of the dictionary with value 0
d['a'] = d['a'] + 1  # set the value of the dictionary at 'a' to be 1 more than it was
if 'b' in d:  # test if 'b' is a key of dictionary
    print("'b' is a key of d")
else:
    print("'b' is not a key of d")

keys = d.keys()  # gather all the keys of dictionary
for key in keys:  # loop over the keys
    print("value of d at key %s is %s" % (key, d[key]))

Small exercise: write a program that takes a string (such as the dog and the cat sat in a hat) and writes a count of each letter appearing in the string. e.g.

t 5
h 3
etc.

Use a dictionary.

You might vary your program by using the Python 'sorted' function so that the letters print in alphabetical order.
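One possible shape for such a program is sketched below. It is only an illustration of the dictionary operations described above; the function name count_letters is made up for this example, and skipping spaces with isalpha() is an assumption about what "each letter" should mean.

```python
def count_letters(text):
    """Return a dictionary mapping each letter of text to how often it occurs."""
    counts = {}
    for ch in text:
        if not ch.isalpha():  # assumption: spaces and punctuation are not counted
            continue
        if ch in counts:      # letter seen before: add 1 to its count
            counts[ch] = counts[ch] + 1
        else:                 # first occurrence: start the count at 1
            counts[ch] = 1
    return counts


counts = count_letters("the dog and the cat sat in a hat")
for letter in sorted(counts):  # sorted() gives the keys in alphabetical order
    print(letter, counts[letter])
```

For the sample string this prints, among other lines, t 5 and h 3, matching the counts given in the exercise.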

funderburkjim commented 2 years ago

@AnnaRybakovaT Please clone this repository (https://github.com/sanskrit-lexicon/SKD).

I've made a stub directory (corrections/issue15/) where our work can go. Thus far, there is only a brief readme.txt file in issue15 directory.

This project seems very similar to the MD pc errors project

Rather than having me set up the project, why don't you give it a try ? Bring over to the issue15 directory as much of the material from https://github.com/sanskrit-lexicon/MD/tree/master/deva_iast_comp/step2a as needed. And begin the process of altering the programs and readme, etc. to fit our issue 15 task.

Get as far as you can, then push your revised skd repository, and formulate questions where you might get stuck.

AnnaRybakovaT commented 2 years ago

Please clone this repository

Dear Jim, thanks so much for your detailed instructions! I need a few days before I can start this task. In a few days we have to open our tourist shops, but they are still not ready. From morning until late evening I do preparations, and when I come home my brain and my body refuse to do anything. So for now I prefer to focus only on the shops, to finish that work as soon as possible.

AnnaRybakovaT commented 2 years ago

Get as far as you can, then push your revised skd repository, and formulate questions where you might get stuck.

Dear Jim, I am here again. Sorry for such a long "some days"; to be honest, today is my first free evening since the beginning of this month. Unexpectedly our island has a lot of tourists, and every working day after 2 difficult years is important.

So I suppose we need:

Just now I can't push the revised skd repository:

Rybakova@ST-Rybakova MINGW64 ~/Documents/sanskrit-lexicon/SKD/corrections/issue15 (master)
$ git push
remote: Permission to sanskrit-lexicon/SKD.git denied to AnnaRybakovaT.
fatal: unable to access 'https://github.com/sanskrit-lexicon/SKD/': The requested URL returned error: 403

funderburkjim commented 2 years ago

@AnnaRybakovaT Hi! Nice to hear from you. Which is 'our island'?

Try push again, think it should work for you now.

Andhrabharati commented 2 years ago

She lives somewhere in 'Greece', as I understand, @funderburkjim!

Andhrabharati commented 2 years ago

I am here again. Sorry for such a long "some days"; to be honest, today is my first free evening since the beginning of this month.

In fact, you have returned much sooner, @AnnaRybakovaT (another post of yours elsewhere mentioned that your return would be sometime after next November)!

AnnaRybakovaT commented 2 years ago

Try push again, think it should work for you now.

Yes, it works!

I live on Patmos; it is a tiny, beautiful island with 3000 local inhabitants and 20 000 tourists during the summer. I am Russian and was living in Moscow, but 6 years ago I had a vacation on this island and met my future husband, for whom life in Moscow was absolutely impossible, so I had to move to Greece. In this way I opened a new page of my life, 100% different from the previous one.

AnnaRybakovaT commented 2 years ago

(another post of yours elsewhere mentioned that your return would be sometime after next November)

It is true, but before that I would like to finish this current task.

funderburkjim commented 2 years ago

@AnnaRybakovaT

These suggestions may help you get started.

get temp_skd.txt

Get latest version of skd.txt from https://github.com/sanskrit-lexicon/csl-orig/blob/master/v02/skd/skd.txt, rename the file as temp_skd.txt, and move temp_skd.txt into this skd/corrections/issue15 directory

You can use the 'download' button on the page above, or use the following 'curl' command

curl https://raw.githubusercontent.com/sanskrit-lexicon/csl-orig/master/v02/skd/skd.txt -o temp_skd.txt

start modifying program

test_make_change_pc.py is the program previously used.

The command to run this program will be

python test_make_change_pc.py temp_skd.txt SKD.pc.values.in.metalines.txt changes.txt

First, put an 'exit(1)' statement after 'entries = digentry.init(filein)', and run the program. It should properly read in temp_skd.txt into an array of entries.

modify Pcerror class

The __init__ method of the class parses a line of SKD.pc.values.in.metalines.txt. Our first line is

<L>1    <pc>1-001   [1-001-a]

We need three attributes for the instances of our Pcerror class. Let's call the attribute names L, pcold, and pcnew. For the first line, the values will be the strings 1, 1-001, and 1-001-a. Design a regex to do the parsing: m = re.search(regex,line), self.L = m.group(1), etc.
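As a rough illustration of that parsing step (not the code in the repository), the class could look something like this. The regex is a guess that assumes the three fields are separated by whitespace (tabs or spaces) and that the new value is wrapped in square brackets, as in the sample line; the name LINE_RE is made up for this sketch.

```python
import re

# Assumed layout, taken from the sample line: <L>1    <pc>1-001   [1-001-a]
LINE_RE = re.compile(r'^<L>(\S+)\s+<pc>(\S+)\s+\[([^\]]+)\]\s*$')


class Pcerror:
    """One record of SKD.pc.values.in.metalines.txt."""

    def __init__(self, line):
        line = line.rstrip('\r\n')
        m = LINE_RE.search(line)
        if m is None:
            # Fail loudly so that unexpected line formats are noticed early
            raise ValueError("cannot parse line: %r" % line)
        self.L = m.group(1)      # entry number, kept as a string
        self.pcold = m.group(2)  # page value without the column
        self.pcnew = m.group(3)  # page value with the column


rec = Pcerror('<L>1    <pc>1-001   [1-001-a]')
print(rec.L, rec.pcold, rec.pcnew)
```

Raising an error on lines that do not match is one possible choice; it makes any non-uniform lines in the input file show up on the first run.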

Also, in init_pcrecs, remove the dbg statements (they are not relevant now). Finally, move the exit(1) statement to go after 'pcrecs = init_pcrecs(...)' and rerun your program.

When this part is working properly, we'll be ready to modify generate_changes.

Note: Keep readme.txt up to date.

funderburkjim commented 2 years ago

@AnnaRybakovaT Haven't heard from you for a while.
Are you waiting on me, or just busy with other things?

AnnaRybakovaT commented 2 years ago

@AnnaRybakovaT Haven't heard from you for a while. Are you waiting on me, or just busy with other things?

Dear Jim, I am sorry again for my pause, I will be back in a week.

Next week I am taking part in an annual conference of Oriental studies in St. Petersburg (by Zoom, of course), and I have to devote all my free time to preparing my presentation. Since such scientific work is no longer my professional field and I have lost a lot of skills during the last 4 years, it takes me much more time just to write one article or to prepare one speech. In any case it is a big pleasure for me: I have the possibility to learn something new (the topic of my research is Nepal) and still keep connections with the Russian Oriental studies community.

AnnaRybakovaT commented 2 years ago

Haven't heard from you for a while.

Dear Jim, I hope you will excuse me; I have disappeared again. I brought my laptop to the store around 2 weeks ago (since my current schedule is 10 am - 10 pm in the store without a day off, and soon it will be until midnight) and I expected to work a bit from the store, but only today have I managed to switch it on. I notice that every day we have more and more clients in the shop, so that is why I would like to ask you to let me continue this task after our tourist season. I feel so sorry that I can't complete this task now; every day I think about this unfinished business. To be honest, I hoped to do at least something today, but just during the last 15 minutes I had to pay attention to people who were inside the store, and I realize that in such conditions it is impossible to focus on something else. So I hope we can continue this task in the autumn. If everything is fine I will come back in November or December.

funderburkjim commented 2 years ago

@Andhrabharati Went ahead and did this correction. Enjoy!

Old: (image)

New: (image)

funderburkjim commented 2 years ago

@AnnaRybakovaT When you get back to this, the solution may be of interest. I used a 'slow', but conceptually simple, linear search to match records in SKD.pc.values.in.metalines.txt to entries in skd.txt.

A good next learning step would be to replace this linear search with a much faster python 'dictionary' lookup.
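A minimal sketch of the difference, using made-up stand-in data rather than the real program: here each entry is assumed to be a small dictionary with 'L' and 'pc' fields, and each record a (L, pcnew) pair. The function names find_linear and build_index are hypothetical.

```python
def find_linear(entries, L):
    """Linear search: look at the entries one by one until L matches."""
    for entry in entries:
        if entry['L'] == L:
            return entry
    return None


def build_index(entries):
    """Dictionary lookup: map each L value to its entry, built in one pass."""
    index = {}
    for entry in entries:
        index[entry['L']] = entry
    return index


# Stand-in data (hypothetical), shaped like the values discussed above
entries = [{'L': '1', 'pc': '1-001'}, {'L': '2', 'pc': '1-001'}]
pcrecs = [('1', '1-001-a'), ('2', '1-001-b')]

index = build_index(entries)
for L, pcnew in pcrecs:
    entry = index.get(L)  # one lookup instead of a scan of all entries
    if entry is None:
        print("no entry found for L=%s" % L)
    else:
        entry['pc'] = pcnew
```

With about 42K records, the linear version may compare each record against thousands of entries, while the dictionary version builds the index once and then finds each entry directly.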

Andhrabharati commented 2 years ago

@Andhrabharati Went ahead and did this correction. Enjoy!

As I had mentioned elsewhere, I rarely refer to SKD; so nothing much to enjoy for me.

@drdhaval2785 might feel it so, probably.

drdhaval2785 commented 2 years ago

Yes. I do use SKD. Any improvement there is useful to me.

gasyoun commented 2 years ago

Any improvement there is useful to me.

What major improvements are still lacking?

AnnaRybakovaT commented 1 year ago

I used a 'slow', but conceptually simple, linear search to match records in SKD.pc.values.in.metalines.txt to entries in skd.txt.

Dear Jim and dear all, Hopefully you are fine.

I am back!!!))) Fortunately this tourist season in Greece has been quite long, busy and successful. Now I have a "winter" break for a few months. Shall I focus on this issue, or would you prefer me to do other tasks?

funderburkjim commented 1 year ago

@AnnaRybakovaT Hi, Welcome back!

AFAIK, this issue has been resolved : all the metalines for skd now have column designation, like the '-b' in <L>25050<pc>3-511-b<k1>BikzukaH<k2>BikzukaH.
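A check of this could be scripted roughly as follows. This is only a sketch: it assumes that every metaline starts with <L>, that the pc value has the shape volume-page-column with a single lowercase column letter (as in 3-511-b), and that the file is UTF-8; the names META_RE, PC_WITH_COLUMN and metalines_missing_column are made up for the example.

```python
import re

# Assumed metaline layout, based on the example above
META_RE = re.compile(r'^<L>([^<]+)<pc>([^<]+)<k1>')
# Assumed shape of a complete pc value: volume-page-column, e.g. 3-511-b
PC_WITH_COLUMN = re.compile(r'^\d+-\d+-[a-z]$')


def metalines_missing_column(lines):
    """Return the L values of metalines whose pc field has no column part."""
    missing = []
    for line in lines:
        m = META_RE.search(line)
        if m is None:
            continue  # not a metaline (body text, <LEND>, etc.)
        L, pc = m.group(1), m.group(2)
        if not PC_WITH_COLUMN.match(pc):
            missing.append(L)
    return missing


sample = [
    '<L>25050<pc>3-511-b<k1>BikzukaH<k2>BikzukaH',
    'some body text of the entry',
    '<L>1<pc>1-001<k1>a<k2>a',  # made-up line without a column, for illustration
]
print(metalines_missing_column(sample))

# To run it on the real file, something like this could be used:
# with open('temp_skd.txt', encoding='utf-8') as f:
#     print(metalines_missing_column(f))
```

An empty list for the real file would support the statement that all metalines now carry a column designation.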

I have a non-programming task regarding quality of pdf images for mw dictionary. Will open a new issue and describe it further as exercise for you. I aim to do this soon, but my mind is elsewhere at the moment.

Andhrabharati commented 1 year ago

A good next learning step would be to replace this linear search with a much faster python 'dictionary' lookup.

@funderburkjim I think Anna would like to learn this.

And I can arrange the replacement for MW 'bad' pages from my physical copy (London print), if required.

Also I have made a better copy from the archive version of the pdf, from its unprocessed (uncompressed) image (jpeg2) pages, if it sounds interesting.

Andhrabharati commented 1 year ago

Speaking of having the better scan pages, how about replacing the present SKD scans with the excellent SKD scans shared by Thomas recently (appropriately downsized)? https://github.com/sanskrit-lexicon/SKD/issues/14#issuecomment-1086779841

AnnaRybakovaT commented 1 year ago

I have a non-programming task regarding quality of pdf images for mw dictionary. Will open a new issue and describe it further as exercise for you.

Excellent! I am waiting for further details.

gasyoun commented 1 year ago

Speaking of having the better scan pages, how about replacing the present SKD scans with the excellent SKD scans shared by Thomas recently (appropriately downsized)?

@funderburkjim I believe there is no reason why not?

funderburkjim commented 1 year ago

retract mw scan review?

In reviewing my notes regarding alleged bad scan pages for MW, I had noted about 10 such instances. However, it seems all but two of these were NOT bad scans! Maybe I was hurrying and misidentified. The only 2 of those 10 that do need replacement are

Thus, we should find replacements for these 2 pages. But maybe a review of ALL mw pages is not a good use of time for @AnnaRybakovaT.
@Andhrabharati . What is your opinion?

funderburkjim commented 1 year ago

skd scan replacement

@Andhrabharati - If you have a replacement of all the scans, I can develop instructions for you to get those in a form which I can easily install into sanskrit-lexicon web site.

funderburkjim commented 1 year ago

all dictionaries needing better scans.

I know there is at least one dictionary that could use better quality scans - GRA (Grassmann). A few of sanskrit-lexicon scans for GRA have missing data (e.g. page 115), and many pages are of marginal quality and highly skewed. It would be nice to have good quality scans for the entire dictionary.

There are many of the Cologne dictionaries whose scan quality is unknown to me. I think it would be good to review scans for all of the dictionaries, evaluate the quality, and determine whether a few scans or all of the scans should be replaced; and then find good replacement scans and go through process of installing these at Cologne. As with GRA, it would be nice to be assured that the scan links are as good as possible for the dictionaries covered by the Cologne sanskrit-lexicon.

Taking the lead in such a comprehensive review could be a valuable contribution for @AnnaRybakovaT.

Request comments by others.

Andhrabharati commented 1 year ago

But maybe a review of ALL mw pages is not a good use of time for @AnnaRybakovaT. @Andhrabharati . What is your opinion?

I feel spending a few hours (3-4) [or even a day or two] is not a bad deal, as it would eliminate any bad scans (if present) in the repo.

Andhrabharati commented 1 year ago

I know there is at least one dictionary that could use better quality scans - GRA (Grassmann). Request comments by others.

I do have a good scan of GRA (1873) from the Bayerische Staatsbibliothek, and also a revision of the work by Maria Kozianka (1996). This revision is similar to the revision/update of Bloomfield's Vedic Concordance; and incidentally both these revisions took place about a century later (with respect to the original editions)!

It would be a plausible option to update Cologne's GRA with this 1996 work-- @thomasincambodia and @funderburkjim may ponder on this suggestion.

Finally, isn't it better to talk about this scan pages matter at a 'new' issue, instead of here at this 'closed' issue?

Andhrabharati commented 1 year ago

skd scan replacement

@Andhrabharati - If you have a replacement of all the scans, I can develop instructions for you to get those in a form which I can easily install into sanskrit-lexicon web site.

Yes, I have stored the SKD scan pages from Thomas and can do the needful. [The pcloud link, where these were shared earlier by Thomas, is expired now.]

gasyoun commented 1 year ago

It would be a plausible option to update Cologne's GRA with this 1996 work-- @thomasincambodia and @funderburkjim may ponder on this suggestion.

It will become a copyright issue.

good scan of GRA (1873) from the Bayerische Staatsbibliothek, and also a revision of the work by Maria Kozianka (1996). This revision is similar to the revision/update of Bloomfield's Vedic Concordance; and incidentally both these revisions took place about a century later

Interesting note

I feel spending a few hours (3-4) [or even a day or two] is not a bad deal, as it would eliminate any bad scans (if present) in the repo.

4 hours would not be enough to even open all these pages.

Request comments by others.

We could try.

funderburkjim commented 1 year ago

@AnnaRybakovaT Instructions for reviewing mw scans posted at https://github.com/sanskrit-lexicon/MWS/issues/144

funderburkjim commented 1 year ago

Discussion regarding skd scans is moved to https://github.com/sanskrit-lexicon/SKD/issues/16.

funderburkjim commented 1 year ago

python dictionary learning

Anna had made some progress with Python. Judging from the comments above, the next reasonable step in her Python study seems to be the python 'dictionary' data structure.

@AnnaRybakovaT Are you still interested in furthering your Python skills?

If so, there are many online resources that can get you started with Python dictionaries. I think the w3schools Python dictionary material is a good starting point.

funderburkjim commented 1 year ago

thomas new github name

Thomas changed his github name to @maltenth .
@thomasincambodia no longer present.

AnnaRybakovaT commented 1 year ago

Are you still interested in furthering your Python skills?

Dear Jim, thanks for such kind words. Yes, I am interested in this skill. After the mw scan task I can study more about the 'dictionary' data structure, unless you offer me another, more pressing issue.