sanskrit-lexicon / literarysource

A repository to study literary sources in CDSL and find their scanned copies to link in the display program.

MW72 literary sources listing #1

Open drdhaval2785 opened 2 years ago

drdhaval2785 commented 2 years ago

https://www.sanskrit-lexicon.uni-koeln.de/scans/csldev/csldoc/build/dictionaries/prefaces/mw72pref/mw72pref18.html and https://www.sanskrit-lexicon.uni-koeln.de/scans/csldev/csldoc/build/dictionaries/prefaces/mw72pref/mw72pref19.html give the list of works that MW consulted while preparing the MW72 dictionary.

A preliminary digitization of that list is in https://github.com/sanskrit-lexicon/literarysource/blob/main/mw72/mw72_ls.tsv.

I would request @Andhrabharati or any other interested member to have a look at the file and help in two ways.

  1. Correct the digitized data. Especially French and German are difficult for me.
  2. Search the internet for good quality data for the above books and note them in the Links column.

Andhrabharati commented 2 years ago

I can surely look into the data and correct.

And my suggestion is to combine all the references across all CDSL dictionaries (rather, works) at one place and then put the links to scans or texts there. This would eliminate duplicating the same link at multiple places.
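A minimal sketch of what such a combined listing could look like, assuming each dictionary's ls list is available as (abbreviation, title) pairs. The function name, the dictionary codes, the sample titles and the crude title normalization are all hypothetical, not taken from any CDSL file:

```python
from collections import defaultdict

def build_central_index(per_dictionary):
    """Group ls entries from several dictionaries under one key per work.

    per_dictionary maps a dictionary code to a list of
    (abbreviation, work title) pairs.
    """
    index = defaultdict(lambda: {"cited_in": [], "links": []})
    for dict_code, entries in per_dictionary.items():
        for abbreviation, title in entries:
            # Crude normalization (an assumption): lowercase, collapse spaces.
            key = " ".join(title.lower().split())
            index[key]["cited_in"].append((dict_code, abbreviation))
    return dict(index)

# Made-up sample data, for illustration only.
sample = {
    "DICT_A": [("Work1", "Sample Work One")],
    "DICT_B": [("W. 1", "sample  work one"), ("W2", "Sample Work Two")],
}
central = build_central_index(sample)

# The scan link is stored once, however many dictionaries cite the work.
central["sample work one"]["links"].append("https://example.org/scan-of-work-one")
```

Real titles would need a more careful matching step than lowercasing, since the same work is often cited under different spellings.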

As such, one issue to each work for its own ls listing could be opened and filled up first.

Let this be a batch processing, of one task across all the works at once (instead of doing all tasks for each work and then taking up another work).

My task list would be-

  1. ls listing (and probably marking inside the text)
  2. expanding the ls entries to full form
  3. ab listing (and probably marking inside the text)
  4. expanding the ab entries to full form
  5. ...

I can go on filling up all such tasks as steps, but would it be considered by the team?

Andhrabharati commented 2 years ago

already assigned this to me?!!!

drdhaval2785 commented 2 years ago

I agree with your suggestion of gathering references across CDSL dictionaries and putting links at a central space for all CDSL dictionaries.

drdhaval2785 commented 2 years ago

Yes. Already assigned to you.

Andhrabharati commented 2 years ago

@drdhaval2785

How did you generate this tsv file? The order is not as per the book's listing, but seems to be ordered by the author/compiler.

I recall that all the earlier ls listings (PWG, pwk, MW, ...) are as per the book's sequence. Wonder why this re-ordering is done for MW72.

If you have that initial (original) list, please post it; so that the work would be easy.

Andhrabharati commented 2 years ago

Here is the list as per the book- mw72_ls (AB).txt

Notes:

  1. When two persons have jointly done a work, they are referred to either jointly or individually; hence they are to be marked separately.
  2. ऋ is taken as ṛ (as per CDSL's IAST), but ष is marked as sha (not as ṣa) in the above; a consistent scheme should probably be used throughout.
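
A rough way to see where the two conventions from note 2 are mixed is to classify each entry by whether it contains the `sh` digraph, the IAST letter `ṣ`, or both. This is only a sketch: the function name and the sample strings are hypothetical, and `sh` also occurs in ordinary English words, so every hit still needs a manual look.

```python
import re
import unicodedata

SH_DIGRAPH = re.compile(r"sh", re.IGNORECASE)
IAST_RETROFLEX = re.compile(r"[ṣṢ]")

def classify(entry):
    """Return 'sh', 'iast', 'mixed' or 'neither' for one list entry."""
    # Compose s + combining dot below into the single letter before matching.
    text = unicodedata.normalize("NFC", entry)
    has_sh = bool(SH_DIGRAPH.search(text))
    has_iast = bool(IAST_RETROFLEX.search(text))
    if has_sh and has_iast:
        return "mixed"
    if has_sh:
        return "sh"
    if has_iast:
        return "iast"
    return "neither"

# Made-up sample strings, for illustration only.
for sample_entry in ["bhāshya", "bhāṣya", "ṛshi-bhāṣya", "veda"]:
    print(sample_entry, classify(sample_entry))
```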

gasyoun commented 2 years ago

Yes. Already assigned to you.

))

drdhaval2785 commented 2 years ago

Replaced my list with your list. https://github.com/sanskrit-lexicon/literarysource/blob/main/mw72/mw72_ls.tsv now has data mentioned in https://github.com/sanskrit-lexicon/literarysource/issues/1#issuecomment-1043338138

drdhaval2785 commented 2 years ago

Regarding 'ls' marking inside text and 'ab' marking inside text, I would just like to caution that there is going to be collateral damage. Certain textual items that are not intended as 'ls' or 'ab' would be wrongly converted to 'ls' or 'ab' markup because of regex changes. How do you propose to handle such cases, @Andhrabharati ? If you have some checks and balances in place, this would indeed be a great enhancement.
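
One possible shape for such checks, as a sketch only: mark only abbreviations from a known list, leave abbreviations flagged as ambiguous untouched but queue them for manual review, and log every change so it can be reversed later. The `<ls>…</ls>` tag shape, the function name and the sample abbreviations are assumptions for illustration, not an existing CDSL script.

```python
import re

def mark_ls(line, known_abbreviations, ambiguous=frozenset()):
    """Wrap known abbreviations in <ls> tags; return (new_line, changes, review)."""
    # Longest first, so a longer abbreviation wins over its own prefix.
    alternatives = sorted(known_abbreviations, key=len, reverse=True)
    if not alternatives:
        return line, [], []
    # The lookarounds skip matches inside longer words and inside existing tags.
    pattern = re.compile(
        r"(?<![\w>])(" + "|".join(re.escape(a) for a in alternatives) + r")(?![\w<])"
    )
    changes, review = [], []

    def substitute(match):
        text = match.group(1)
        if text in ambiguous:
            review.append((match.start(), text))   # left as-is, checked by hand
            return text
        changes.append((match.start(), text))      # logged so it can be undone
        return "<ls>" + text + "</ls>"

    return pattern.sub(substitute, line), changes, review

# Made-up line and abbreviations, for illustration only.
line = "see Abc. 1, 2 and Xy. here; Abcd. is not a source"
marked, changes, review = mark_ls(line, {"Abc.", "Xy."}, ambiguous={"Xy."})
print(marked)
```

Running the function again on its own output leaves already tagged items alone, so the step can be repeated safely as the abbreviation list grows.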

Andhrabharati commented 2 years ago

Pl. delete 'a' in Burnaouf, at entry 30.

It skipped my attention earlier.

Andhrabharati commented 2 years ago

My version has all such properly marked.

I was thinking of sending my 'updated' version, once @funderburkjim does his work on Greek filling in BUR, from my data.

Now my BEN file(s) have all the Greek strings filled (which are to be filled in CDSL text in the same way that Jim did in INM, as many mergers or deletions are there) and also corrections in various cognate language words.

Andhrabharati commented 2 years ago

Sorry that my prev. post has landed at a wrong place.

It is to be in the BEN issue.

Andhrabharati commented 2 years ago

How do you propose to handle such cases, @Andhrabharati ? If you have some checks and balances in place, this would indeed be a great enhancement.

I am careful enough to handle all such properly, though I can't put my 'processes' in words (so that others could adopt them).

drdhaval2785 commented 2 years ago

Ok. It would be great if you can put even some part of your thought process in words, so that we can benefit from the same. And if someone down the line wants to revert back, he knows what requires reversal.

Andhrabharati commented 2 years ago

Now that you're talking about inside marking as I mentioned above, why not add the ab list as well-- along with ls list for each work?

Many works already have these portions digitised, but 'lying' within the 'full' txt file (as I mentioned elsewhere), including this mw72.

Probably a better first step would be to split all those files into their respective sections, as I did. Looking at my INM file, Jim seems to have split Cologne's INM file (but kept it somewhere other than csl-orig).

drdhaval2785 commented 2 years ago

I have no objection to listing ab files for all dictionaries. Would you like to keep them in this repository or do you want me to create a new repository for analysing ab lists?

Andhrabharati commented 2 years ago

I would suggest keeping both ls and ab lists in a single place.

And you may start populating these lists, picking up from the digitised text files (whichever is available). [I do not 'respect' the CDSL line-breaks or format, so it's better that I do not put my fingers in the first step itself.]

drdhaval2785 commented 2 years ago

Sure.