PWK VN7 Schmidt - Githubissues

funderburkjim commented 2 years ago

pwinfo_edit.txt is an attempt to generate a partial simulation of the Volume 7 index of PWK.

pwinfo.txt is a programmatically generated precursor, based only on schmidt. It is partial in two aspects:

The index has entries that refer to VN1-6, and entries that are full VN7 additions/corrections. For the VN7 entries, the text from schmidt is excluded. Thus, the purpose of pwinfo is to provide a coverage comparison.
Entries in VN7 have, in addition to the headword and occasional homonym number, two other data:
- A Roman numeral indicating the volume (1-7, I-VII) containing the headword in the body of PW; this Roman numeral is absent if the headword is new
- An Arabic number referencing the volume whose VN has the actual correction or addition text. This is 1-6; 7 is implicit if the correction is part of the index text.

The Roman numeral can be derived programmatically by looking up the Schmidt headword in the PW index of headwords. If the headword is found in PW, the metaline for that headword provides its volume, which is printed just after the headword in the pwinfo listing. If the SCH headword is not found in PW, an underscore '_' is printed in pwinfo instead.
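The lookup described above might be sketched roughly as follows. This is only an illustration: it assumes the PW headword index has already been loaded into a dictionary mapping headword to volume number, and the names `roman_for_headword` and `pw_index` are hypothetical, not taken from the actual scripts.

```python
# Hypothetical sketch of the Roman-numeral lookup; names and data are assumptions.
ROMAN = {1: "I", 2: "II", 3: "III", 4: "IV", 5: "V", 6: "VI", 7: "VII"}


def roman_for_headword(headword, pw_volume_by_headword):
    """Return the Roman numeral of the PW volume containing the headword,
    or '_' when the headword does not occur in PW."""
    volume = pw_volume_by_headword.get(headword)
    if volume is None:
        return "_"
    return ROMAN[volume]


# Toy index with a single entry; a real index would come from the PW metalines.
pw_index = {"अकर्तर्": 1}
print(roman_for_headword("अकर्तर्", pw_index))
print(roman_for_headword("not-a-pw-headword", pw_index))
```

A plain dictionary is used here only because the lookup is by exact headword; homonym numbers are ignored in this sketch.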

However, the Arabic numeral cannot be determined with certainty. The number appearing in pwinfo.txt is an estimate of the VN group, based on the first letter of the headword. (This is reasonable, since PWK 1-7 headwords are in alphabetical order: Volume 1 has a-O (slp1 spelling), Volume 2 k-Q, etc.) However, VN2 contains some additions/corrections for headwords beginning with 'a' (for instance आंशुमद्भेदसंग्रह at line 12 of pwinfo). Note that line 12 of pwinfo.txt has the estimate '1', but the same line of pwinfo_edit.txt has '2'. This '2' was entered by hand after visual comparison with the underlying first page of the VN7 pdf.
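A minimal sketch of such a first-letter estimate is given below. It is an assumption about how the estimate could be computed, not the actual pwinfo code. Only the two letter ranges stated above (a-O and k-Q) are filled in; the ranges for the remaining volumes are not given in this thread, so the function returns `None` for them. The name `estimate_vn_volume` is hypothetical.

```python
# SLP1 alphabetical order: vowels, anusvara/visarga, then consonants.
SLP1_ORDER = "aAiIuUfFxXeEoOMHkKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh"

# (volume, first letter, last letter) in SLP1.
# Only volumes 1 and 2 are stated in the thread; the rest are left out on purpose.
VOLUME_RANGES = [(1, "a", "O"), (2, "k", "Q")]


def estimate_vn_volume(headword_slp1):
    """Estimate the VN volume from the first SLP1 letter of the headword.
    Returns None when the letter falls outside the known ranges."""
    if not headword_slp1:
        return None
    pos = SLP1_ORDER.find(headword_slp1[0])
    if pos < 0:
        return None
    for volume, first, last in VOLUME_RANGES:
        if SLP1_ORDER.index(first) <= pos <= SLP1_ORDER.index(last):
            return volume
    return None


# आंशुमद्भेदसंग्रह in SLP1; the estimate is 1, although the pdf shows VN2.
print(estimate_vn_volume("AMSumadBedasaMgraha"))
```

As the example shows, the estimate is only a starting value; the hand-entered number in pwinfo_edit.txt remains the authority.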

Incidentally, the first item in each line of pwinfo is the L-number of the Schmidt entry. [If you look at pwinfo_edit.txt via the GitHub interface, there are two numbers at the start of each line -- the first number is just the line number and is part of the GitHub UI, not part of the file data.]

funderburkjim commented 2 years ago

Manual review

I reviewed all of page 1 of the VN7 pdf against pwinfo. This page occupies the first 184 lines of pwinfo_edit.txt.

The entries from schmidt shown in pwinfo listing are those with 'type=EMPTY-STRING'.

The most common change was the Arabic numeral; it often had to be changed from 1 to something else, based on the pdf page.

There are a relatively small number of variances:

Some Schmidt headwords are not found in the VN7 pdf. Here I put a '?' in the Arabic-number place. There are about 20 of these.
In a small number (6) of cases, a VN7 headword is not found in pwinfo. These are extra lines in the pwinfo_edit file, with a '?' character in the first (L) field.
- Some of these actually do occur among the sch headwords, BUT WITH A DIFFERENT TYPE (e.g. '*').

A '?' also marks other less systematic variances, such as अकर्तर् I? 7 where the Roman-numeral 'I' is missing, although the headword is in PW (in volume 1).
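The '?' conventions above lend themselves to a quick tally. The sketch below is hedged: it assumes each line of pwinfo_edit.txt has four whitespace-separated fields (L-number, headword, Roman numeral or '_', Arabic numeral), which is an assumption about the file layout rather than something confirmed here. The sample lines and headword placeholders are made up for illustration, and `classify_line` is a hypothetical name.

```python
from collections import Counter


def classify_line(line):
    """Classify one pwinfo_edit line by where its '?' marker appears.
    Assumed layout: L-number, headword, Roman numeral (or '_'), Arabic numeral."""
    fields = line.split()
    if len(fields) != 4:
        return "unparsed"
    l_field, _headword, roman, arabic = fields
    if l_field == "?":
        return "vn7_only"      # VN7 headword with no line in pwinfo
    if arabic == "?":
        return "schmidt_only"  # Schmidt headword not found in the VN7 pdf
    if "?" in roman:
        return "other"         # less systematic variance, e.g. 'I?'
    return "ok"


# Invented sample lines, not real data from pwinfo_edit.txt.
sample = ["101 hw1 I 1", "102 hw2 _ ?", "? hw3 II 7", "103 hw4 I? 7"]
counts = Counter(classify_line(line) for line in sample)
print(counts)
```

Run over the first 184 lines, a tally like this would be one way to check the approximate counts mentioned above.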

funderburkjim commented 2 years ago

Should this be continued?

I think it likely that we could simulate the PWK VN material from Schmidt. I think it would be done in two passes.

Finish pwinfo_edit.txt (for all 100+ pages of the VN7 index).
Using the VN volume number (Arabic numeral), compare the full text of Schmidt entries with those of PDF for each volume.

Then generate entries for PWK (L-numbers, metalines, text) and add to pw.txt digitization.

The question to answer now seems to be: should we proceed in this way? I hope the pwinfo_edit work described above helps us come to a decision.

funderburkjim commented 2 years ago

To compare the pwinfo_edit.txt version of the first page of the vol. 7 VN with the printed page, you could use pw7-289N.pdf

To edit other pages, you could use the pdfs at https://github.com/sanskrit-lexicon/PWK/issues/70#issuecomment-922510752.

gasyoun commented 2 years ago

should we proceed in this way?

Yes, because it remains a part of the biggest Sanskrit dictionary ever made, and having headwords with errors is not so great. At the very least we should have the data on what is already known, for example that some words do not exist at all.

funderburkjim commented 2 years ago

@gasyoun To clarify, 'proceed in this way' means 'PWK-VN derived from Schmidt'. The alternative is 'PWK-VN derived from newly typed VN'.

We are agreed on the need for 'PWK-VN', for the reason you suggest. The question we are debating is which of the two ways to take to get there.

gasyoun commented 2 years ago

'proceed in this way', this means 'PWK-VN derived from Schmidt'.

Understood. So, as I see it, you are trying both ways at once, since @thomasincambodia has got it typed.

funderburkjim commented 2 years ago

Right, my next step is to make use of the digitized sample from @thomasincambodia (#76)

Andhrabharati commented 2 years ago

My feeling is that the digitised text would be the preferred way. [And this will be a faster approach and closer to the printed text, than the "simulated" text.]

Andhrabharati commented 2 years ago

I have just finished mapping the pwk VN entries to SCH entries, and in the process happened to look inside the SCH text.

Now, I strongly recommend making the pwk VN digitization from the printed matter, instead of trying to derive it from the SCH text. I wish @thomasincambodia would put his "team" on the task sometime soon. [I will post the SCH study details in due course.]

maltenth commented 2 years ago

@Andhrabharati

Typing of the VN has already been going on for some time (thanks to generous financing from @funderburkjim) and should be finished shortly.

Andhrabharati commented 2 years ago

This is great news, and I look forward to the data being ready soon.

And this might prompt closing all the issues relating to pwk VN material derivation from the SCH text.

gasyoun commented 2 years ago

has already been going on for some time

May we know some details, please? A year or two? @thomasincambodia, have you ever thought of adding a donation button on the homepage, or is that not possible on a university subpage? I've met people who have said they would be willing to donate, but there is no such option now. A coder or even two are badly needed; we have thousands of unsolved issues that have been piling up for years. I would love to know your opinion on that, thanks!

maltenth commented 2 years ago

@gasyoun

Expecting the work on digitizing the VN of PWK to be finished in another 10-14 days.

Never considered adding a donation button. I suppose it couldn't be done anyway on a university website for legal reasons.

gasyoun commented 2 years ago

VN of PWK to be finished in another 10-14 days.

Thanks for the update.

Never considered adding a donation button.

But you are not against it, right?

I suppose it couldn't be done anyway on a university website for legal reasons.

So I believe as well. But what if we host it outside the university website and link to it from the homepage? I wonder whom we could ask so that we do not break any rules.

Andhrabharati commented 1 week ago

Now that all the VN pages of the pwk volumes have been digitized (courtesy of Jim & Thomas), the various issues on pwkvn and SCH could be closed.

What do you say, @funderburkjim ?

sanskrit-lexicon / PWK

PWK VN7 Schmidt #75
