Open funderburkjim opened 2 years ago
I carried out a review of all of page 1 of VN7 pdf with pwinfo. This occupies the first 184 lines of pwinfo_edit.txt.
The entries from schmidt shown in pwinfo listing are those with 'type=EMPTY-STRING'.
The most common change was the Arabic numeral; it often had to be changed from 1 to something else, based on the pdf page.
There are a relatively small number of variances:
A '?' also marks other less systematic variances, such as अकर्तर् I? 7
where the Roman-numeral 'I' is missing, although the headword is in PW (in volume 1).
I think it likely that we could simulate the PWK VN material from Schmidt. I think it would be done in two passes.
Then generate entries for PWK (L-numbers, metalines, text) and add to pw.txt digitization.
The question to answer now seems to be: should we proceed in this way? I hope the pwinfo_edit work described above helps us come to a decision.
To check the pwinfo_edit.txt version of the first page of the vol. 7 VN against the print, you could use pw7-289N.pdf
To edit other pages, you could use the pdfs at https://github.com/sanskrit-lexicon/PWK/issues/70#issuecomment-922510752.
should we proceed in this way?
Yes, because it remains a part of the biggest Sanskrit dictionary ever made. Having headwords with errors is not so great. At the least, we should have data on which words are known not to exist at all.
@gasyoun To clarify, by 'proceed in this way', this means 'PWK-VN derived from Schmidt'. The alternative is 'PWK-VN derived from newly typed VN' .
We are decided on the need for 'PWK-VN', for the reason you suggest. The question we are debating is which of the two ways to get there.
'proceed in this way', this means 'PWK-VN derived from Schmidt'.
Understood. So, as I see it, you are trying both ways at once, since @thomasincambodia has got it typed.
Right, my next step is to make use of the digitized sample from @thomasincambodia (#76)
My feeling is that the digitised text would be the preferred way. [And this will be a faster approach and closer to the printed text, than the "simulated" text.]
My feeling is that the digitised text would be the preferred way. [And this will be a faster approach and closer to the printed text, than the "simulated" text.]
I had just finished mapping the pwk VN entries to SCH entries, and in the process happened to see inside the SCH text.
Now, I strongly recommend making the pwk VN digitization from the printed matter, instead of trying to derive it from the SCH text. I wish @thomasincambodia would put his "team" on the task sometime soon. [I will post the SCH study details in due course.]
@Andhrabharati
typing of VN has already been going on for some time (thanks to generous financing from @funderburkjim) and should be finished shortly.
This is great news; I look forward to the data being ready soon.
And this might prompt closing all the issues relating to pwk VN material derivation from the SCH text.
has already been going on for some time
May we know some details, please? A year or two? @thomasincambodia have you ever thought of adding a donation button on the homepage, or is that not possible on a university subpage? I've met people who have said they would be willing to donate, but there is no such option now. A coder or even two are badly needed; we have thousands of unsolved issues that have been piling up for years. Would love to know your opinion on that, thanks!
@gasyoun
Expecting the work on digitizing the VN of PWK to be finished in another 10-14 days.
Never considered adding a donation button. I suppose it couldn't be done anyway on a university website for legal reasons.
VN of PWK to be finished in another 10-14 days.
Thanks for the update.
Never considered adding a donation button.
But you are not against it, right?
I suppose it couldn't be done anyway on a university website for legal reasons.
So I believe as well. But what if we host it outside the university website and link to it from a homepage? I wonder whom we could ask about how not to break anything.
Now that all the VN pages of the pwk volumes have been digitized (courtesy of Jim & Thomas), the various issues on pwkvn and SCH could be closed.
What do you say, @funderburkjim ?
pwinfo_edit.txt is intended to generate a partial simulation of the Volume 7 index of PWK.
pwinfo.txt is a programmatically generated precursor, based only on schmidt. It is partial in two aspects:
The Roman numeral can be derived programmatically by looking up the Schmidt headword in the PW index of headwords. If the PW headword is found, then the metaline for that headword provides the Volume of the headword, and is printed just after the headword in pwinfo listing. If the SCH headword is not found in PW, then an underscore '_' is printed in pwinfo.
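The lookup just described can be sketched roughly as follows. This is only an illustration, not the actual pwinfo program: the function name `roman_volume` and the `pw_index` dictionary (SLP1 headword mapped to PW volume number, as it might be read from the PW metalines) are assumptions made for the example.

```python
# Hypothetical sketch of the Roman-numeral lookup; not the real pwinfo code.
ROMAN = {1: "I", 2: "II", 3: "III", 4: "IV", 5: "V", 6: "VI", 7: "VII"}


def roman_volume(sch_headword, pw_index):
    """Return the Roman volume numeral for a Schmidt headword.

    pw_index is an assumed dict {headword: volume number}, standing in
    for the PW index of headwords built from the metalines.
    Returns '_' when the headword is not found in PW.
    """
    volume = pw_index.get(sch_headword)
    if volume is None:
        return "_"
    return ROMAN[volume]


# Toy index: 'akartar' (अकर्तर्) is in PW volume 1, per the discussion above.
toy_index = {"akartar": 1}
print(roman_volume("akartar", toy_index))
print(roman_volume("notinpw", toy_index))
```

With a full index, a missing Roman numeral such as the one in 'अकर्तर् I? 7' would show up as a headword that is found here but lacks the numeral in print.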
However, the Arabic numeral cannot definitely be found. The number appearing in pwinfo.txt is an estimation of the VN group, based on the first letter of the headword. (This is reasonable, since PWK 1-7 headwords are alphabetic. Volume 1 has a-O (slp1 spelling), Volume 2 k-Q, etc.). However, VN2 contains some additions/corrections for headwords beginning with 'a' (for instance आंशुमद्भेदसंग्रह at line 12 of pwinfo). Note that line 12 of pwinfo.txt has '1' estimate; but the same line of pwinfo_edit.txt has '2'. This '2' was entered by hand from visual comparison with the underlying 1st page of VN7 pdf.
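A rough sketch of this first-letter estimate is given below. It is an assumption-laden illustration rather than the code that produced pwinfo.txt: the name `estimate_vn_group` is invented, and only the two letter ranges stated above (Volume 1 a-O, Volume 2 k-Q) are filled in; the ranges for the remaining volumes are left out because they are not given here.

```python
# Hypothetical sketch of the VN-group estimate; not the real pwinfo code.
# SLP1 letters that can begin a headword, in Sanskrit alphabetical order.
SLP1_ORDER = "aAiIuUfFxXeEoOkKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh"

# Only the ranges stated in the discussion; volumes 3-7 are omitted.
VOLUME_RANGES = {1: ("a", "O"), 2: ("k", "Q")}


def estimate_vn_group(headword_slp1):
    """Estimate the Arabic numeral from the first SLP1 letter.

    Returns None when the letter falls outside the ranges known here.
    The result is only an estimate: later VN sections also carry
    additions for earlier letters, so it must be checked against the pdf.
    """
    if not headword_slp1:
        return None
    pos = SLP1_ORDER.find(headword_slp1[0])
    if pos < 0:
        return None
    for volume, (first, last) in VOLUME_RANGES.items():
        if SLP1_ORDER.index(first) <= pos <= SLP1_ORDER.index(last):
            return volume
    return None


# आंशुमद्भेदसंग्रह in SLP1; the estimate is 1, although the pdf shows 2.
print(estimate_vn_group("AMSumadBedasaMgraha"))
```

The mismatch in this example is exactly the kind of line that had to be corrected by hand in pwinfo_edit.txt.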
Incidentally, the first item in each line of pwinfo is the L-number of the Schmidt entry. [If you look at pwinfo_edit.txt via the GitHub interface, there are two numbers at the start of each line -- the first number is just the line number and is part of the GitHub UI, not part of the file data]