sanskrit-lexicon / PWG

Boehtlingk und Roth Sanskrit Wörterbuch, 7 Bände Petersburg 1855-1875

PWG scan page errors in vol 6 #40

Closed funderburkjim closed 2 years ago

funderburkjim commented 2 years ago

While improving the RV markup in PWG (#38), I noticed some problems with the scanned image links in Volume 6. @sanskritisampada has checked all the scan links from 6-0301 through 6-0499. The task was to open each servepdf url in turn, starting with
https://www.sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/servepdf.php?dict=PWG&page=6-0301
https://www.sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/servepdf.php?dict=PWG&page=6-0302
and continuing through
https://www.sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/servepdf.php?dict=PWG&page=6-0499

and to note where the internal page numbers differ from the page number of the url (there are 2 pages in each scan image for PWG). Here are the results:
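For anyone repeating the check, the list of urls can be generated rather than typed. This is only a sketch: the base url and the `dict`/`page` parameters are copied from the links above, while the function name `scan_urls` is made up for illustration.

```shell
# Print one servepdf url per scan link in the given range.
# Pass plain decimal numbers (301, not 0301): a leading zero
# would be read as octal by printf.
base='https://www.sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/servepdf.php'

scan_urls() (
  n=$1
  last=$2
  while [ "$n" -le "$last" ]; do
    # Page ids are "6-" followed by a four-digit, zero-padded number.
    printf '%s?dict=PWG&page=6-%04d\n' "$base" "$n"
    n=$((n + 1))
  done
)

scan_urls 301 499
```

Each url still has to be opened and compared by eye with the page numbers printed on the scan.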

PWG scan match.
Checked pages 6-0301 to 6-0499

0301 to 0328 scans are matching.
0329 --> 313-314
0331 --> 315-316
0333 --> 317-318
0335 --> 319-320
0337 --> 321-322
0339 --> 323-324
0341 --> 325-326
0343 --> 327-328
0345 --> 329-330
0347 --> 331-332
0349 --> 333-334
0351 --> 335-336
0353 --> 337-338
0355 --> 339-340
0357 --> 341-342
0359 --> 343-344
0361 --> 345-346
0363 --> 347-348
0365 --> 349-350
0367 --> 351-352
0369 --> 353-354
0371 --> 355-356
0373 --> 357-358
0375 --> 359-360
0377 --> 361-362
0379 --> 363-364
0381 --> 365-366
0383 --> 367-368
0385 --> 369-370
0387 --> 371-372
0389 --> 373-374
0391 --> 375-376
0393 --> 377-378
0395 --> 379-380
0397 --> 381-382
0399 --> 383-384
0401 --> 385-386
0403 --> 387-388
0405 --> 389-390
0407 --> 391-392
0409 --> 409-410
0410 to 0499 All matching scans.
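The mismatches from 0329 through 0407 all follow one pattern: the page numbers printed on the scan are 16 lower than the number in the url. A small sketch that reproduces those rows of the list, assuming that constant offset (read off the table, not from any Cologne metadata); `printed_pages` and `mismatch_table` are illustrative names.

```shell
# Page range printed on a scan in the 0329..0407 run, assuming the
# constant offset of 16 seen in the table. Argument must be plain
# decimal (329, not 0329).
printed_pages() (
  left=$(($1 - 16))
  printf '%d-%d\n' "$left" "$((left + 1))"
)

# Regenerate the mismatch rows in the same "url --> printed" layout.
mismatch_table() (
  scan=329
  while [ "$scan" -le 407 ]; do
    printf '%04d --> %s\n' "$scan" "$(printed_pages "$scan")"
    scan=$((scan + 2))
  done
)

mismatch_table
```

The row for 0409 is not covered, since from there on the printed numbers agree with the url again.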

Now I need to get straight what is just a labeling error (problem with pdffiles.txt for PWG) and what scanned images (if any) are missing.

funderburkjim commented 2 years ago

rename several files

From the above, we see that the image labeled 6-0345 is really 6-0329, and similarly through 6-0391 (labeled 6-0407). The image pdf files are in the pdfpages directory of the 2013 PWG scans at Cologne (PWGScan/2013/web/pdfpages). Make a copy of the volume 6 images in a temporary sibling directory, temp_pdfpages_6:

mkdir temp_pdfpages_6
cp pdfpages/pwg6-*.pdf temp_pdfpages_6/

Now in effect rename the files by running this shell script:


cp temp_pdfpages_6/pwg6-0345.pdf pdfpages/pwg6-0329.pdf
cp temp_pdfpages_6/pwg6-0347.pdf pdfpages/pwg6-0331.pdf
cp temp_pdfpages_6/pwg6-0349.pdf pdfpages/pwg6-0333.pdf
cp temp_pdfpages_6/pwg6-0351.pdf pdfpages/pwg6-0335.pdf
cp temp_pdfpages_6/pwg6-0353.pdf pdfpages/pwg6-0337.pdf
cp temp_pdfpages_6/pwg6-0355.pdf pdfpages/pwg6-0339.pdf
cp temp_pdfpages_6/pwg6-0357.pdf pdfpages/pwg6-0341.pdf
cp temp_pdfpages_6/pwg6-0359.pdf pdfpages/pwg6-0343.pdf

cp temp_pdfpages_6/pwg6-0361.pdf pdfpages/pwg6-0345.pdf
cp temp_pdfpages_6/pwg6-0363.pdf pdfpages/pwg6-0347.pdf
cp temp_pdfpages_6/pwg6-0365.pdf pdfpages/pwg6-0349.pdf
cp temp_pdfpages_6/pwg6-0367.pdf pdfpages/pwg6-0351.pdf
cp temp_pdfpages_6/pwg6-0369.pdf pdfpages/pwg6-0353.pdf
cp temp_pdfpages_6/pwg6-0371.pdf pdfpages/pwg6-0355.pdf
cp temp_pdfpages_6/pwg6-0373.pdf pdfpages/pwg6-0357.pdf
cp temp_pdfpages_6/pwg6-0375.pdf pdfpages/pwg6-0359.pdf

cp temp_pdfpages_6/pwg6-0377.pdf pdfpages/pwg6-0361.pdf
cp temp_pdfpages_6/pwg6-0379.pdf pdfpages/pwg6-0363.pdf
cp temp_pdfpages_6/pwg6-0381.pdf pdfpages/pwg6-0365.pdf
cp temp_pdfpages_6/pwg6-0383.pdf pdfpages/pwg6-0367.pdf
cp temp_pdfpages_6/pwg6-0385.pdf pdfpages/pwg6-0369.pdf
cp temp_pdfpages_6/pwg6-0387.pdf pdfpages/pwg6-0371.pdf
cp temp_pdfpages_6/pwg6-0389.pdf pdfpages/pwg6-0373.pdf
cp temp_pdfpages_6/pwg6-0391.pdf pdfpages/pwg6-0375.pdf

cp temp_pdfpages_6/pwg6-0393.pdf pdfpages/pwg6-0377.pdf
cp temp_pdfpages_6/pwg6-0395.pdf pdfpages/pwg6-0379.pdf
cp temp_pdfpages_6/pwg6-0397.pdf pdfpages/pwg6-0381.pdf
cp temp_pdfpages_6/pwg6-0399.pdf pdfpages/pwg6-0383.pdf
cp temp_pdfpages_6/pwg6-0401.pdf pdfpages/pwg6-0385.pdf
cp temp_pdfpages_6/pwg6-0403.pdf pdfpages/pwg6-0387.pdf
cp temp_pdfpages_6/pwg6-0405.pdf pdfpages/pwg6-0389.pdf
cp temp_pdfpages_6/pwg6-0407.pdf pdfpages/pwg6-0391.pdf
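The 32 copy commands above follow one rule (source number = target number + 16), so they can also be produced by a loop. A sketch that only prints the commands as a dry run; the function name `rename_cmds` is hypothetical, and the directory and file names are the ones used in the script above.

```shell
# Dry run: print the copy commands instead of executing them.
rename_cmds() (
  target=329
  while [ "$target" -le 391 ]; do
    src=$((target + 16))   # image currently stored under the higher number
    printf 'cp temp_pdfpages_6/pwg6-%04d.pdf pdfpages/pwg6-%04d.pdf\n' "$src" "$target"
    target=$((target + 2)) # two printed pages per scan image
  done
)

rename_cmds
```

Once the printed commands look right, they could be piped to `sh` from the directory that holds both pdfpages and temp_pdfpages_6.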

Now (after deleting browser history) we see that 6-0329 through 6-0391 are proper.

We are left with erroneous pwg6-0393.pdf through pwg6-0407.pdf. Next step is to get these images as pdfs.

funderburkjim commented 2 years ago

something more puzzling

If we look at the now revised scanned page 6-0327, we see the entry rAjastamba as the first entry of p. 327 and rAji as the last entry of p. 328.

Now look at 6-0329. Here the first entry on p.329 shows as rAmacandracampU

But, according to pwg.txt the next entry after rAji is rAjika!

Wow! very confusing.

Will look for a volume 6 from archive.org.

funderburkjim commented 2 years ago

https://archive.org/details/in.ernet.dli.2015.7348/page/n169/mode/2up

This version seems to align with our version.

BUT IT HAS internal page number errors in the printing.

I'm going to undo what the script above did.

Maybe if we look at the page contents, rather than the internal page numbering (which appears erroneous), things will make sense.

funderburkjim commented 2 years ago

Ran script to restore original image file names:


cp temp_pdfpages_6/pwg6-0329.pdf pdfpages/pwg6-0329.pdf
cp temp_pdfpages_6/pwg6-0331.pdf pdfpages/pwg6-0331.pdf
cp temp_pdfpages_6/pwg6-0333.pdf pdfpages/pwg6-0333.pdf
cp temp_pdfpages_6/pwg6-0335.pdf pdfpages/pwg6-0335.pdf
cp temp_pdfpages_6/pwg6-0337.pdf pdfpages/pwg6-0337.pdf
cp temp_pdfpages_6/pwg6-0339.pdf pdfpages/pwg6-0339.pdf
cp temp_pdfpages_6/pwg6-0341.pdf pdfpages/pwg6-0341.pdf
cp temp_pdfpages_6/pwg6-0343.pdf pdfpages/pwg6-0343.pdf

cp temp_pdfpages_6/pwg6-0345.pdf pdfpages/pwg6-0345.pdf
cp temp_pdfpages_6/pwg6-0347.pdf pdfpages/pwg6-0347.pdf
cp temp_pdfpages_6/pwg6-0349.pdf pdfpages/pwg6-0349.pdf
cp temp_pdfpages_6/pwg6-0351.pdf pdfpages/pwg6-0351.pdf
cp temp_pdfpages_6/pwg6-0353.pdf pdfpages/pwg6-0353.pdf
cp temp_pdfpages_6/pwg6-0355.pdf pdfpages/pwg6-0355.pdf
cp temp_pdfpages_6/pwg6-0357.pdf pdfpages/pwg6-0357.pdf
cp temp_pdfpages_6/pwg6-0359.pdf pdfpages/pwg6-0359.pdf

cp temp_pdfpages_6/pwg6-0361.pdf pdfpages/pwg6-0361.pdf
cp temp_pdfpages_6/pwg6-0363.pdf pdfpages/pwg6-0363.pdf
cp temp_pdfpages_6/pwg6-0365.pdf pdfpages/pwg6-0365.pdf
cp temp_pdfpages_6/pwg6-0367.pdf pdfpages/pwg6-0367.pdf
cp temp_pdfpages_6/pwg6-0369.pdf pdfpages/pwg6-0369.pdf
cp temp_pdfpages_6/pwg6-0371.pdf pdfpages/pwg6-0371.pdf
cp temp_pdfpages_6/pwg6-0373.pdf pdfpages/pwg6-0373.pdf
cp temp_pdfpages_6/pwg6-0375.pdf pdfpages/pwg6-0375.pdf

cp temp_pdfpages_6/pwg6-0377.pdf pdfpages/pwg6-0377.pdf
cp temp_pdfpages_6/pwg6-0379.pdf pdfpages/pwg6-0379.pdf
cp temp_pdfpages_6/pwg6-0381.pdf pdfpages/pwg6-0381.pdf
cp temp_pdfpages_6/pwg6-0383.pdf pdfpages/pwg6-0383.pdf
cp temp_pdfpages_6/pwg6-0385.pdf pdfpages/pwg6-0385.pdf
cp temp_pdfpages_6/pwg6-0387.pdf pdfpages/pwg6-0387.pdf
cp temp_pdfpages_6/pwg6-0389.pdf pdfpages/pwg6-0389.pdf
cp temp_pdfpages_6/pwg6-0391.pdf pdfpages/pwg6-0391.pdf

funderburkjim commented 2 years ago

Now the page numbering is still wrong in the above range. This is a problem inherent in the printed edition.

So this whole exercise ended up changing nothing. We'll call it a learning experience :).
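One way to confirm that nothing changed is to compare every file in the backup with its counterpart in the live directory. A minimal sketch, assuming the directory names used earlier in this thread as defaults; `check_restored` is an illustrative name, not an existing tool.

```shell
# Report any backup file whose copy in the live directory differs
# or is missing. Exit status is 0 only if every file is identical.
check_restored() (
  backup=${1:-temp_pdfpages_6}
  live=${2:-pdfpages}
  status=0
  for f in "$backup"/pwg6-*.pdf; do
    [ -e "$f" ] || continue      # the glob matched nothing
    name=${f##*/}                # strip the directory part
    if ! cmp -s "$f" "$live/$name"; then
      echo "differs or missing: $name"
      status=1
    fi
  done
  exit "$status"
)
```

Running `check_restored` with no arguments from the directory that contains both folders prints nothing when the restore was complete.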

funderburkjim commented 2 years ago

Text annotation to 'correct' page numbering?

As an experiment, I used an old copy of Adobe Acrobat to add a text field with the corrected page number to the pdf for 6-0329: https://www.sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/servepdf.php?dict=pwg&page=6-0329

Do others think this is a good idea? If so, I can add similar annotations to the rest of the pages. It's somewhat crude, but might protect some future users from the same confusion I had.

What do you think?

Andhrabharati commented 2 years ago

@funderburkjim,

How about using this? PWG Vol.6 Sp. 301-410.pdf

Andhrabharati commented 2 years ago

And you may also look at this: https://github.com/sanskrit-lexicon/PWG/issues/16#issuecomment-846426579

funderburkjim commented 2 years ago

At first glance, your vol. 6 pdf looks quite good -- must be a different printing that corrected the page numbering problem.

Will examine further.

Andhrabharati commented 2 years ago

This is from the MLBD reprint of the Japanese edition, which you also happened to see at the archive (the combined book of vols. 1-7).

It is better throughout; the Koeln scans are bad at quite a few places.

funderburkjim commented 2 years ago

Have examined all the pages 6-0329 through 6-0407.
Compared the MLBD print (per pdf above) to the current Cologne scans. For each two-column 'page', looked at the first line of column 1 and the last line of column 2. In all cases, they appeared identical. But the MLBD page numbering is correct.

Noticed generally better page alignment in MLBD.
General print quality appears similar.

Will unpack AB's pdf and insert into appropriate spot on Cologne server.

funderburkjim commented 2 years ago

The new pages are now on Cologne server.

This completes solution of main problem of this issue. Thanks to @Andhrabharati for providing the new images.

funderburkjim commented 2 years ago

Correction needed at sanskrit-lexicon-scans/pwg repository

The sanskrit-lexicon-scans Github 'organization' has repositories for the scans of all the dictionaries.

Although our software does not currently use these images on Github, we should try to keep this source of images in sync with the 'official' source of images that are on the Cologne server.

This image shows that the pwg images at Github need also to be corrected:

image

funderburkjim commented 2 years ago

how to do

First, clone https://github.com/sanskrit-lexicon-scans/pwg to the local machine:

git clone https://github.com/sanskrit-lexicon-scans/pwg.git
Cloning into 'pwg'...
remote: Enumerating objects: 4745, done.
remote: Total 4745 (delta 0), reused 0 (delta 0), pack-reused 4745
Receiving objects: 100% (4745/4745), 1.40 GiB | 5.02 MiB/s, done.
Resolving deltas: 100% (1/1), done.
Updating files: 100% (4740/4740), done.

Second, copy the 'rename' pages into the pdfpages folder of cloned local sanskrit-lexicon-scans/pwg

Third, git add, git commit, and git push
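The second and third steps might look roughly like the following. This is a dry-run sketch that only prints the commands: `new_pages` is a hypothetical directory holding the replacement pdfs, the `pwg6-*.pdf` file pattern is assumed to match the naming on the Cologne server, and the commit message is just an example.

```shell
# Dry run: print the commands for copying the new pages into the
# cloned repository and publishing them. Nothing is executed.
sync_cmds() (
  new_pages=$1        # hypothetical directory with the replacement pdfs
  repo=${2:-pwg}      # local clone created by "git clone" above
  printf 'cp %s/pwg6-*.pdf %s/pdfpages/\n' "$new_pages" "$repo"
  printf 'git -C %s add pdfpages\n' "$repo"
  printf 'git -C %s commit -m "PWG vol. 6: replace mislabeled scan pages"\n' "$repo"
  printf 'git -C %s push\n' "$repo"
)

sync_cmds new_pages
```

The commit step is included because `git push` sends only committed changes.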

funderburkjim commented 2 years ago

check

After removing browser history, same link as above now has the revised page

image