sanskrit-lexicon / MWS

Monier Monier-Williams, Sir; A Sanskrit-English dictionary. Oxford, 1899

MW Scan review #144

funderburkjim closed this issue 1 year ago.

funderburkjim commented 1 year ago

Although most scan pages for mw used on sanskrit-lexicon at Cologne are good, recent work (#142) found a couple of scans that need to be replaced:

page 63, 1st column, aBi-dUzita: bad print
page 87 bottom

It seems desirable to review all the scan pages.

Method

There are various ways to approach such a review of all the scans. Here is one way, using the servepdf application.

  1. Start with page 1: https://www.sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/servepdf.php?dict=mw&page=1
  2. review the legibility of the page
    • make a note of the page if you think it should be replaced
  3. use the 'next page' link to retrieve the scan of next page
  4. go to step 2.
  5. Repeat until done. The last page is 1333.
  6. Post the list of questionable pages in a comment to this issue.
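The steps above can be sketched in Python. This is a minimal sketch that assumes the `dict=mw&page=N` query shown in step 1 works for every page from 1 to 1333. The helper names (`servepdf_url`, `review_urls`, `format_report`) are hypothetical and are not part of the servepdf application.

```python
# Hypothetical helpers for the page-by-page review; not part of servepdf itself.
BASE_URL = "https://www.sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/servepdf.php"
FIRST_PAGE = 1
LAST_PAGE = 1333  # last MW page, as stated in step 5


def servepdf_url(page: int) -> str:
    """Return the servepdf URL for one MW scan page (query format from step 1)."""
    if not FIRST_PAGE <= page <= LAST_PAGE:
        raise ValueError(f"page must be in {FIRST_PAGE}..{LAST_PAGE}, got {page}")
    return f"{BASE_URL}?dict=mw&page={page}"


def review_urls(start: int = FIRST_PAGE, end: int = LAST_PAGE) -> list:
    """Steps 3-5: the URLs to visit in order, one per page."""
    return [servepdf_url(page) for page in range(start, end + 1)]


def format_report(questionable: dict) -> str:
    """Step 6: turn {page: remark} notes into sorted lines for an issue comment."""
    return "\n".join(f"p.{page} - {note}" for page, note in sorted(questionable.items()))


if __name__ == "__main__":
    print(review_urls()[0])
    print(format_report({13: "all 3rd column", 11: "all 3rd column"}))
```

The report format follows the "p.11 - all 3rd column" style of notes used later in this thread. The URL list replaces only the manual "next page" click. The legibility judgment in step 2 still has to be made by eye.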

The objective is to identify pages which are partly unreadable. Unreadability can occur for many reasons.

With a bit of practice, retrieving the next page and reviewing it will probably take less than one minute. At one minute per page, reviewing all 1333 pages would take 1333 minutes, or about 22 hours.
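The estimate is simple arithmetic: pages × minutes per page gives total minutes, and dividing by 60 gives hours. The small sketch below lets the estimate be redone for other rates. The function name `review_hours` is hypothetical.

```python
def review_hours(pages: int, minutes_per_page: float) -> float:
    """Total review time in hours, given a page count and a per-page rate."""
    return pages * minutes_per_page / 60


# 1333 pages at one minute per page: 1333 minutes.
print(f"{review_hours(1333, 1.0):.1f} hours")  # 22.2 hours
```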

funderburkjim commented 1 year ago

@AnnaRybakovaT Let me know if you plan to undertake this review.

AnnaRybakovaT commented 1 year ago

Let me know if you plan to undertake this review.

Yes, of course. Your guidance is clear, as always. I am ready to start this task on Monday.

gasyoun commented 1 year ago

Unreadability can occur for many reasons.

Geometry can go wrong as well: skewed pages.

AnnaRybakovaT commented 1 year ago

page 87 bottom

Dear Jim, at the start of the review I have a few notes.

  1. Could you explain to me why page 87 has to be replaced? To be honest, I don't see any serious problem.
  2. Almost all odd pages (at least until page 130) have one defect, which I call an "unprinted vertical line". If you look closely, you will notice how many symbols/letters were not printed well. On the one hand, English words missing one letter are easy to guess; on the other hand, some Sanskrit words are affected as well. Please let me know whether this defect is significant or not. [image]

For now I note such odd pages this way:
  • p.11 - all 3rd column (unprinted vertical line)
  • p.13 - all 3rd column (unprinted vertical line, only some English words illegible)
  • p.15 - all 3rd column (unprinted vertical line)
  • p.17 - all 3rd column (unprinted vertical line, only some English words illegible)

gasyoun commented 1 year ago

"unprinted vertical line"

@AnnaRybakovaT this does indeed seem to be an issue in the reprint.

Andhrabharati commented 1 year ago

You indeed are doing a good job, @AnnaRybakovaT -- I like you!

And, how do you guys-- @gasyoun and @funderburkjim -- rate this?

[image]

Andhrabharati commented 1 year ago

@funderburkjim

See these portions from the pages you mentioned above:

(p. 63)

[image]

(p. 87)

[image]

AnnaRybakovaT commented 1 year ago

2. Almost all odd pages (at least until page 130)

It looks like almost all odd pages up to page 183 have this defect.

AnnaRybakovaT commented 1 year ago

With a bit of practice, the retrieval of the next page and the review of that page will probably take less than one minute

Just for information: my current speed is 1 page per 1.5 min. In 2 hours (including some breaks to rest my eyes) I manage to check 70 pages.
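These figures give two slightly different rates. The first is 1.5 min per page of actual checking. The second is 70 pages per 2-hour session including breaks, which works out to about 1.71 min per page. The sketch below shows what each rate implies for all 1333 pages. The function names are hypothetical.

```python
import math

TOTAL_PAGES = 1333


def hours_for_all(minutes_per_page: float, pages: int = TOTAL_PAGES) -> float:
    """Hours needed to check every page at the given rate."""
    return pages * minutes_per_page / 60


def sessions_needed(pages_per_session: int, pages: int = TOTAL_PAGES) -> int:
    """Number of sessions required; a partial last session still counts."""
    return math.ceil(pages / pages_per_session)


checking_rate = 1.5      # minutes per page, breaks excluded
session_rate = 120 / 70  # minutes per page, breaks included (about 1.71)

print(f"{hours_for_all(checking_rate):.1f} h at 1.5 min/page")  # 33.3
print(f"{hours_for_all(session_rate):.1f} h including breaks")  # 38.1
print(f"{sessions_needed(70)} two-hour sessions")               # 20
```

Either rate puts the full review well above the original one-minute-per-page estimate of about 22 hours.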

funderburkjim commented 1 year ago

@AnnaRybakovaT That 'unprinted vertical line' problem is a good catch - I had not noticed it. The degree of obscurity varies depending on the page, but is sometimes problematic.

BTW, I no longer see the 'problem at bottom of page 87' that I mentioned above.

Andhrabharati's page 7 fixes the vertical line odd-page problem for this page.
If this version is of similar quality for other pages, should we replace ALL current pages with this version? A drawback of this version is the 'yellow' (or beige, tan) background -- there is less contrast between text and background in the yellow version when compared with the contrast in the 'white background' version. Is there an image-processing way to turn the beige background to white?
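One common way to turn a beige page white is a per-channel white-point correction. You measure the paper colour, then scale each RGB channel so that this colour maps to 255. Black ink stays black under this mapping.

The sketch below assumes that the paper is the most frequent colour on a page. It works on plain lists of RGB tuples; in practice the same mapping would be applied with an image library. The function names are hypothetical, and this is not necessarily how the installed scans were processed.

```python
from collections import Counter
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def estimate_background(pixels: List[Pixel]) -> Pixel:
    """Assume the most frequent colour on a text page is the paper background."""
    return Counter(pixels).most_common(1)[0][0]


def whiten(pixels: List[Pixel], background: Pixel) -> List[Pixel]:
    """Scale each channel so `background` becomes (255, 255, 255).

    Ink is scaled by the same factors, so black stays black; any value pushed
    above 255 (paper slightly lighter than the estimate) is clipped to 255.
    """
    def scale(value: int, bg: int) -> int:
        # Leave a channel unchanged if the background has no signal in it.
        return min(255, round(value * 255 / bg)) if bg else value

    return [tuple(scale(c, b) for c, b in zip(px, background)) for px in pixels]


if __name__ == "__main__":
    page = [(240, 225, 190)] * 6 + [(0, 0, 0)] * 2 + [(250, 240, 200)]
    bg = estimate_background(page)
    print(bg, whiten(page, bg)[-3:])
```

A single global background colour is a simplification, because real scans often have uneven paper tone across a page.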

@AnnaRybakovaT I say go ahead with your review (but don't go blind!). Others need to consider whether @Andhrabharati's (yellow) version should replace the white version en masse. Or should he make these images available to Anna, so she can check whether the yellow version is uniformly better than the white version?

Andhrabharati commented 1 year ago

If this version is of similar quality for other pages, should we replace ALL current pages with this version?

Yes, all the pages are of such good quality in this scan.

I suggest keeping the yellow background, as it is meant for good readability; recall that the MBh scans used recently (from the Bayerische Staatsbibliothek) are also like these.

Of course, I can make them greyscale if wanted.
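Greyscale conversion is usually a weighted sum of the RGB channels. Below is a minimal sketch using the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), the weights Pillow documents for its "L" mode conversion. The helper name is hypothetical, and the method actually used for the greyscale files is not stated in this thread.

```python
from typing import List, Tuple


def to_greyscale(pixels: List[Tuple[int, int, int]]) -> List[int]:
    """Convert RGB pixels to 8-bit grey levels with ITU-R BT.601 luma weights."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in pixels]


if __name__ == "__main__":
    # White paper, black ink, and a beige paper tone.
    print(to_greyscale([(255, 255, 255), (0, 0, 0), (240, 225, 190)]))  # [255, 0, 225]
```

Greyscale conversion alone turns beige paper into light grey (225 here) rather than white. Raising the contrast further would need an extra levels adjustment.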

Andhrabharati commented 1 year ago

If Anna can only work at this rate, I would say that my current scans can safely be used, without a page-by-page check, to replace the present CDSL scans.

There is no point spending so much time checking the individual pages.

Andhrabharati commented 1 year ago

@funderburkjim

What do you think?

Andhrabharati commented 1 year ago

Or should he make these images available to Anna to review to see if the yellow version is uniformly better than the white version?

@funderburkjim you can take my word about the quality of these!!

funderburkjim commented 1 year ago

@Andhrabharati I'd like to get the opinions of others (@gasyoun , @drdhaval2785 , @AnnaRybakovaT ) regarding the 'yellow' background. Could you do a couple of pages as (a) yellow and (b) grayscale and upload them so we can compare?

Once this is done, I think it makes sense to replace the current scans with yours. Then Anna can stop her review, which has already been helpful.

funderburkjim commented 1 year ago

@Andhrabharati when you upload samples, please upload as pdf pages.

Andhrabharati commented 1 year ago

@funderburkjim / @gasyoun / @AnnaRybakovaT / @drdhaval2785

In my opinion, file size is NOT as much of a concern as it was in the olden days. The clarity of the file is what matters more now.

Here are the first 35 pages (as samples) [to stay within GitHub's 25MB limit] of both the coloured and greyscale versions of the MW main text (from the scan I've got), for your perusal: MW-sample (coloured).pdf, MW-sample (greyscale).pdf
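Choosing how many pages fit in one upload is a small packing problem. Below is a minimal greedy sketch. It assumes that per-page sizes in bytes are known and that a PDF's size is roughly the sum of its pages; real PDFs add some overhead. The 25MB figure comes from the comment above. Treating it as decimal megabytes is an assumption, and the names are hypothetical.

```python
from typing import List

LIMIT_BYTES = 25 * 1000 * 1000  # the 25MB limit; decimal MB is an assumption


def batch_pages(page_sizes: List[int], limit: int = LIMIT_BYTES) -> List[List[int]]:
    """Group consecutive page numbers (1-based) into batches under `limit` bytes."""
    batches: List[List[int]] = []
    current: List[int] = []
    total = 0
    for page, size in enumerate(page_sizes, start=1):
        if size > limit:
            raise ValueError(f"page {page} alone exceeds the limit")
        # Start a new batch when adding this page would cross the limit.
        if current and total + size > limit:
            batches.append(current)
            current, total = [], 0
        current.append(page)
        total += size
    if current:
        batches.append(current)
    return batches
```

The first batch corresponds to a leading sample like the 35 pages posted here. Later batches would cover the rest of the book if it all had to go through GitHub.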

And here are side-by-side comparison screenshots of the two files at different magnifications: [image]

[image]

It is a matter of a few minutes to split them into individual files for CDSL use, and whichever of the two versions suits the team can be used in the CDSL project. [I do not refer to the individual pages at all (let alone in the browser), so I have no preference for either of these. I always work with complete local files.]

I have also made full-book files of both types and uploaded them to pCloud, so that whichever is decided upon can be shared immediately.

gasyoun commented 1 year ago

And, how do you guys-- @gasyoun and @funderburkjim -- rate this?

Perfect

Is there an image-processing way to turn the beige background to white?

Yes, there is.

Or should he make these images available to Anna to review to see if the yellow version is uniformly better than the white version?

I do not see a reason for Anna to continue. I'm for replacing. Yellow or white, the scan is much better. I do not mind the yellow, because some of the books we use are yellow already, Mahabharata included.

In my opinion, file size is NOT as much of a concern as it was in the olden days. The clarity of the file is what matters more now.

Size does not matter. Clarity does, and we lack it in most scans at Cologne.

@Andhrabharati I'm happy you are around. @AnnaRybakovaT the work is done, no more reading needed - the scan is bad indeed and even that single case was enough. @funderburkjim I do not mind having in yellow at all.

AnnaRybakovaT commented 1 year ago

the work is done, no more reading needed - the scan is bad indeed and even that single case was enough.

ok

funderburkjim commented 1 year ago

@Andhrabharati

Please provide links for both the 'white' and 'yellow' mws pdf pages; I'd like to have both. Marcis indicates a preference for 'yellow' = 'coloured'. Do you have a preference?

Andhrabharati commented 1 year ago

Here are the links, @funderburkjim --

Colored: https://u.pcloud.link/publink/show?code=XZuwf1VZQ9Pkc8pPUsQrqEqziRsHo54UJmWy0

Greyscale: https://u.pcloud.link/publink/show?code=XZuwf1VZQ9Pkc8pPUsQrqEqziRsHo54UJmWy

Andhrabharati commented 1 year ago

Do you have a preference?

Here were my previous indications:

https://github.com/sanskrit-lexicon/MWS/issues/144#issuecomment-1362577913 I do not refer to the individual pages at all (let alone in the browser), so I have no preference for either of these. I always work with complete local files.

https://github.com/sanskrit-lexicon/MWS/issues/144#issuecomment-1362308038 I suggest keeping the yellow background, as it is meant for good readability; recall that the MBh scans used recently (from the Bayerische Staatsbibliothek) are also like these.

funderburkjim commented 1 year ago

OK - that's two yellow background prefs. Will call yellow the winner.

funderburkjim commented 1 year ago

When I click your greyscale link, it also shows as Colored?

Andhrabharati commented 1 year ago

Looks like I had pasted the same link at both places.

Andhrabharati commented 1 year ago

https://u.pcloud.link/publink/show?code=XZfOLeVZ7HxdD2wNQEk9edDUHJBsqSQQ2yk0

This is the link for greyscale file.

Andhrabharati commented 1 year ago

Got both the files, @funderburkjim?

funderburkjim commented 1 year ago

Got both. Thanks! Working to install the colored pages.

Andhrabharati commented 1 year ago

Good; now I am deleting them to clear the pcloud space.

funderburkjim commented 1 year ago

Understood

funderburkjim commented 1 year ago

The yellow scans are now installed at Cologne, and also at https://github.com/sanskrit-lexicon-scans/mw (the file names are 'funky', e.g. the pdf for page 25 is mw0025-ananRta.pdf).

To use simpler names would require changes in several places, and I decided not to bother with that now.
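One way to keep the 'funky' names without changing the applications is to map page numbers to file names. The sketch below assumes that every installed file follows the pattern of the single example given: `mw`, then a 4-digit zero-padded page number, then `-`, a word, and `.pdf`. What the word after the hyphen denotes (here `ananRta`) is not stated. The helper names are hypothetical.

```python
import re
from typing import Dict, Iterable, Tuple

# Pattern inferred from the single example mw0025-ananRta.pdf (an assumption).
NAME_RE = re.compile(r"^mw(\d{4})-(.+)\.pdf$")


def page_filename(page: int, word: str) -> str:
    """Build a name in the installed style: mw + 4-digit page + '-' + word + .pdf."""
    return f"mw{page:04d}-{word}.pdf"


def parse_filename(name: str) -> Tuple[int, str]:
    """Split an installed-style name into (page number, word)."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return int(match.group(1)), match.group(2)


def index_by_page(names: Iterable[str]) -> Dict[int, str]:
    """Map page number -> actual file name, so lookups need no renaming."""
    return {parse_filename(name)[0]: name for name in names}
```

A lookup like `index_by_page(names)[25]` then gives the file for page 25, and the files themselves keep their current names.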

Some notes and work material are in the issue144 directory.

Andhrabharati commented 1 year ago

@funderburkjim

Do you wish to replace the front pages with the new set as well?

funderburkjim commented 1 year ago

@Andhrabharati Yes, might as well install the front pages also.

Andhrabharati commented 1 year ago

Here is the MW-front pages file, @funderburkjim -- MW_front (coloured).pdf

Andhrabharati commented 1 year ago

And here is a surprise gift that people would not normally come across: MW1899- front cover and title pages.pdf, MW1899- back cover pages.pdf

[@gasyoun might like this!!]

gasyoun commented 1 year ago

And here is a surprise gift that people would not normally come across:

I hope one day I will be able to replace it with the original. The reprint itself is of poor quality.

Andhrabharati commented 1 year ago

The reprint itself is of poor quality.

Which reprint are you referring to, @gasyoun ?

funderburkjim commented 1 year ago

I have uploaded these pdfs to sanskrit-lexicon-scans/mw repository at Github.

The front pages are currently used in two places linked from the sanskrit-lexicon homepage:

  1. S1 pdf scanned edition of MW : url
  2. Documentation front matter : url.

My opinion is that the 'white' pages used currently for front matter are satisfactory. Thus, it is not required to replace the 'white' pages with the 'colored' pages for the front matter.

If others have strong opinion otherwise, please comment here and I can do the necessary extraction, renaming, etc. to use the 'colored' pages for the front matter in the two applications mentioned above.

Otherwise, this issue may be closed.

Andhrabharati commented 1 year ago

These front pages need not be replaced for quality, as they serve the purpose reasonably well. And that's why I did not include those pages in the first round (of replacement) itself.

The only reason I had in mind (in asking your opinion) was to have 'uniform' pages for the whole set.

Andhrabharati commented 1 year ago

This issue is already closed, technically!