sanskrit-lexicon / csl-corrections

Replacement for sanskrit-lexicon/CORRECTIONS. User corrections to sanskrit-lexicon/csl-orig
GNU General Public License v3.0

bor: bad scan pages #54

Closed funderburkjim closed 3 months ago

funderburkjim commented 3 years ago

Several bad scan pages were encountered during the English word corrections discussed at #53, such as Page 102.

Also noticed were pages 100, 104, 108, 156.

We should quickly go through all the BOR scan pages to get a complete inventory of such pages where part of the image is missing.

It might be possible to find replacement images at archive.org.

funderburkjim commented 3 years ago

There are about 770 scan pages in the Borooah dictionary.

Someone could go to Page 1 and then use the right arrow button repeatedly to go through all the pages.

Each page should take only a moment to identify as acceptable or not -- we're just looking for cases where part of the page is missing from the scan, like in those pages mentioned above.

Just jot down the page number (from the url address bar 'page=...')

Then post the list of page numbers with bad scans as a comment here.

With a good broadband connection, this likely would take about 2-3 hours.

Any volunteers?

funderburkjim commented 3 years ago

image

Note:

gasyoun commented 3 years ago

Any volunteers?

Me, if you're into https://github.com/funderburkjim/MWderivations/issues/14 :)

funderburkjim commented 3 years ago

I'm assuming that offer is a joke!

Will see if Sampada wants to do it.

Probably need to do similar task for GRA (I recall there are some scan pages that are missing some data).

sanskritisampada commented 3 years ago

yes...I will do it!

gasyoun commented 3 years ago

I'm assuming that offer is a joke!

I've been waiting 3 days for your answer, and silence is all I hear.

yes...I will do it!

You're a miracle.

sanskritisampada commented 3 years ago

Jim asked me a few minutes ago. Give me a day or two?

sanskritisampada commented 3 years ago

It was quicker than I expected. Below is the list of pages in which content is missing due to the poor scans. Some are very poor, and some have barely a word or two missing.

002 004 022 024 044 048 050 054 058 066 092 094 096 098 100 102 104 106 108 110 112 118 120 134 156 164 166 198 212
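As a quick sanity check on this list, a short Python sketch can count the entries and confirm that the five pages named at the start of this issue (100, 102, 104, 108, 156) are all covered. The variable names are illustrative only and do not come from any project script.

```python
# Page numbers copied from the survey comment above.
reported = (
    "002 004 022 024 044 048 050 054 058 066 092 094 096 098 100 "
    "102 104 106 108 110 112 118 120 134 156 164 166 198 212"
).split()
pages = [int(p) for p in reported]

# Pages mentioned in the opening comment of this issue.
initial = {100, 102, 104, 108, 156}

count = len(pages)
missing_from_survey = sorted(initial - set(pages))
all_even = all(p % 2 == 0 for p in pages)

print(count, missing_from_survey, all_even)  # prints: 29 [] True
```

The list has 29 entries, all of the initially noticed pages are included, and every reported page number happens to be even.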

funderburkjim commented 3 years ago

@sanskritisampada wow! That was fast.

Thank you.

funderburkjim commented 3 years ago

Here is a link to an archive.org version of Borooah: https://archive.org/details/englishsanskritdictionaryanundoramborooah1971_956_C/page/n125/mode/2up

But a comparison of a few of the pages in range 100+ makes me think that it is identical to our scans; maybe this particular archive.org source actually is what was used by Thomas' group in the original digitization.

funderburkjim commented 3 years ago

Here's another version from archive.org: https://archive.org/details/englishsanskritdictionaryanundoramborooahpublicationboardassam1971_285_c/page/n140/mode/1up

Comparing this to page 108 of Cologne scan:

  1. The archive.org version is better aligned: all of the page is visible.
  2. But the archive.org version is worse in terms of clarity of most Devanagari.

For reference, here is another similar version: https://archive.org/details/in.ernet.dli.2015.42081/page/n138/mode/1up

There is also a 2-volume version that could be referred to, but the pagination is different from the others. This is a link to volume 1: https://archive.org/details/practicalsanskritenglishdictionaryanundoramborooahvol1_24_o/page/n245/mode/1up

and volume 2: https://archive.org/details/practicalsanskritenglishdictionaryanundoramborooahvol2_818_M/page/n249/mode/2up

funderburkjim commented 3 years ago

I was hoping to find a 'perfect in all regards' replacement for the BOR scans at Cologne, but my quick search at archive.org did not yield this.

It might be worth doing comparative analysis of the Cologne digitizations at the places among the pages Sampada found above where the scan appears faulty.

Andhrabharati commented 3 years ago

I may mention here that BOR was also printed as 3 fascicules originally, like 22 parts of Vacaspatyam; and each part of it has a big discussion about one topic in the beginning. A nice idea it was.

If interested, I can give the whereabouts of the 4 parts.

Andhrabharati commented 3 years ago

I also have the fully recomposed print by the Assam Government, in memoriam of Anundoram Borooah, twice!

Probably I could help by giving the 29 pages (reported by Sampada) snapped from my copy (1971 print) that is at the archive link Jim gave above.

gasyoun commented 3 years ago

if interested, I can give the whereabouts of the 4 parts.

Please.

Probably I could help by giving the 29 pages (reported by Sampada) snapped from my copy (1971 print) that is at the archive link Jim gave above.

Here is @Andhrabharati's scan: https://vk.com/samskrtamru?w=wall-88831040_11549 English Sanskrit Dictionary Anundoram Borooah 1971.pdf

Andhrabharati commented 3 years ago

This is not my scan; just one copy in my collection.

As I mentioned above, I can snap the required pages from my physical copy.

funderburkjim commented 3 years ago

I can snap the required pages from my physical copy.

Let's try this for page 100.

Andhrabharati commented 3 years ago

Here it is- BOR(1971)_p 100

Andhrabharati commented 3 years ago

And here are the full pages listed above (snapped & processed in ScanTailor as B&W images).

BOR pages snapped.zip

Want the greyscale images?

Andhrabharati commented 3 years ago

Probably need to do similar task for GRA (I recall there are some scan pages that are missing some data).

If the page numbers are given, I can help with Grassmann as well.

gasyoun commented 3 years ago

ScanTailor

Magic is the tool.

Andhrabharati commented 3 months ago

@funderburkjim

You seem not to have seen this post.

These CDSL pages are still bad, as reported by Sampada.

funderburkjim commented 3 months ago

BOR.pages.snapped.zip

The files in the zip are tif files, named like 'BOR_002.tif'.

For compatibility with cdsl display links, the images need to be pdf files named like 'pg_002.pdf'.
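One possible way to do the rename and conversion is sketched below in Python. This is only an illustration under stated assumptions, not the procedure actually used here: the function names `pdf_name_for` and `convert_all` and the directory arguments are hypothetical, and the conversion step assumes the third-party Pillow library is installed.

```python
import re
from pathlib import Path


def pdf_name_for(tif_name: str) -> str:
    """Map a name like 'BOR_002.tif' to the cdsl-style name 'pg_002.pdf'."""
    m = re.fullmatch(r"BOR_(\d{3})\.tif", tif_name)
    if m is None:
        raise ValueError(f"unexpected file name: {tif_name!r}")
    return f"pg_{m.group(1)}.pdf"


def convert_all(src_dir: str, dst_dir: str) -> list:
    """Convert every BOR_nnn.tif in src_dir to pg_nnn.pdf in dst_dir.

    Hypothetical helper; requires Pillow (imported lazily so that the
    naming function above works without it).
    """
    from PIL import Image

    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for tif in sorted(src.glob("BOR_*.tif")):
        out = dst / pdf_name_for(tif.name)
        with Image.open(tif) as img:
            # Fall back to RGB for modes other than B&W, greyscale, or RGB.
            if img.mode not in ("1", "L", "RGB"):
                img = img.convert("RGB")
            img.save(out, "PDF")
        written.append(out)
    return written
```

For example, `pdf_name_for("BOR_002.tif")` gives `"pg_002.pdf"`, and `convert_all("tif_in", "pdf_out")` would write one single-page pdf per tif (both directory names are placeholders).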

@Andhrabharati Request you

Then I'll replace the bad pdfs at cdsl.

Andhrabharati commented 3 months ago

Here it is, @funderburkjim -- BOR.pages.snapped.pdf.zip

funderburkjim commented 3 months ago

The new pdfs have now been installed at Cologne.

Also revised https://github.com/sanskrit-lexicon-scans/bor.

@Andhrabharati Thanks for noticing that this needed doing, and for providing the new pdfs.

Closing this issue.