servepdf - Githubissues

funderburkjim commented 4 years ago

servepdf.php is the module that generates the display of a scanned image. It appears as a link in the display of any dictionary entry.

This note documents refactoring of the servepdf module for the individual dictionary displays.

refactoring

As a preliminary step, the original functional-style code of servepdf.php was refactored into two parts:

servepdf.php
- Defines a single function, servepdfCall, which constructs a ServepdfClass instance and sends (using PHP echo) the html constructed by the class to the caller.
- Calls servepdfCall(), thereby generating the html requested by a browser.

servepdfClass.php
- Class whose constructor generates the html. Includes minor enhancements, described in the following comments.

funderburkjim commented 4 years ago

server agnostic enhancement intro

The 'web' subdirectory for a dictionary contains code and resources for dictionary displays. It is envisioned that this code should work in two server environments:

- the Cologne sanskrit-lexicon (named 'cologne')
- other server environments, named 'xampp', which include many configurations of Apache server with PHP 7.

The display of scanned images poses special problems, primarily because the set of image files generally requires a lot of disk space, so it is inconvenient to keep multiple copies around.

Problem with old servepdf.php at Cologne

On the Cologne server, we are in the process of designing a rearrangement of the dictionary infrastructure; some early discussion is here. In particular, we will likely have each dictionary in a '2020' subdirectory. For instance, for the VCP dictionary, the current arrangement is:

VCPScan/2013/
- orig
- pywork
- web
- sqlite
- pdfpages (directory contains a file for each scanned image) (5,447 files , 518MB of space)
- webtc
- ... several other subdirectories of web

If we used the old version of servepdf, then a rearrangement into a VCPScan/2020/web directory would look like:

VCPScan/2020/
- web
- pdfpages which could be either
  - a complete second copy of the 5,447 files in VCPScan/2013/web/pdfpages
  - or a symlink to VCPScan/2013/web/pdfpages
  - OR, we could move VCPScan/2013/web/pdfpages to VCPScan/2020/web/pdfpages, in which case the 2013 displays would not be able to display scanned images.

None of these alternatives is pleasing.

funderburkjim commented 4 years ago

$dictinfowhich

The dictinfowhich.php module defines a PHP variable $dictinfowhich whose value provides a run-time switch used to distinguish the server running the application.

Currently, $dictinfowhich has one of two string values, "cologne" or "xampp" (see server agnostic enhancement intro comment above).

If the file system name of the current directory (provided by the PHP constant __DIR__) starts with 'afs', then the value is set to "cologne"; otherwise it is set to "xampp". The test could be made more precise if and when the need arises.
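
A minimal sketch of this switch, for illustration only. The function name detect_server_type is hypothetical, and treating '/afs/...' and 'afs/...' alike is an assumption; the note above only says the directory name "starts with 'afs'". The actual dictinfowhich.php may differ.

```php
<?php
// Hypothetical helper: classify the server from a directory path.
// Assumption: a leading slash is ignored, so '/afs/...' counts as starting with 'afs'.
function detect_server_type(string $dir): string {
    $normalized = ltrim($dir, '/');
    // strpos(...) === 0 is the PHP 7 way to test for a prefix.
    return (strpos($normalized, 'afs') === 0) ? 'cologne' : 'xampp';
}

// Run-time switch, computed from the directory of the current script.
$dictinfowhich = detect_server_type(__DIR__);
```

Keeping the test in one small function means a more precise check can later replace the prefix test without touching any caller.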

Dictinfo::get_pdfpages_url()

This method in dictinfo.php returns a string used (by ServepdfClass) to construct a URL string for a scanned image of a particular page in a particular dictionary.

get_pdfpages_url() uses the value of $dictinfowhich to tailor the computation depending on the server type.

For the cologne server type, get_pdfpages_url() returns a hard-coded URL that depends on the dictionary code. This is done by the helper method Dictinfo::get_cologne_pdfpages_url(). For example:
- MW -> "//www.sanskrit-lexicon.uni-koeln.de/scans/MWScan/MWScanpdf"
- AP90 -> "//www.sanskrit-lexicon.uni-koeln.de/scans/AP90Scan/2014/web/pdfpages"

For the xampp server type, get_pdfpages_url() returns a string representing a relative URL. Since this is called by web/webtc/servepdf.php, the URL is relative to web/webtc on the server executing the browser request.
- Two relative paths are checked, in order: ../pdfpages and ../../pdfpages. The first path that is a non-empty directory on the server is assumed to be the one to use, and is the value returned.
- If neither relative path corresponds to a non-empty directory on the server, then get_pdfpages_url() returns the Cologne URL.
- This logic allows a non-Cologne installation to either (a) depend on the Cologne server for scanned images (probably the usual case) or (b) use a local server copy of the scanned images.
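
The selection logic can be sketched roughly as follows. This is not the code in dictinfo.php: the function names is_nonempty_dir and pdfpages_url_sketch, and the parameters $baseDir (standing in for web/webtc) and $cologneUrl (standing in for the result of get_cologne_pdfpages_url()), are hypothetical.

```php
<?php
// Returns true when $path is a directory containing at least one entry.
function is_nonempty_dir(string $path): bool {
    if (!is_dir($path)) {
        return false;
    }
    $entries = scandir($path);
    if ($entries === false) {
        return false;
    }
    // scandir() always lists '.' and '..'; anything beyond those is real content.
    return count(array_diff($entries, ['.', '..'])) > 0;
}

// Hypothetical stand-in for Dictinfo::get_pdfpages_url().
function pdfpages_url_sketch(string $which, string $baseDir, string $cologneUrl): string {
    if ($which === 'cologne') {
        return $cologneUrl;
    }
    // xampp: prefer a local copy of the images, checked in the documented order.
    foreach (['../pdfpages', '../../pdfpages'] as $relative) {
        if (is_nonempty_dir($baseDir . '/' . $relative)) {
            return $relative; // relative URL, resolved by the browser against web/webtc
        }
    }
    // No local images found: fall back to the Cologne server.
    return $cologneUrl;
}
```

Requiring the directory to be non-empty means an installation that has created web/pdfpages but not yet filled it still gets images from Cologne.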

flexibility of cologne solution

Currently, the cologne url is the pdfpages directory within the 'old' web directory, e.g. for VCP, the url is //www.sanskrit-lexicon.uni-koeln.de/scans/VCPScan/2013/web/pdfpages

When we construct a new version of the dictionary, in VCPScan/2020/, we can still use //www.sanskrit-lexicon.uni-koeln.de/scans/VCPScan/2013/web/pdfpages; i.e., the same url will work for the old VCPScan/2013 and the new VCPScan/2020.

Also, if we decide to change the location of the cologne images for any (or all) dictionaries, this can be accomplished by changing the get_cologne_pdfpages_url() function in the Dictinfo class in dictinfo.php.

funderburkjim commented 4 years ago

pdffiles.txt

In the web/webtc directory for each dictionary there is a file named pdffiles.txt.

This file is NOT maintained by csl-websanlexicon, but it must be present for scanned images to be displayed when the images are on the server.

This file is used by the getfiles method of the ServepdfClass instance. This method returns the filename corresponding to a page number for a particular scanned image.

Each dictionary digitization has
- its own peculiar way of coding a page reference to a scanned image,
- its own peculiar way of naming the scanned image file corresponding to a page reference.

The pdffiles.txt file lists the correspondences between page reference and file name. The getfiles method is responsible for matching a given page reference to one of those in pdffiles.txt, and thus determining the filename corresponding to the given page reference.

The pdffiles.txt lines are assumed to be in the order of the text. This permits the getfiles method to also return the filenames corresponding to the scanned images of the page preceding and following the given page reference. These two file names in turn permit the 'next' and 'previous' arrow controls that are part of the scanned image display generated by servepdf.
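
A rough sketch of such a lookup, under a loudly stated assumption: the line format "<pageref>:<filename>" is invented for illustration, since the note does not specify the layout of pdffiles.txt. The function name lookup_page is hypothetical, and the real getfiles method takes different arguments and also handles each dictionary's page-reference conventions.

```php
<?php
// Hypothetical lookup over pdffiles.txt-style lines.
// Assumed line format (not confirmed by the note): "<pageref>:<filename>".
// Returns [filename, previous filename, next filename]; null where absent.
function lookup_page(array $lines, string $page): array {
    $refs = [];
    $files = [];
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '' || strpos($line, ':') === false) {
            continue; // skip blank or malformed lines
        }
        list($ref, $file) = explode(':', $line, 2);
        $refs[] = $ref;
        $files[] = $file;
    }
    $i = array_search($page, $refs, true);
    if ($i === false) {
        return [null, null, null];
    }
    // Lines follow the order of the text, so neighbours are the adjacent entries.
    $prev = $i > 0 ? $files[$i - 1] : null;
    $next = $i < count($files) - 1 ? $files[$i + 1] : null;
    return [$files[$i], $prev, $next];
}
```

With a real file, the lines could be read with file('pdffiles.txt', FILE_IGNORE_NEW_LINES). The first and last pages have no previous or next neighbour, which is why those slots can be null.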

funderburkjim commented 4 years ago

server agnostic solution

The Dictinfo::get_pdfpages_url() method described above is the key to the current server agnostic solution provided by ServepdfClass in servepdfClass.php.

Here are the steps used to get a URL for the scanned image of a particular page of a particular dictionary. Assume that strings representing the page and dictionary are in variables $page and $dict in a ServepdfClass instance.

- Construct a Dictinfo instance: $dictinfo = new Dictinfo($dict);
- Get a url for the pdfpages directory: $pdfpages_url = $dictinfo->get_pdfpages_url();
- Use the getfiles method to get the $filename corresponding to $page (see pdffiles.txt comment above): list($filename,$pageprev,$pagenext)=$this->getfiles($pdffiles_filename,$page,$dictupper);
- Concatenate to get the URL of the scanned image file for this page: $pdf = "$pdfpages_url/$filename";
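
The final concatenation step could be sketched like this. The helper name page_url and the filename pg_0001.pdf are hypothetical, and the rtrim/rawurlencode handling is a defensive addition that the original one-line concatenation does not perform.

```php
<?php
// Hypothetical helper mirroring the final concatenation step.
function page_url(string $pdfpages_url, string $filename): string {
    // rtrim() guards against a trailing slash on the directory URL;
    // rawurlencode() keeps unusual filename characters URL-safe.
    return rtrim($pdfpages_url, '/') . '/' . rawurlencode($filename);
}

// Works the same for a relative (xampp) and a Cologne directory URL.
$local = page_url('../pdfpages', 'pg_0001.pdf');
$remote = page_url('//www.sanskrit-lexicon.uni-koeln.de/scans/VCPScan/2013/web/pdfpages', 'pg_0001.pdf');
```

Because the directory URL is the only server-dependent part, the rest of ServepdfClass does not need to know which server it runs on.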

funderburkjim commented 4 years ago

local server copy of the scanned images

We have made provision for an xampp server installation to serve scanned image files for a dictionary. E.g., they could be put in folder 'web/pdfpages' on the server.

However, we do not currently have a seamless way to download just these images.

A solution to this is outside the scope of this csl-websanlexicon repository. I'll open an issue in the 'cologne' repository.

funderburkjim commented 4 years ago

Android pdf display

The code mentioned above is now part of ServepdfClass. I've confirmed that pdf images now display on Android OS using the Chrome browser. Also, the images display properly on an (original) iPad with the Safari browser.

gasyoun commented 4 years ago

Rearrangement of the dictionary infrastructure - hope there is an end to it.

funderburkjim commented 4 years ago

> hope there is an end to it.

Yes, Mr. Impatient. I hope so too!

Remember the big(ger) picture: We aim to make a 2020 version of the ongoing code and data that will be more manageable for Dhaval. I want the transition to be seamless, and the changes to servepdf are partly towards this end.

drdhaval2785 commented 4 years ago

And yes, the change in servepdf made the code serve on androids also, which is a major change. Go ahead Jim. Mr. Impatient can be treated as a nudger who keeps on nudging so that we don't sleep. :)

funderburkjim commented 4 years ago

@drdhaval2785 Have you independently checked Android regarding the scanned images?

drdhaval2785 commented 4 years ago

I tried on some devices, e.g. a Samsung S8+.

WIL seems to have an issue.

drdhaval2785 commented 4 years ago

[screenshot: Screenshot_20190904-051243_Chrome]

funderburkjim commented 4 years ago

The difference with WIL is that the image files are NOT pdfs, but jpgs. I thought I had checked this.

Which display are you using?

drdhaval2785 commented 4 years ago

Advanced display

funderburkjim commented 4 years ago

OK. Will look into this tomorrow.

I suspect there will be a similar problem with PW, CCS, and MD, which also have non-pdf image files.

funderburkjim commented 4 years ago

Cannot confirm problem with Wilson. Using url https://www.sanskrit-lexicon.uni-koeln.de/scans/WILScan/2014/web/webtc2/index.php

Also, basic display works for Wilson.

I'm using an Asus tablet with Android version 7.0. Also using Chrome browser -- not sure how to find its version.

The thing that looks odd in your picture above is 'pg_393.pdf' -- the pdf part is odd, since the image file name is pg_393.jpg!

funderburkjim commented 4 years ago

Try emptying browser cache -- want to be sure new code is being executed.

drdhaval2785 commented 4 years ago

Ohh. My error.

It is YAT which has the error. WIL works fine on my Android.

funderburkjim commented 4 years ago

YAT advanced search with 'Daval' (prefix) DOES display pg_393.pdf on the Asus tablet.

Do you have android 7 or a later version?

drdhaval2785 commented 4 years ago

Android 9

gasyoun commented 4 years ago

> Mr. Impatient can be treated as a nudger who keeps on nudging so that we don't sleep.

Exactly.

> Android 9

7 working, 9 - not. Strange. On my Android 8 Huawei in Chrome YAT advanced search shows same screen as Dhaval - nothing.

funderburkjim commented 4 years ago

Got a Galaxy Tab A (10.1). Verified it has Android 9. YAT image shows up!
Here's a screenshot.

[screenshot: Screenshot_20190904-221754_Drive]

So this Android 9 works for the YAT image. Curiouser and curiouser!

Cannot think of why MW and other PDFs work for Dhaval's Android 9 S8, but YAT alone (so far) does not. Maybe should make a test page with links to servepdf for all 34 of the dictionaries, so we'll know if any dictionary besides YAT gives a problem on some devices.

gasyoun commented 4 years ago

> Maybe should make a test page with links to servepdf for all 34 of the dictionaries, so we'll know if any dictionary besides YAT gives a problem on some devices.

Makes sense.

sanskrit-lexicon / csl-websanlexicon

servepdf #6
