sanskrit-lexicon / COLOGNE

Development of http://www.sanskrit-lexicon.uni-koeln.de/

AP90 printed pages not landing in SIMPLE #398

Closed: drdhaval2785 closed this issue 1 year ago

drdhaval2785 commented 1 year ago

A user raised the following problem.

https://sanskrit-lexicon.uni-koeln.de/simple/

Search for any headword in AP90. Click on the page link. It lands on the index page instead of the PDF / JPEG.

I could reproduce the problem.

Andhrabharati commented 1 year ago

Noticed the same issue with SKD as well (in Advanced Search).

funderburkjim commented 1 year ago

This was a very tricky problem. It turns out that on 2022-12-04 the webmaster added a 'redirect' that wreaks havoc with the locations of the pdfpages for most of the dictionaries.
csl-apidev/servepdf.php assumes that the image URLs for AP90, SKD, etc. are at a fixed location, e.g.:

  // this line from csl-apidev/dictinfo.php
  "AP90"=>"//www.sanskrit-lexicon.uni-koeln.de/scans/AP90Scan/2014/web/pdfpages" ,

However, such OLD URLs are now redirected to the 2020 version by this code in 'httpd_rhel7.conf', introduced by the webmaster:

  RedirectMatch permanent ^/scans/((?:MW72|PW|BHS|PWG|SCH|CCS|PD|BUR|AP90|ACC|MD|WIL|PUI|SKD|SHS|INM|STC|MWE|VCP|PGN|KRM|GRA|BOP|YAT|VEI|IEG|BEN|AP)[Ss]can)/201[34]/web/?(?!pdfpages).* https://www.sanskrit-lexicon.uni-koeln.de/scans/$1/2020/web/
  ## Fed up with tickets like xxx et al., and with the operators not taking
  ## care of this themselves, I'm just going to (somewhat roughly) step in
  ## here and redirect old URLs.
  ## WH, 2022-12-04

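So when servepdf generates an image URL within a 2013 or 2014 version, the redirect changes the location to 2020! And because the redirect target ends at 'web/', the browser lands on the 2020 home page instead of the page image.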
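Why does the rule catch pdfpages URLs at all, given the '(?!pdfpages)' lookahead? Because the slash in 'web/?' is optional: when the lookahead fails at 'web/pdfpages', the regex engine backtracks, matches 'web' without the slash, and re-tests the lookahead at '/pdfpages', which does not begin with 'pdfpages', so it succeeds. A minimal sketch reproducing this, with Python's re module standing in for Apache's PCRE (they behave alike on this pattern). The page file name 'pg_0001.pdf' is a hypothetical stand-in, and 'redirect_for' and 'TIGHTER' are illustrative names, not part of csl-apidev or the server config.

```python
import re

# The RedirectMatch regex from httpd_rhel7.conf, verbatim (split only for line length).
PATTERN = re.compile(
    r"^/scans/((?:MW72|PW|BHS|PWG|SCH|CCS|PD|BUR|AP90|ACC|MD|WIL|PUI|SKD|SHS"
    r"|INM|STC|MWE|VCP|PGN|KRM|GRA|BOP|YAT|VEI|IEG|BEN|AP)[Ss]can)"
    r"/201[34]/web/?(?!pdfpages).*"
)
# Redirect target; Apache's $1 is written \1 for Python's Match.expand().
TARGET = r"https://www.sanskrit-lexicon.uni-koeln.de/scans/\1/2020/web/"


def redirect_for(path):
    """Return the URL the rule redirects `path` to, or None if it does not fire."""
    m = PATTERN.match(path)
    return m.expand(TARGET) if m else None


# Hypothetical page file under the AP90 pdfpages base from dictinfo.php.
page = "/scans/AP90Scan/2014/web/pdfpages/pg_0001.pdf"
print(redirect_for(page))
# → https://www.sanskrit-lexicon.uni-koeln.de/scans/AP90Scan/2020/web/

# For contrast only (hypothetical, not the deployed rule): putting the optional
# slash inside the lookahead leaves no way to backtrack around it.
TIGHTER = re.compile(PATTERN.pattern.replace("web/?(?!pdfpages)", "web(?!/?pdfpages)"))
print(TIGHTER.match(page))
# → None
```

Applied to '/scans/WILScan/2014/web/webtc/indexcaller.php', the same 'redirect_for' returns the WILScan 2020 home page, since everything after 'web/' is discarded by the target.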

funderburkjim commented 1 year ago

A few dictionaries (e.g. LRV, MW, LAN) are not affected by the rewrite because their images are in locations that do not match the rewrite pattern.

For instance, LRV is in a 2022 directory, LAN is in a 2020 directory, and MW is in a directory that is a sibling of the 20xx directories.
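That the directory alone decides the outcome can be checked directly. A minimal sketch, again with Python's re standing in for Apache's regex engine. It uses SKD, which is in the rule's list of dictionary codes (LRV and LAN are not in that list, so they would escape on that count as well), with hypothetical page paths in 2013, 2014, 2020 and 2022 directories, plus an MW-style sibling directory. The file name and 'is_redirected' are illustrative assumptions.

```python
import re

# RedirectMatch regex from httpd_rhel7.conf (verbatim).
PATTERN = re.compile(
    r"^/scans/((?:MW72|PW|BHS|PWG|SCH|CCS|PD|BUR|AP90|ACC|MD|WIL|PUI|SKD|SHS"
    r"|INM|STC|MWE|VCP|PGN|KRM|GRA|BOP|YAT|VEI|IEG|BEN|AP)[Ss]can)"
    r"/201[34]/web/?(?!pdfpages).*"
)


def is_redirected(path):
    """True if the RedirectMatch rule would fire for this URL path."""
    return PATTERN.match(path) is not None


# Hypothetical page file; only the directory layout matters here.
for year in ("2013", "2014", "2020", "2022"):
    path = f"/scans/SKDScan/{year}/web/pdfpages/pg_0001.pdf"
    print(year, is_redirected(path))
# MW-style location: a sibling of the 20xx directories.
print("MWScanpdf", is_redirected("/scans/MWScan/MWScanpdf/pg_0001.pdf"))
# Output:
# 2013 True
# 2014 True
# 2020 False
# 2022 False
# MWScanpdf False
```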

This observation suggests a solution: put the images for dictionary xxx in a directory that does not match the rewrite!

There are many possible alternate image locations that could be used.

Since the images are, for most purposes at least, independent of the dictionary version (i.e., the images don't depend on 2014, 2020, etc.), the pattern used for MW seems good; that location is MWScan/MWScanpdf.

So I'll work on changing dictinfo accordingly and moving the various pdfpages directories to the new locations given in the updated dictinfo.
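Assuming every affected dictionary follows MW's naming (e.g. AP90Scan/AP90Scanpdf; the thread does not spell out the new directory names), a short sketch can confirm that no such location is caught by the rule. 'pdf_dir' and the 'DICTINFO' dict are hypothetical stand-ins for the entries in csl-apidev/dictinfo.php, not its actual code, and the page file name is made up.

```python
import re

# Dictionary codes listed in the RedirectMatch rule, in the rule's order.
CODES = ("MW72 PW BHS PWG SCH CCS PD BUR AP90 ACC MD WIL PUI SKD SHS "
         "INM STC MWE VCP PGN KRM GRA BOP YAT VEI IEG BEN AP").split()

# Same regex as in httpd_rhel7.conf, rebuilt from CODES.
PATTERN = re.compile(
    r"^/scans/((?:" + "|".join(CODES) + r")[Ss]can)/201[34]/web/?(?!pdfpages).*"
)


def pdf_dir(code):
    """Hypothetical MW-style, version-independent image directory for a code."""
    return f"/scans/{code}Scan/{code}Scanpdf"


# Hypothetical dictinfo-style mapping (sketch, not csl-apidev's actual code).
DICTINFO = {c: "//www.sanskrit-lexicon.uni-koeln.de" + pdf_dir(c) for c in CODES}

# Codes whose new page URLs (hypothetical file name) would still be redirected.
still_redirected = [c for c in CODES if PATTERN.match(pdf_dir(c) + "/pg_0001.pdf")]

print(DICTINFO["AP90"])   # → //www.sanskrit-lexicon.uni-koeln.de/scans/AP90Scan/AP90Scanpdf
print(still_redirected)   # → []
```

Because these directories contain no version year, the check does not depend on which 20xx version is current, which is the point of a version-independent layout.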

funderburkjim commented 1 year ago

All images should now be available once again.
Hope the webmaster doesn't throw any more curves!

funderburkjim commented 1 year ago

Incidentally, the URLs of prior versions of the basic displays now get rewritten (by the rewrite rule above) to the 2020 version of the dictionary 'home page'. For instance,

https://www.sanskrit-lexicon.uni-koeln.de/scans/WILScan/2014/web/webtc/indexcaller.php

is rewritten to

https://www.sanskrit-lexicon.uni-koeln.de/scans/WILScan/2020/web/

I guess the webmaster thought this rewrite would be useful in the ongoing skirmishes with XSS vulnerabilities, as discussed in https://github.com/sanskrit-lexicon/csl-websanlexicon/issues/27.

AFAIK, I was not notified of this change by the webmaster.
