hbz / lobid

Linking Open Bibliographic Data
https://lobid.org/
Eclipse Public License 2.0

Set up proxy server for use in <img> tags #421

Closed: acka47 closed this issue 4 years ago

acka47 commented 4 years ago

The hbz received a request on 2020-06-21:

http://lobid.org/gnd/118519522

embedded there: https://upload.wikimedia.org/wikipedia/commons/e/eb/ErnstCassirer.jpg

https://github.com/hbz/lobid/blob/master/conf/Datenschutzerklaerung_lobid.textile

This file contains neither the string "wiki" nor "Wiki", nor any mention of the USA, where wikimedia.org is administered.

We also embed other images in <img> tags, e.g. icons for sameAs links. I guess the best option would be to load all images via a proxy server, like https://lobid.org/images?url=https://upload.wikimedia.org/wikipedia/commons/e/eb/ErnstCassirer.jpg. See e.g. https://www.kritzelblog.de/techniken-fuer-webentwicklung/dsgvo-und-affiliate-apis-bilder-einbinden/#Loesung_3_Bilder_von_der_eigenen_Domain_einbinden
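A minimal sketch of how an image URL could be turned into the proposed proxy form, assuming the `https://lobid.org/images?url=` pattern suggested above; the helper name `proxiedImageUrl` is hypothetical, and it is written in PHP only to match the proxy script later in this thread (the lobid-gnd and lobid-organisations templates would need the equivalent in their own code). Because the target URL travels as a single query parameter, it has to be fully percent-encoded; otherwise an `&` or `#` inside it would be mixed up with the proxy's own URL.

```php
<?php
// Hypothetical helper: build the proxy URL for a remote image.
// Assumes the proxy endpoint pattern https://lobid.org/images?url=...
function proxiedImageUrl(string $imageUrl, string $proxyBase = 'https://lobid.org/images'): string
{
    // rawurlencode() percent-encodes everything except A-Z a-z 0-9 - _ . ~,
    // so '/', '?', '&', '#' and spaces in the target URL stay inside the
    // single "url" parameter and are decoded again by PHP into $_GET['url'].
    return $proxyBase . '?url=' . rawurlencode($imageUrl);
}
```

In an HTML template the result would still need HTML escaping for the `src` attribute, e.g. `htmlspecialchars(proxiedImageUrl($url))`.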

As soon as this is set up, we need to adjust the lobid-gnd and lobid-organisations code. (This would then also fix https://github.com/hbz/lobid-gnd/issues/195.)

dr0i commented 4 years ago

That's easy for Commons links, e.g. with an Apache proxy. Try: http://commons.lobid.org/wiki/Special:FilePath/ErnstCassirer.jpg?width=270 (if we had a certificate for it, we could even deliver it via HTTPS).

dr0i commented 4 years ago

Your more generic proposal also works within Apache, and we can make use of SSL certificates:

RewriteCond %{QUERY_STRING} ^images=(.*)
RewriteRule commons.*$ %1 [NC,L]

Test e.g. https://test.lobid.org/commons?images=https://commons.wikimedia.org/wiki/Special:FilePath/ErnstCassirer.jpg?width=270

acka47 commented 4 years ago

As discussed offline, this doesn't do the job because it's just a redirect: the client still makes a request to the remote server. What we need:

acka47 commented 4 years ago

Note that we should write a short blog post about our solution once it is finished, as there currently doesn't seem to be anything sensible out there.

acka47 commented 4 years ago

I just noticed that we already use https://lobid.org/images/ to serve images from our own server. I think this is OK, and it probably even makes sense to use it for the proxy server as well, doesn't it?

dr0i commented 4 years ago

Re URL design: it's tricky. As the vhost of "lobid.org" is already crowded and side effects would be likely, I set up another subdomain. However, the most important point is that this thing does its job. Most things are working; fetching the LoC favicon does not work yet. This is the status quo so far, try e.g.:

https://test.lobid.org/gnd/118519522

I also used the tileserver proxy (which had been set up as an Apache proxy but was not used yet; note: it should also be used in lobid-organisations):

https://test.lobid.org/gnd/4296439-8

One cannot be entirely happy with these proxies because they are not fully generic: when the URL to be proxied is itself redirected, some additional manual configuration is needed on the lobid-proxy side (re: the LoC favicon). For such cases we might also want to store the images locally in the images directory. That would also make sense given how much happens when this favicon is requested (two 301 redirects; check with curl -Lvvv http://www.loc.gov/favicon.ico -I).
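Storing such images locally could be sketched as a small fetch-once cache. This is an illustration under assumptions: the function name `cacheImage`, the SHA-1 file naming and the target directory are hypothetical, and fetching remote URLs with `file_get_contents()` requires `allow_url_fopen` to be enabled. PHP's http(s) stream wrapper follows redirects such as the two 301s by default (context option `follow_location`), so the final image is what gets stored.

```php
<?php
// Hypothetical helper: fetch a remote image once and keep a local copy,
// so later requests can be served from the local images directory.
function cacheImage(string $url, string $cacheDir): string
{
    // Derive a stable file name from the URL, keeping the original extension.
    $ext = pathinfo((string) parse_url($url, PHP_URL_PATH), PATHINFO_EXTENSION);
    $file = rtrim($cacheDir, '/') . '/' . sha1($url) . ($ext !== '' ? '.' . $ext : '');

    if (!is_file($file)) {
        // For http(s) URLs, the stream wrapper follows redirects by default.
        $data = file_get_contents($url);
        if ($data === false) {
            throw new RuntimeException("Could not fetch $url");
        }
        if (!is_dir($cacheDir) && !mkdir($cacheDir, 0755, true)) {
            throw new RuntimeException("Could not create $cacheDir");
        }
        file_put_contents($file, $data, LOCK_EX);
    }
    // Path of the local copy (newly written or already cached).
    return $file;
}
```

A cron job or a one-off call per favicon URL would then fill the directory; how lobid would wire this in is left open here.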

dr0i commented 4 years ago

Back to square one: I dropped the idea of achieving this with Apache configuration magic alone and am now using the PHP script that kritzelblog provides, as you mentioned, adapted a bit to:

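<?php
// Fetch the image given in ?url= server-side and pass it through,
// so the visitor's browser never contacts the remote host.
$url = $_GET['url'] ?? '';

// Map the file extension of the URL path (query string ignored)
// to the matching Content-Type.
$types = [
    'ico'  => 'image/x-icon',
    'png'  => 'image/png',
    'jpg'  => 'image/jpeg',
    'jpeg' => 'image/jpeg',
    'gif'  => 'image/gif',
];
$ext = strtolower(pathinfo((string) parse_url($url, PHP_URL_PATH), PATHINFO_EXTENSION));
if (isset($types[$ext])) {
    header('Content-Type: ' . $types[$ext]);
}

// Encode spaces; other unusual characters may need further escaping.
readfile(str_replace(' ', '%20', $url));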

Note the URL encoding by string replacement at the end (this may need to be adapted if other unusual URLs occur). See a test with a problematic JPG (its URL contains percent-encoded spaces, %20): https://test.lobid.org/gnd/118979302
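A more general replacement for the `str_replace(" ", "%20", $url)` step could look like the following sketch; the helper name `normalizeImageUrl` is hypothetical. It percent-encodes every byte that RFC 3986 does not allow literally in a URI (spaces, non-ASCII bytes, a `%` that does not start an escape), but leaves reserved characters such as `:` in `Special:FilePath` and existing `%XX` escapes untouched, so URLs that are already encoded are not double-encoded.

```php
<?php
// Hypothetical helper: percent-encode only what may not appear literally
// in a URI (RFC 3986), e.g. spaces, non-ASCII bytes or a stray '%'.
// Reserved characters (: / ? # [ ] @ ! $ & ' ( ) * + , ; =) and existing
// %XX escapes are kept as they are.
function normalizeImageUrl(string $url): string
{
    return preg_replace_callback(
        // Either a '%' that does not start a %XX escape, or any byte outside
        // the unreserved/reserved sets (no /u flag, so it works bytewise and
        // multi-byte UTF-8 characters are encoded byte by byte).
        '/%(?![0-9A-Fa-f]{2})|[^A-Za-z0-9\-._~:\/?#\[\]@!$&\'()*+,;=%]/',
        function (array $m): string {
            return rawurlencode($m[0]);
        },
        $url
    );
}
```

In the script, `readfile(normalizeImageUrl($url))` would take the place of the string replacement.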

(What strikes me as dangerous, though, as mentioned before, is the potential misuse of that proxy, since anyone can use it at will.)
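One way to limit that misuse is to fetch only http(s) URLs on an explicit list of hosts. This matters beyond bandwidth: `readfile()` also accepts plain file paths, so without a check a request like `?url=/etc/passwd` would read a local file. A minimal sketch; the function name is hypothetical and the host list is only illustrative (which hosts lobid actually needs is an assumption):

```php
<?php
// Hypothetical allowlist check for the image proxy: only http(s) URLs
// pointing at explicitly listed hosts are accepted; everything else,
// including local paths such as /etc/passwd or file:// URLs, is rejected.
function isAllowedImageUrl(string $url, array $allowedHosts): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }
    if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return false;
    }
    // parse_url() separates user info, so "https://allowed@evil/" yields host "evil".
    return in_array(strtolower($parts['host']), $allowedHosts, true);
}

// Illustrative host list (assumption; adjust to the images actually embedded).
const ALLOWED_IMAGE_HOSTS = ['upload.wikimedia.org', 'commons.wikimedia.org', 'www.loc.gov'];

// In the proxy script, the check would run before readfile():
// if (!isAllowedImageUrl($url, ALLOWED_IMAGE_HOSTS)) {
//     http_response_code(403);
//     exit;
// }
```

This only checks the initial URL; redirects followed by the stream wrapper are not re-checked in this sketch.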

Note also that this script doesn't work properly with PHP 5. As I cannot easily update the SUSE server, I simply reverse-proxied to our future production server (currently known as "testemphytos"), where PHP 7 is installed.

dr0i commented 4 years ago

Note that the tileserver proxy now also comes with SSL (I defined a Location in lobid-vhost instead of using a new subdomain; a Location allows setting headers "locally", and here unsetting HSTS was needed).

Review also lobid-organisations: start-page7 DE-605

acka47 commented 4 years ago

Nice. Everything looks fine to me. +1

dr0i commented 4 years ago

Deployed in production. @acka47, could you please let the person who reported this know via email that it is fixed?

dr0i commented 4 years ago

I also used the tileserver proxy

Not a good idea: requests are being blocked with "too many requests".

dr0i commented 4 years ago

Reverted the use of the tile server with #437. Closing.

Work to be done in https://github.com/hbz/lobid/issues/309.