Open jywarren opened 1 year ago
Hi @jywarren, I am happy to check this out.
Hi @jywarren, I checked the code. The transformation in the function below (in archive.js) is responsible for the behaviour you are describing. If my memory serves me right, we designed it this way at the time because of issues with accessing the images programmatically via IA. I also observed that the Wayback Machine itself simply loads the images from s3. What do you think?
// where imageSrc is in format: https://web.archive.org/web/20220803171120/https://s3.amazonaws.com/grassrootsmapping/warpables/48659/t82n_r09w_01-02_1985.jpg
// returns https://s3.amazonaws.com/grassrootsmapping/warpables/48659/t82n_r09w_01-02_1985.jpg or
// returns same url unchanged (no transformation required)
function extractImageSource(imageSrc) {
  if (imageSrc.startsWith('https://web.archive.org/web/')) {
    return imageSrc.substring(imageSrc.lastIndexOf('https'), imageSrc.length);
  }
  return imageSrc;
}
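Illustration 1: https://user-images.githubusercontent.com/1612359/224565688-4ebdb4cc-6b7b-4ba1-919b-18e1fa965c06.PNG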
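One detail of the function above is worth spelling out: `lastIndexOf('https')` only finds the embedded address when that address itself starts with `https`. For a Wayback URL that wraps a plain `http://` address, the only match is the leading `https` of the Wayback prefix, so the URL comes back unchanged. Below is a minimal sketch of a prefix-based alternative. The name `extractOriginalUrl` and the regular expression are hypothetical and are not the code in archive.js.

```javascript
// Hypothetical alternative to extractImageSource(); an illustration only.
// The pattern matches the Wayback prefix, one path segment holding the
// timestamp and optional modifier (e.g. "20220803171120" or "0id_"),
// and captures the wrapped http:// or https:// URL that follows.
const WAYBACK_PATTERN = /^https?:\/\/web\.archive\.org\/web\/[^/]+\/(https?:\/\/.+)$/;

function extractOriginalUrl(imageSrc) {
  const match = WAYBACK_PATTERN.exec(imageSrc);
  // Non-Wayback URLs are returned unchanged.
  return match ? match[1] : imageSrc;
}

console.log(extractOriginalUrl(
  'https://web.archive.org/web/20200506081918id_/http://s3.amazonaws.com/grassrootsmapping/warpables/417/img_0135.jpg'
));
// prints "http://s3.amazonaws.com/grassrootsmapping/warpables/417/img_0135.jpg"
```

This only changes how the wrapped URL is located; it still resolves to s3, as the current function does.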
Hmm, did this apply only to JSON maybe? Would you mind trying to remove that so that it loads directly from the Wayback Machine?
Thanks for finding that!!!
Okay @jywarren, I'll look into this. Many thanks!
Ah yes, I see. We get this error if we don't do that:
Access to image at 'https://web.archive.org/web/0id_/https://s3.amazonaws.com/grassrootsmapping/warpables/409/IMG_4155.JPG' from origin 'http://localhost:8082' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.
I'm not sure... is there another way to access https://web.archive.org/web/20200506081918id_/http://s3.amazonaws.com/grassrootsmapping/warpables/417/img_0135.jpg without CORS issues? Otherwise, we could... upload that entire directory into an Archive collection, and serve it from there.
That is, wayback URLs have CORS limitations, but images at regular archive.org/download/_____ archive.org URLs do not.
Yes, I pointed out the CORS limitation in my previous message. It was the reason I fetched from s3 directly.
Okay, but is there something wrong with fetching from s3, given that the legacy JSON files all have image sources pointing to s3 either directly or indirectly? For instance, https://web.archive.org/web/20200506081918id_/http://s3.amazonaws.com/grassrootsmapping/warpables/417/img_0135.jpg simply points to s3 indirectly, nothing more.
Yes, sorry, just agreeing and confirming from my test. Thank you!
The only issue with s3 is that it costs Public Lab money to host -- it's not forever storage. I think perhaps the best choice is to create an archive.org collection and add to this logic in extractImageSource(), where we replace http://s3.amazonaws.com/grassrootsmapping with https://archive.org/download/mapknitter-wayback
I'm working on uploading all the files, but it'll be a while. We can check in here again once it's complete!
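The replacement proposed above could be sketched roughly as follows. This is an illustration, not the change that was made in archive.js: the function name `toArchiveOrgUrl` is hypothetical, and it assumes files in the mapknitter-wayback collection keep the same relative path they had under the s3 bucket (for example `warpables/417/img_0135.jpg`), which the thread does not confirm.

```javascript
// Hypothetical sketch of the proposed rewrite; names and path layout are
// assumptions, not the project's actual implementation.
const WAYBACK_WRAPPER = /^https?:\/\/web\.archive\.org\/web\/[^/]+\/(https?:\/\/.+)$/;
const S3_BUCKET_PREFIX = /^https?:\/\/s3\.amazonaws\.com\/grassrootsmapping\//;
const ARCHIVE_COLLECTION_BASE = 'https://archive.org/download/mapknitter-wayback/';

function toArchiveOrgUrl(imageSrc) {
  // Step 1: unwrap a Wayback URL to the address it wraps, if there is one.
  const wrapped = WAYBACK_WRAPPER.exec(imageSrc);
  const original = wrapped ? wrapped[1] : imageSrc;
  // Step 2: swap the s3 bucket prefix for the archive.org collection base.
  // URLs that do not point at the bucket pass through unchanged.
  return original.replace(S3_BUCKET_PREFIX, ARCHIVE_COLLECTION_BASE);
}

console.log(toArchiveOrgUrl(
  'https://web.archive.org/web/20200506081918id_/http://s3.amazonaws.com/grassrootsmapping/warpables/417/img_0135.jpg'
));
// prints "https://archive.org/download/mapknitter-wayback/warpables/417/img_0135.jpg"
```

Matching both `http://` and `https://` forms of the bucket address covers the two variants that appear in the URLs quoted in this thread.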
Ha! Okay, I understand now. So the archive.org option is definitely the route to take. I will check back then.
Gosh, it's going to take a while! It's 631,813 files, and I've only downloaded 3,875 so far...
I may try another way at a remote server that's faster... we'll see!
Yeah... this has to take a while
Is this issue being worked on?
Hi, we are still working on uploading the archive.org collection, apologies!
I found a strange issue when I pointed at a collection of JSON files which have had images routed to the Internet Archive's Wayback Machine caches.
As you can see, the image links are routed to Wayback URLs: https://ia601603.us.archive.org/20/items/mapknitter-wayback/ceres--2.json :
i.e.: https://web.archive.org/web/0id_/https://s3.amazonaws.com/grassrootsmapping/warpables/305268/PuglisiTerrazzeHaghiaTriadaCretaAntica2007-28.jpg
However, when I actually load a page like this, somehow it still loads images directly from Amazon s3, not the Internet Archive:
https://publiclab.github.io/Leaflet.DistortableImage/examples/archive?json=https://archive.org/download/mapknitter-wayback/ceres--2.json
I inspected in the console and still can't figure it out.
@segun-codes @7malikk I was curious, if you had an interest in this, what do you think is happening here? Could any application logic we've written be causing this?
See for example the images at https://publiclab.github.io/Leaflet.DistortableImage/examples/archive?json=https://archive.org/download/mapknitter-wayback/ceres--2.json
The page still loads https://s3.amazonaws.com/grassrootsmapping/warpables/306187/DJI_1207.JPG