internetarchive / dweb-mirror

Offline Internet Archive project
https://www-dweb-mirror.dev.archive.org/
GNU Affero General Public License v3.0

Problem with book images #372

Closed: mitra42 closed this issue 1 year ago.

mitra42 commented 1 year ago

Part of the meta-bug for books: #370.

STR (steps to reproduce, running locally): opening http://localhost:4244/details/1984-06-computegazette tries to load http://localhost:4244/BookReader/BookReaderImages.php?zip=/15/items/1984-06-computegazette/Compute_Gazette_Issue_12_1984_Jun_jp2.zip&file=Compute_Gazette_Issue_12_1984_Jun_jp2/Compute_Gazette_Issue_12_1984_Jun_0002.jp2&id=1984-06-computegazette&scale=4&rotate=0 and that request fails.
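To see exactly what the BookReader asks for, the failing request can be split into its query parameters. This is a minimal Node.js sketch using the built-in WHATWG `URL` class; the helper name `bookReaderParams` is hypothetical and not part of dweb-mirror.

```javascript
// Hypothetical helper (not part of dweb-mirror): split a BookReaderImages.php
// request URL into its query parameters using the WHATWG URL API built into Node.js.
function bookReaderParams(href) {
  const url = new URL(href);
  // URLSearchParams yields [key, value] pairs with values already percent-decoded;
  // Object.fromEntries turns them into a plain object for easy inspection.
  return Object.fromEntries(url.searchParams);
}

const failing = 'http://localhost:4244/BookReader/BookReaderImages.php'
  + '?zip=/15/items/1984-06-computegazette/Compute_Gazette_Issue_12_1984_Jun_jp2.zip'
  + '&file=Compute_Gazette_Issue_12_1984_Jun_jp2/Compute_Gazette_Issue_12_1984_Jun_0002.jp2'
  + '&id=1984-06-computegazette&scale=4&rotate=0';

console.log(bookReaderParams(failing));
```

The request carries five parameters: `zip`, `file`, `id`, `scale` and `rotate`.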

mitra42 commented 1 year ago

On the server, I'm seeing repeated lines like:

dweb-mirror:ArchiveItem fetch_page: (skipNet) subPrefix=undefined zip=/15/items/1984-06-computegazette/Compute_Gazette_Issue_12_1984_Jun_jp2.zip file=Compute_Gazette_Issue_12_1984_Jun_jp2/Compute_Gazette_Issue_12_1984_Jun_0196.jp2 page=null scale=2 rotate=0  

Note the page=null, which looks

UPDATE: it's OK. The lines aren't actually repeats; the 0196 is a page number and increments.
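Because the page index is carried in the `file=` name rather than in `page`, one quick way to confirm that successive log lines refer to different pages is to pull the numeric suffix out of the file name. This sketch assumes the `<name>_NNNN.jp2` naming seen in these logs; `pageFromFile` is a hypothetical helper.

```javascript
// Hypothetical helper: extract the page index from a BookReader jp2 file name.
// Assumes the "<name>_NNNN.jp2" pattern seen in these logs; returns null otherwise.
function pageFromFile(file) {
  const match = /_(\d+)\.jp2$/.exec(file);
  return match ? Number.parseInt(match[1], 10) : null;
}

const logged = 'Compute_Gazette_Issue_12_1984_Jun_jp2/Compute_Gazette_Issue_12_1984_Jun_0196.jp2';
console.log(pageFromFile(logged)); // 196
```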

mitra42 commented 1 year ago

See server logs like these:

 dweb-transports Opening stream from [ https://www-dweb-cors.dev.archive.org/BookReader/BookReaderImages.php?zip=%2F15%2Fitems%2F1984-06-computegazette%2FCompute_Gazette_Issue_12_1984_Jun_jp2.zip&file=Compute_Gazette_Issue_12_1984_Jun_jp2%2FCompute_Gazette_Issue_12_1984_Jun_0002.jp2&scale=4&rotate=0 ]'
  dweb-mirror:MirrorFS cacheAndOrStream had error reading 1984-06-computegazette_Compute_Gazette_Issue_12_1984_Jun_jp2/Compute_Gazette_Issue_12_1984_Jun_0002.jp2 Transport Error https://www-dweb-cors.dev.archive.org/BookReader/BookReaderImages.php?zip=%2F15%2Fitems%2F1984-06-computegazette%2FCompute_Gazette_Issue_12_1984_Jun_jp2.zip&file=Compute_Gazette_Issue_12_1984_Jun_jp2%2FCompute_Gazette_Issue_12_1984_Jun_0002.jp2&scale=4&rotate=0 404: Not Found +75ms
  dweb-mirror:mirrorHttp Failed to proxy Transport Error https://www-dweb-cors.dev.archive.org/BookReader/BookReaderImages.php?zip=%2F15%2Fitems%2F1984-06-computegazette%2FCompute_Gazette_Issue_12_1984_Jun_jp2.zip&file=Compute_Gazette_Issue_12_1984_Jun_jp2%2FCompute_Gazette_Issue_12_1984_Jun_0002.jp2&scale=4&rotate=0 404: Not Found +41ms
  dweb-mirror:mirrorHttp No file in: /Users/mitra/git/github_internetarchive/dweb-archive/dist/BookReader/BookReaderImages.php ENOENT: no such file or directory, stat '/Users/mitra/git/github_internetarchive/dweb-archive/dist/BookReader/BookReaderImages.php' +1ms
  dweb-mirror:mirrorHttp FAILING: /BookReader/BookReaderImages.php?zip=/15/items/1984-06-computegazette/Compute_Gazette_Issue_12_1984_Jun_jp2.zip&file=Compute_Gazette_Issue_12_1984_Jun_jp2/Compute_Gazette_Issue_12_1984_Jun_0002.jp2&id=1984-06-computegazette&scale=4&rotate=0 +0ms
GET /BookReader/BookReaderImages.php?zip=/15/items/1984-06-computegazette/Compute_Gazette_Issue_12_1984_Jun_jp2.zip&file=Compute_Gazette_Issue_12_1984_Jun_jp2/Compute_Gazette_Issue_12_1984_Jun_0002.jp2&id=1984-06-computegazette&scale=4&rotate=0 - 500 2093 767.778 ms
TransportError: Transport Error https://www-dweb-cors.dev.archive.org/BookReader/BookReaderImages.php?zip=%2F15%2Fitems%2F1984-06-computegazette%2FCompute_Gazette_Issue_12_1984_Jun_jp2.zip&file=Compute_Gazette_Issue_12_1984_Jun_jp2%2FCompute_Gazette_Issue_12_1984_Jun_0002.jp2&scale=4&rotate=0 404: Not Found
    at /Users/mitra/git/github_internetarchive/dweb-mirror/node_modules/@internetarchive/dweb-transports/Transports.js:437:14
    at /Users/mitra/git/github_internetarchive/dweb-mirror/node_modules/@internetarchive/dweb-transports/node_modules/async/internal/createTester.js:36:13
    at wrapper (/Users/mitra/git/github_internetarchive/dweb-mirror/node_modules/@internetarchive/dweb-transports/node_modules/async/internal/once.js:12:16)
    at replenish (/Users/mitra/git/github_internetarchive/dweb-mirror/node_modules/@internetarchive/dweb-transports/node_modules/async/internal/eachOfLimit.js:76:25)
    at iterateeCallback (/Users/mitra/git/github_internetarchive/dweb-mirror/node_modules/@internetarchive/dweb-transports/node_modules/async/internal/eachOfLimit.js:65:17)
    at /Users/mitra/git/github_internetarchive/dweb-mirror/node_modules/@internetarchive/dweb-transports/node_modules/async/internal/onlyOnce.js:12:16
    at /Users/mitra/git/github_internetarchive/dweb-mirror/node_modules/@internetarchive/dweb-transports/node_modules/async/internal/createTester.js:32:17
    at /Users/mitra/git/github_internetarchive/dweb-mirror/node_modules/@internetarchive/dweb-transports/Transports.js:421:13
    at wrapper (/Users/mitra/git/github_internetarchive/dweb-mirror/node_modules/@internetarchive/dweb-transports/node_modules/async/internal/once.js:12:16)
    at next (/Users/mitra/git/github_internetarchive/dweb-mirror/node_modules/@internetarchive/dweb-transports/node_modules/async/waterfall.js:96:20)
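Read top to bottom, the log suggests that mirrorHttp tries several sources in turn (the upstream fetch through MirrorFS/dweb-cors, which gets a 404, then a static file under dweb-archive/dist, which is missing) before answering the browser with a 500. The sketch below models only that order with plain synchronous functions; `serveWithFallback`, the source names and the `load` callbacks are hypothetical, and the real code is asynchronous and structured differently.

```javascript
// Hypothetical model of the fallback order visible in the log above
// (not dweb-mirror's actual code): try each source in turn, remember why it
// failed, and only report a 500 once every source has been exhausted.
function serveWithFallback(sources) {
  const errors = [];
  for (const { name, load } of sources) {
    try {
      return { status: 200, source: name, body: load() };
    } catch (err) {
      errors.push(`${name}: ${err.message}`);
    }
  }
  return { status: 500, errors };
}

// Both sources fail, as in the log: upstream returns 404, the local file is missing.
const result = serveWithFallback([
  { name: 'upstream', load: () => { throw new Error('404: Not Found'); } },
  { name: 'dist', load: () => { throw new Error('ENOENT: no such file or directory'); } },
]);
console.log(result.status); // 500
```

The useful property for debugging is that the final 500 hides the first error, so the upstream 404 is the one worth chasing.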
mitra42 commented 1 year ago

Pulling this URL out of that log:

https://www-dweb-cors.dev.archive.org/BookReader/BookReaderImages.php?zip=%2F15%2Fitems%2F1984-06-computegazette%2FCompute_Gazette_Issue_12_1984_Jun_jp2.zip&file=Compute_Gazette_Issue_12_1984_Jun_jp2%2FCompute_Gazette_Issue_12_1984_Jun_0002.jp2&scale=4&rotate=0 

It returns a 404, which is reproducible in a browser (Firefox).

mitra42 commented 1 year ago

If I run a local instance of dweb-cors, it has the same problem, but it appears to be correctly directing the request to the datanode:

https://ia600501.us.archive.org/BookReader/BookReaderImages.php?zip=%2F15%2Fitems%2F1984-06-computegazette%2FCompute_Gazette_Issue_12_1984_Jun_jp2.zip&file=Compute_Gazette_Issue_12_1984_Jun_jp2%2FCompute_Gazette_Issue_12_1984_Jun_0002.jp2&scale=4&rotate=0 Not Found

This suggests the problem may be bitrot in the formulae for getting pages. Let's try a non-dweb browser and see what it requests.

mitra42 commented 1 year ago

The browser goes to:

https://ia800501.us.archive.org/BookReader/BookReaderImages.php?zip=/15/items/1984-06-computegazette/Compute_Gazette_Issue_12_1984_Jun_jp2.zip&file=Compute_Gazette_Issue_12_1984_Jun_jp2/Compute_Gazette_Issue_12_1984_Jun_0000.jp2&id=1984-06-computegazette&scale=4&rotate=0

This has three notable differences: dweb escapes "/" as %2F, the browser includes id=1984-06-computegazette, and dweb goes to ia600501 while the browser goes to ia800501.

Testing all combinations, only the missing id= matters.
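A comparison like this can be made mechanical by decoding both query strings and diffing the parameters while ignoring the host. Because `URLSearchParams` percent-decodes values, `%2F` and `/` compare equal, so only real parameter differences remain. The browser URL in this comment requests page 0000, so this sketch instead compares the dweb-cors URL with the original localhost request for the same page (0002). `queryDiff` is a hypothetical helper.

```javascript
// Hypothetical helper: list query parameters that differ between two URLs.
// Hosts and paths are ignored; URLSearchParams decodes %2F back to "/",
// so escaping differences do not show up as changes.
function queryDiff(a, b) {
  const pa = new URL(a).searchParams;
  const pb = new URL(b).searchParams;
  const keys = new Set([...pa.keys(), ...pb.keys()]);
  return [...keys].filter((k) => pa.get(k) !== pb.get(k));
}

const dweb = 'https://www-dweb-cors.dev.archive.org/BookReader/BookReaderImages.php'
  + '?zip=%2F15%2Fitems%2F1984-06-computegazette%2FCompute_Gazette_Issue_12_1984_Jun_jp2.zip'
  + '&file=Compute_Gazette_Issue_12_1984_Jun_jp2%2FCompute_Gazette_Issue_12_1984_Jun_0002.jp2'
  + '&scale=4&rotate=0';
const local = 'http://localhost:4244/BookReader/BookReaderImages.php'
  + '?zip=/15/items/1984-06-computegazette/Compute_Gazette_Issue_12_1984_Jun_jp2.zip'
  + '&file=Compute_Gazette_Issue_12_1984_Jun_jp2/Compute_Gazette_Issue_12_1984_Jun_0002.jp2'
  + '&id=1984-06-computegazette&scale=4&rotate=0';

console.log(queryDiff(dweb, local)); // [ 'id' ]
```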

mitra42 commented 1 year ago

There is a line in the dweb-mirror logs:

  dweb-mirror:mirrorHttp FAILING: /BookReader/BookReaderImages.php?zip=/15/items/1984-06-computegazette/Compute_Gazette_Issue_12_1984_Jun_jp2.zip&file=Compute_Gazette_Issue_12_1984_Jun_jp2/Compute_Gazette_Issue_12_1984_Jun_0001.jp2&id=1984-06-computegazette&scale=4&rotate=0 +0ms
GET /BookReader/BookReaderImages.php?zip=/15/items/1984-06-computegazette/Compute_Gazette_Issue_12_1984_Jun_jp2.zip&file=Compute_Gazette_Issue_12_1984_Jun_jp2/Compute_Gazette_Issue_12_1984_Jun_0001.jp2&id=1984-06-computegazette&scale=4&rotate=0 

which contains id=, so dweb-mirror must be losing it somewhere before the request goes upstream.

mitra42 commented 1 year ago

Trying http://localhost:4244/BookReader/BookReaderImages.php?zip=/15/items/1984-06-computegazette/Compute_Gazette_Issue_12_1984_Jun_jp2.zip&file=Compute_Gazette_Issue_12_1984_Jun_jp2/Compute_Gazette_Issue_12_1984_Jun_0000.jp2&id=1984-06-computegazette&scale=4&rotate=0 gives this log:

  dweb-mirror:mirrorHttp STARTING: /BookReader/BookReaderImages.php?zip=/15/items/1984-06-computegazette/Compute_Gazette_Issue_12_1984_Jun_jp2.zip&file=Compute_Gazette_Issue_12_1984_Jun_jp2/Compute_Gazette_Issue_12_1984_Jun_0000.jp2&id=1984-06-computegazette&scale=4&rotate=0    +8s
  dweb-mirror:ArchiveItem fetch_page: subPrefix=undefined zip=/15/items/1984-06-computegazette/Compute_Gazette_Issue_12_1984_Jun_jp2.zip file=Compute_Gazette_Issue_12_1984_Jun_jp2/Compute_Gazette_Issue_12_1984_Jun_0000.jp2 page=undefined scale=4 rotate=0 +8s
  dweb-transports Opening stream from https://www-dweb-cors.dev.archive.org/BookReader/BookReaderImages.php?zip=%2F15%2Fitems%2F1984-06-computegazette%2FCompute_Gazette_Issue_12_1984_Jun_jp2.zip&file=Compute_Gazette_Issue_12_1984_Jun_jp2%2FCompute_Gazette_Issue_12_1984_Jun_0000.jp2&scale=4&rotate=0

OK, fixed it: BookReaderImages didn't use to require id=, but it does now, so I made sure it's passed down. See CHANGELOG.
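The fix described above boils down to making sure `id` travels upstream together with `zip`, `file`, `scale` and `rotate`. Here is a minimal sketch of that idea, assuming a helper that rebuilds the upstream BookReaderImages.php URL from the parsed request. The function name `bookReaderImagesUrl` and its argument shape are hypothetical; the actual change lives in dweb-mirror (see CHANGELOG).

```javascript
// Hypothetical sketch (not dweb-mirror's actual code): build the upstream
// BookReaderImages.php URL, forwarding id= now that the server requires it.
function bookReaderImagesUrl(base, { zip, file, id, scale, rotate }) {
  if (!id) {
    // Without id= the upstream server answers 404, so fail early and loudly.
    throw new Error('BookReaderImages.php requires id=');
  }
  const url = new URL('/BookReader/BookReaderImages.php', base);
  // URLSearchParams encodes "/" as %2F; testing above showed escaping doesn't matter.
  url.search = new URLSearchParams({ zip, file, id, scale, rotate }).toString();
  return url.toString();
}

console.log(bookReaderImagesUrl('https://www-dweb-cors.dev.archive.org', {
  zip: '/15/items/1984-06-computegazette/Compute_Gazette_Issue_12_1984_Jun_jp2.zip',
  file: 'Compute_Gazette_Issue_12_1984_Jun_jp2/Compute_Gazette_Issue_12_1984_Jun_0002.jp2',
  id: '1984-06-computegazette',
  scale: 4,
  rotate: 0,
}));
```

Throwing on a missing `id` is a design choice for the sketch: it surfaces the problem at the mirror instead of as an opaque upstream 404.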

mitra42 commented 1 year ago

This will be in dweb-mirror 0.2.91.