Closed vlnn closed 1 year ago
(I'm guessing if it should go to dweb-transports repo issues, but would be grateful for confirmation here)
This is how the module internetarchive was installed to the IIAB system
Looks like you've caught a bug, but its obscure.
Thanks for the detailed log report it helps track it down.
If you look at the "Fetching" line at 7:51 that caused the error it shows a URL
https://www-dweb-cors.dev.archive.org/advancedsearch.php?output=json&q=compute!&rows=30&page=1&sort[]=-downloads&and[]=&save=yes&fl=identifier%2Ctitle%2Ccollection%2Cmediatype%2Cdownloads%2Ccreator%2Cnum_reviews%2Cpublicdate%2Citem_count%2Cloans__status__status
If I try this in a browser link it also fails.
Bug #1: Error handling in the callback at ArchiveItem.js:453:49 isn't catching that error (its presuming there is always a "docs" in the response, - so its being reported at dweb-transports/Transports.js:232:44
This dweb-cors is just to work around a problem with accessing across domains, so I tried direct to archive.org.
This also fails.
Bug #2: Internet archive advanced search isn't handling !
in a search - reported on internal Archive slack.
Note that if you try without the !
e.g. a search for Compute
it works though returns a much too wide set to be useful.
@vlnn & @mitra42 profound thanks.
Just FYI: Internet-in-a-Box will be publishing refined images in coming weeks, tightening up some core infra after 3 months.
If Internet Archive searching might be improved alongside these new images, or prior to end of year, that would be incredible (as IIAB 8.0 is expected in final form then!)
Great - I think the bug has to be fixed at Internet Archive's end, i.e. not in the image, though error handling could be improved.
Note - this project is not actively maintained at this time, though I'm happy to jump in and look at things occasionally.
I confirm that search for Compute (without !) works as expected, thanks (so, perhaps Bug#2 could go a bit down in priorities now as workaround is documented here). But do I understand correctly that there's a mismatch between what dweb and archive.org provides? Because I can for sure open https://archive.org/details/1984-06-computegazette/ , but not http://box.local:4244/details/1984-06-computegazette/ , as latter emits this log:
Nov 13 20:59:53 box internetarchive[1138]: 2022-11-14T01:59:53.374Z dweb-mirror:mirrorHttp STARTING: /bookreader/BookReader/plugins/plugin.resume.js?v=891b5f7
Nov 13 20:59:53 box internetarchive[1138]: 2022-11-14T01:59:53.375Z dweb-mirror:mirrorHttp STARTING: /bookreader/BookReader/plugins/plugin.archive_analytics.js
Nov 13 20:59:53 box internetarchive[1138]: 2022-11-14T01:59:53.376Z dweb-mirror:mirrorHttp STARTING: /bookreader/BookReaderHelpers.js
Nov 13 20:59:53 box internetarchive[1138]: 2022-11-14T01:59:53.378Z dweb-mirror:mirrorHttp STARTING: /bookreader/LendingFlow.js
Nov 13 20:59:53 box internetarchive[1138]: GET /bookreader/BookReader/plugins/plugin.resume.js?v=891b5f7 - 304 - 5.824 ms
Nov 13 20:59:53 box internetarchive[1138]: GET /bookreader/BookReader/plugins/plugin.archive_analytics.js - 304 - 5.448 ms
Nov 13 20:59:53 box internetarchive[1138]: GET /bookreader/BookReaderHelpers.js - 304 - 5.196 ms
Nov 13 20:59:53 box internetarchive[1138]: GET /bookreader/LendingFlow.js - 304 - 4.828 ms
Nov 13 20:59:53 box internetarchive[1138]: 2022-11-14T01:59:53.384Z dweb-mirror:mirrorHttp STARTING: /bookreader/BookReaderJSIA.js
Nov 13 20:59:53 box internetarchive[1138]: 2022-11-14T01:59:53.385Z dweb-mirror:mirrorHttp sent file /opt/iiab/internetarchive/node_modules/@internetarchive/bookreader/BookReader/plugins/plugin.resume.js
Nov 13 20:59:53 box internetarchive[1138]: 2022-11-14T01:59:53.385Z dweb-mirror:mirrorHttp sent file /opt/iiab/internetarchive/node_modules/@internetarchive/bookreader/BookReader/plugins/plugin.archive_analytics.js
Nov 13 20:59:53 box internetarchive[1138]: 2022-11-14T01:59:53.385Z dweb-mirror:mirrorHttp sent file /opt/iiab/internetarchive/node_modules/@internetarchive/dweb-archive-dist/bookreader/BookReaderHelpers.js
Nov 13 20:59:53 box internetarchive[1138]: 2022-11-14T01:59:53.385Z dweb-mirror:mirrorHttp sent file /opt/iiab/internetarchive/node_modules/@internetarchive/dweb-archive-dist/bookreader/LendingFlow.js
Nov 13 20:59:53 box internetarchive[1138]: GET /bookreader/BookReaderJSIA.js - 304 - 2.014 ms
Nov 13 20:59:53 box internetarchive[1138]: 2022-11-14T01:59:53.388Z dweb-mirror:mirrorHttp STARTING: /dweb-archive-bundle.js
Nov 13 20:59:53 box internetarchive[1138]: 2022-11-14T01:59:53.388Z dweb-mirror:mirrorHttp sent file /opt/iiab/internetarchive/node_modules/@internetarchive/dweb-archive-dist/bookreader/BookReaderJSIA.js
Nov 13 20:59:53 box internetarchive[1138]: 2022-11-14T01:59:53.398Z dweb-mirror:mirrorHttp STARTING: /info
Nov 13 20:59:53 box internetarchive[1138]: GET /info - 304 - 1.238 ms
Nov 13 20:59:53 box internetarchive[1138]: 2022-11-14T01:59:53.445Z dweb-mirror:mirrorHttp STARTING: /info
Nov 13 20:59:53 box internetarchive[1138]: GET /info - 304 - 1.351 ms
Nov 13 20:59:53 box internetarchive[1138]: 2022-11-14T01:59:53.454Z dweb-mirror:mirrorHttp STARTING: /info
Nov 13 20:59:53 box internetarchive[1138]: GET /info - 304 - 1.520 ms
Nov 13 21:00:00 box internetarchive[1138]: 2022-11-14T02:00:00.164Z dweb-mirror:mirrorHttp sent file /opt/iiab/internetarchive/node_modules/@internetarchive/dweb-archive-dist/dweb-archive-bundle.js
Nov 13 21:00:00 box internetarchive[1138]: GET /dweb-archive-bundle.js - 200 4340209 2.307 ms
Nov 13 21:00:01 box CRON[3711]: pam_unix(cron:session): session opened for user www-data(uid=33) by (uid=0)
Nov 13 21:00:01 box CRON[3712]: (www-data) CMD ([ -x /usr/share/awstats/tools/update.sh ] && /usr/share/awstats/tools/update.sh)
Nov 13 21:00:02 box internetarchive[1138]: 2022-11-14T02:00:02.230Z dweb-mirror:mirrorHttp STARTING: /info
Nov 13 21:00:02 box internetarchive[1138]: GET /info - 304 - 1.282 ms
Nov 13 21:00:02 box internetarchive[1138]: 2022-11-14T02:00:02.252Z dweb-mirror:mirrorHttp STARTING: /components/manage/manage.css
Nov 13 21:00:02 box internetarchive[1138]: GET /components/manage/manage.css - 304 - 2.025 ms
Nov 13 21:00:02 box internetarchive[1138]: 2022-11-14T02:00:02.255Z dweb-mirror:mirrorHttp sent file /opt/iiab/internetarchive/node_modules/@internetarchive/dweb-archive-dist/components/manage/manage.css
Nov 13 21:00:02 box internetarchive[1138]: 2022-11-14T02:00:02.257Z dweb-mirror:mirrorHttp STARTING: /languages/english.json
Nov 13 21:00:02 box internetarchive[1138]: GET /languages/english.json - 304 - 1.704 ms
Nov 13 21:00:02 box internetarchive[1138]: 2022-11-14T02:00:02.260Z dweb-mirror:mirrorHttp sent file /opt/iiab/internetarchive/node_modules/@internetarchive/dweb-archive-dist/languages/english.json
Nov 13 21:00:02 box internetarchive[1138]: 2022-11-14T02:00:02.305Z dweb-mirror:mirrorHttp STARTING: /metadata/1984-06-computegazette
Nov 13 21:00:02 box internetarchive[1138]: 2022-11-14T02:00:02.311Z dweb-archivecontroller:ArchiveItem getting metadata for 1984-06-computegazette
Nov 13 21:00:02 box internetarchive[1138]: 2022-11-14T02:00:02.312Z dweb-transports Fetching https://www-dweb-cors.dev.archive.org/metadata/1984-06-computegazette via HTTP
Nov 13 21:00:02 box internetarchive[1138]: 2022-11-14T02:00:02.313Z dweb-transports:httptools p_httpfetch: https://www-dweb-cors.dev.archive.org/metadata/1984-06-computegazette ''
Nov 13 21:00:02 box CRON[3711]: pam_unix(cron:session): session closed for user www-data
Nov 13 21:00:03 box internetarchive[1138]: 2022-11-14T02:00:03.201Z dweb-transports:httptools Fetch of https://www-dweb-cors.dev.archive.org/metadata/1984-06-computegazette opened
Nov 13 21:00:03 box internetarchive[1138]: 2022-11-14T02:00:03.209Z dweb-transports Fetching https://www-dweb-cors.dev.archive.org/metadata/1984-06-computegazette via HTTP succeeded NaN bytes
Nov 13 21:00:03 box internetarchive[1138]: 2022-11-14T02:00:03.209Z dweb-archivecontroller:ArchiveItem metadata for 1984-06-computegazette fetched successfully
Nov 13 21:00:03 box internetarchive[1138]: 2022-11-14T02:00:03.225Z dweb-transports:httptools p_httpfetch: https://www-dweb-cors.dev.archive.org/BookReader/BookReaderJSIA.php?subPrefix=Compute_Gazette_Issue_12_1984_Jun&server=www-dweb-cors.dev.archive.org&audioLinerNotes=0&id=1984-06-computegazette&itemPath=%2F15%2Fitems%2F1984-06-computegazette&format=json&requestUri=%2Fdetails%2F1984-06-computegazette ''
Nov 13 21:00:03 box internetarchive[1138]: 2022-11-14T02:00:03.665Z dweb-transports:httptools Fetch of https://www-dweb-cors.dev.archive.org/BookReader/BookReaderJSIA.php?subPrefix=Compute_Gazette_Issue_12_1984_Jun&server=www-dweb-cors.dev.archive.org&audioLinerNotes=0&id=1984-06-computegazette&itemPath=%2F15%2Fitems%2F1984-06-computegazette&format=json&requestUri=%2Fdetails%2F1984-06-computegazette opened
Nov 13 21:00:04 box internetarchive[1138]: 2022-11-14T02:00:04.123Z dweb-transports:httptools GET Uncaught error in callback TypeError: Cannot convert undefined or null to object
Nov 13 21:00:04 box internetarchive[1138]: at /opt/iiab/internetarchive/node_modules/@internetarchive/dweb-archivecontroller/ArchiveItem.js:279:20
Nov 13 21:00:04 box internetarchive[1138]: at /opt/iiab/internetarchive/node_modules/@internetarchive/dweb-transports/httptools.js:185:9
Nov 13 21:00:04 box internetarchive[1138]: at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Nov 13 21:00:09 box internetarchive[1138]: 2022-11-14T02:00:09.404Z dweb-transports:httptools p_httpfetch: https://dweb.me/info ''
Nov 13 21:00:10 box internetarchive[1138]: 2022-11-14T02:00:10.122Z dweb-transports:httptools Fetch of https://dweb.me/info opened
Nov 13 21:00:14 box internetarchive[1138]: 2022-11-14T02:00:14.494Z dweb-mirror:mirrorHttp STARTING: /info
Nov 13 21:00:14 box internetarchive[1138]: GET /info - 304 - 5.116 ms
and shows page with only Loading 1984-06-computegazette
Is this the same problem or I better open new issue?
Could you open that as a new issue - I can confirm its real and I'll take a look at it - looks like bit-rot at the Archive end.
Diving into Bug#1: Search for compute (localhost)[http://localhost:4244/advancedsearch.php?output=json&q=Compute!&rows=30&page=1&sort[]=-downloads&and[]=&save=yes&fl=identifier%2Ctitle%2Ccollection%2Cmediatype%2Cdownloads%2Ccreator%2Cnum_reviews%2Cpublicdate%2Citem_count%2Cloans__status__status] Shows error from dweb-transports in log but the error is actually in dweb-archivecontroller/ArchiveItem.js:454:49 which supplies a callback that doesnt handle an error.
Turns out elasticsearch returns an error in the "error" field of the response rather than an HTTP error, handle that in ArchiveItem.js will fix there in archivecontroller:0.2.16
FIxed in 0.2.94 which uses dweb-archivecontroller 0.2.16
Testing IIAB's module of internetarchive I can't see the real content. I guess it may be some misconfiguration (either on IIAB's installation or my installation process), but I would appreciate any hints on what to look to.
http://box.local:4244/details/home shows frontpage of Internet Archive. Then I search for e.g. Compute! magazine using search form. Browser returns "Loading search" page and then there's no activity on the page. In logs gathered by
journalctl -u internetarchive -f
I have suspicious errors on JS-code (see below). I've tried to find similar issues but haven't got any luck. I can reproduce this every time I'm searching via this interface.