Closed Perplexitus closed 1 month ago
Thanks for the pointers, @mitra42.
I fixed the issue by going to "/opt/iiab/internetarchive/node_modules/@internetarchive" and replacing every mention of "dweb.me" and "www-dweb-cors.dev." (including the trailing period) with "archive.org", across all of the child directories and files.
After I did that, the UI started working properly and I am able to search collections myself on my iiab box. I now see what it means when it says that it acts as a proxy. There's a search bar that appears after it connects successfully. By using that search bar, I'm able to browse archive.org.
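For anyone else hitting this, the bulk replacement can be sketched roughly like below (untested shell; the directory path is the one mentioned above, and since this edits files in place, keep the .bak backups until you've confirmed things work):

```shell
# Sketch: replace the dead hostnames with archive.org in every file
# under the module directory. sed -i.bak keeps a .bak backup per file.
fix_hosts() {
  dir="$1"
  grep -rlE 'dweb\.me|www-dweb-cors\.dev\.' "$dir" | while read -r f; do
    sed -i.bak \
        -e 's/www-dweb-cors\.dev\./archive.org/g' \
        -e 's/dweb\.me/archive.org/g' "$f"
  done
}

# On the IIAB box:
# fix_hosts /opt/iiab/internetarchive/node_modules/@internetarchive
```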
Great - glad it's working - and somewhat surprised, though maybe it's only some features that have to go through the "cors" gateway.
I realize now that after clicking "Go" twice to search, it redirected me to archive.org.
So, I reviewed the journalctl output (for community members, that's "journalctl -u internetarchive -f") and found the culprits: some advanced-search queries didn't like an empty "and" array.
https://archive.org/advancedsearch.php?output=json&q=churchofJesusChrist.org&rows=30&page=1&sort[]=-downloads&and[]=&save=yes&fl=identifier%2Ctitle%2Ccollection%2Cmediatype%2Cdownloads%2Ccreator%2Cnum_reviews%2Cpublicdate%2Citem_count%2Cloans__status__status
So, I had to remove "&and[]=" from the advanced-search query in a few files, and delete a line that manually built an advanced query with "and[]".
After doing that, my searches started being successful.
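For reference, the cleanup boils down to deleting that one empty parameter; a minimal sketch (the example URL is shortened from the one above, and the commented grep/sed variant assumes the same module directory):

```shell
# Sketch: strip the empty and[]= parameter from an advancedsearch query string.
url='https://archive.org/advancedsearch.php?output=json&q=test&and[]=&save=yes'
clean=$(printf '%s' "$url" | sed 's/&and\[]=//g')
echo "$clean"

# To do it across files, the same pattern works with grep + sed:
# grep -rl '&and\[]=' /opt/iiab/internetarchive/node_modules/@internetarchive \
#   | xargs sed -i.bak 's/&and\[]=//g'
```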
Hi @mitra42,
Thanks for the information in "Issue #383". That helped me confirm that dweb.me/info isn't necessary.
I'm attempting to deploy an Internet in a Box (IIAB) for home and family use as an emergency preparation item. I'm trying to crawl a few websites (not necessarily whole collections).
However, each crawl doesn't get past "dweb-transports:httptools p_httpfetch: https://dweb.me/info '' +0ms" (from journalctl), since it times out.
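In case it helps anyone debugging the same symptom: a quick way to confirm the endpoint itself is unreachable (a generic probe I put together, not part of IIAB):

```shell
# Hypothetical probe: report whether a URL answers within 10 seconds.
probe() {
  if curl -sS --max-time 10 "$1" >/dev/null 2>&1; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}

# Expected to print "unreachable" if the gateway is gone,
# matching the p_httpfetch timeout seen in journalctl:
probe https://dweb.me/info
```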
I've been reviewing a few of the documentation files:
I've also been looking into TransportHTTP.js, which hard-codes:
urlbase: 'https://dweb.me',
Questions:
I'm okay if there is no (current) way to get collections. My main objective is to crawl many/all of the child pages of a specific URL, such as churchofjesuschrist.org/study/
I appreciate your time :)