Open SeanPedersen opened 3 years ago
Search isn't directly supported, but theoretically could be in the future.
One option is to search for site:en.wikipedia-on-ipfs.org SEARCH TERMS
in your preferred search engine to discover new pages.
We have some prior art in #44. The code is 4 years old, but it could be a good starting point if someone has bandwidth to help with this.
In case someone wants to pick this up before I have spare bandwidth: simply reuse the existing UI from mobile Wikipedia: https://en.m.wikipedia.org/, which already has subtle branding and a search box.
The hamburger menu could be replaced with our icon, and clicking it would jump to the footer explaining the mirror project.
Both Google and DDG have methods of adding a custom website search bar to your website:
https://cse.google.com/ https://duckduckgo.com/search_box
I tested both, and unfortunately the results I'm getting with DDG are all 404 errors because it appends .html to the URLs. If you want to try both out, here are some links:
https://cse.google.com/cse?cx=230751f5750677644 https://duckduckgo.com/search.html?site=en.wikipedia-on-ipfs.org&prefill=Search%20Wikipedia%20on%20IPFS
EDIT: It's also worth noting that both engines show ads above the actual results. It is possible to remove ads (and branding) on DDG with URL params, but that is against the ToS except for personal use.
There has recently been some work on hosting a full-text search engine in WebAssembly for very large data sets. This was directly influenced by IPFS's hosting of Wikipedia.
The key feature is that the client pulls only the data it needs from the static index to execute the search. For example, a full-text search on a 14 GByte index takes 2 seconds and needs to download only ~1.5 MByte of the index.
Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust. It was initially designed to run on a server, but Rust can run in WebAssembly. There is a pull request https://github.com/tantivy-search/tantivy/pull/1067 that adapts Tantivy to running fully in WebAssembly on the client side.
See the pull request for a demo using the Wikipedia dataset.
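To make the "pull only the data needed" idea concrete, here is a toy sketch in Python. It is not how Tantivy lays out or reads its index. It assumes a hypothetical static file of sorted, fixed-width records, and a `RangeReader` class that stands in for HTTP Range requests against that file. A binary search then touches only a handful of records instead of downloading everything.

```python
import struct

# Hypothetical on-disk layout: sorted (key, value) pairs, 8 bytes each, big-endian.
RECORD = struct.Struct(">QQ")


class RangeReader:
    """Stand-in for HTTP Range requests against a static index file."""

    def __init__(self, blob: bytes):
        self._blob = blob
        self.bytes_fetched = 0  # how much of the "remote" file was transferred

    def read(self, offset: int, length: int) -> bytes:
        self.bytes_fetched += length
        return self._blob[offset:offset + length]


def lookup(reader: RangeReader, total_records: int, key: int):
    """Binary-search the sorted records, fetching one record per step."""
    lo, hi = 0, total_records
    while lo < hi:
        mid = (lo + hi) // 2
        k, value = RECORD.unpack(reader.read(mid * RECORD.size, RECORD.size))
        if k == key:
            return value
        if k < key:
            lo = mid + 1
        else:
            hi = mid
    return None  # key is not in the index
```

With 1000 records (16,000 bytes), a lookup fetches at most 10 records, i.e. 160 bytes. A real engine would batch and cache reads, but the principle of fetching byte ranges on demand is the same.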
(tantivy creator and maintainer, and quickwit CEO) quickwit (https://github.com/quickwit-inc/quickwit) aims precisely at allowing client-side search on distant, high-latency storage. We are in the process of open-sourcing our code under the AGPL license. Once this is done, we'd be happy to help.
Speaking of which, what license is the code of distributed-wikipedia-mirror under? If it isn't GPLv3 too, it won't be able to use quickwit.
This shows how it can be done with a static SQLite database that serves as the index. SQLite supports full-text search. Sqlite static hosted
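For anyone unfamiliar with SQLite's full-text search, here is a minimal sketch using Python's built-in `sqlite3` module and the FTS5 extension. It assumes the local SQLite build has FTS5 compiled in (most do, but not all). The table name `pages` and the helper functions are made up for illustration, and it runs against an in-memory database rather than a statically hosted file.

```python
import sqlite3


def build_index(articles):
    """Create an in-memory FTS5 index from (title, body) pairs."""
    conn = sqlite3.connect(":memory:")
    # 'pages' is a hypothetical table name; FTS5 must be available in this build.
    conn.execute("CREATE VIRTUAL TABLE pages USING fts5(title, body)")
    conn.executemany("INSERT INTO pages (title, body) VALUES (?, ?)", articles)
    conn.commit()
    return conn


def search(conn, query, limit=10):
    """Return titles of matching pages, best match first."""
    rows = conn.execute(
        "SELECT title FROM pages WHERE pages MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [title for (title,) in rows]
```

For the mirror, the index would be built once offline and published as a static file; the client would then need a way to query that file without downloading it whole, which is the part the linked demo addresses.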
In Brave Browser, you can define a search shortcut that prefixes whatever you type after activating it, which can be used to search for IPFS Wikipedia pages. Go to "Settings > Search engine > Manage search engines and site search > Add", which opens a dialog box for adding a search engine. For example, if you want to use Brave's search engine to search for IPFS Wikipedia pages, you can input https://search.brave.com/search?q=site%3Aen.wikipedia-on-ipfs.org %s for the "URL with %s in place of query" field (and whatever you want for the "Search engine" and "Shortcut" fields).
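To illustrate what the browser does with such a template, here is a small Python sketch that substitutes a URL-encoded query for `%s`. The `expand_template` helper is hypothetical, and the template below writes the space before `%s` as `+`, which is an assumption about how it should be encoded in a query string.

```python
from urllib.parse import quote_plus

# Template from the comment above; the space before %s is written as '+' (assumption).
TEMPLATE = "https://search.brave.com/search?q=site%3Aen.wikipedia-on-ipfs.org+%s"


def expand_template(template: str, query: str) -> str:
    """Replace the %s placeholder with the URL-encoded search terms."""
    return template.replace("%s", quote_plus(query))


if __name__ == "__main__":
    print(expand_template(TEMPLATE, "Alan Turing"))
```

The same helper works for any engine that accepts a `site:` filter; only the template changes.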
I am wondering how to use ipns://en.wikipedia-on-ipfs.org/wiki/ effectively? I see no option to search for an article. How am I supposed to find the content I am looking for?