projectEndings / staticSearch

A codebase to support a pure JSON search engine requiring no backend for any XHTML5 document collection
https://endings.uvic.ca/staticSearch/docs/index.html
Mozilla Public License 2.0

Using staticSearch with multiple search pages #272

Closed peterrobinson closed 5 months ago

peterrobinson commented 1 year ago

My adventures with staticSearch have borne fruit: you can see staticSearch integrated in our forthcoming digital edition of Boccaccio's Teseida at http://inklesseditions.com/TeseidaStatic/. Note this is still a work-in-progress. Comments are welcome but please DO NOT repost the address of this site on the general internet.

You can read about my adventures making this happen in a series of blog posts at http://scholarlydigitaleditions.blogspot.com/.

Apart from a few issues with default settings, staticSearch has worked like a dream. Many props to Martin and Joey, and the whole staticSearch community, for making such a wonderful tool. However, there is one area where I think some work needs to be done. As I comment in one of my blogs (http://scholarlydigitaleditions.blogspot.com/2023/07/setting-up-staticsearch-for-our.html), staticSearch is built on two assumptions:

  1. All the pages to be searched are held in the same folder as the root index.html file.
  2. All searches are launched from a single place, and a single file, contained in that same folder holding all the project files.

Neither is true of this Teseida project, or indeed any of our projects to come. The Teseida project distributes some 2000 html pages across scores of folders, and every one of the 2000 html pages can launch a search. It was not particularly hard to persuade staticSearch to work this way, but it might have been easier. Here are two suggestions:

  1. staticSearch already has the facility of rewriting the root file by populating a <div id="staticSearch"> element with its own code. Could this not be extended, so that EVERY file containing a <div id="staticSearch"> element is rewritten to hold the staticSearch code? (We do not need that ourselves, as we include the staticSearch code as we build the files, but I can see circumstances where this would be useful).
  2. staticSearch could be much more intelligent about the links it creates to search-hit pages. At present, it knows that the page it has found is (for example) at the path html/transcripts/AUT/1r.html, and so it creates the link "html/transcripts/AUT/1r.html" to that file. That would work fine if the file from which I was searching was at the document collection root. But because we are searching from (for example) "html/compare/1/1.html" the link fails, as staticSearch gets lost creating a link like "html/compare/1/html/transcripts/AUT/1r.html", where what it actually needs is "../../../html/transcripts/AUT/1r.html".

Again, it was rather easy for us to get around this problem by rewriting all the links using a function called from the this.searchFinishedHook function. But it seems to me that as staticSearch does so much else so well, it should do this too.
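
The link failure described in suggestion 2 follows from ordinary relative-URL resolution, and can be reproduced with the standard `URL` constructor. This is only an illustrative sketch: the `https://example.org/edition/` site root is a placeholder, and the two paths are the examples quoted above.

```javascript
// Sketch: how a browser resolves a hit link against the page that displays it.
// The site root is a placeholder; the paths are the examples from this comment.
const siteRoot = "https://example.org/edition/";
const searchPage = new URL("html/compare/1/1.html", siteRoot);

// Link written relative to the collection root.
const rootRelative = "html/transcripts/AUT/1r.html";
// Link the nested page actually needs: climb three folders first.
const pageRelative = "../../../html/transcripts/AUT/1r.html";

const broken = new URL(rootRelative, searchPage).href;
const working = new URL(pageRelative, searchPage).href;

console.log(broken);
// https://example.org/edition/html/compare/1/html/transcripts/AUT/1r.html
console.log(working);
// https://example.org/edition/html/transcripts/AUT/1r.html
```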

martindholmes commented 1 year ago

@peterrobinson Thank you for your feedback. It's true that staticSearch does assume a single root folder for all the HTML to be indexed, but it will handle any level of nesting within that folder, so assuming that all the HTML in your site is at some level within a single folder -- the folder for the site itself -- then the indexing should work fine, no matter how complex the organization or nesting is within that folder.

On the idea that we should discover and rewrite every instance of <div id="staticSearch"> throughout the collection, that would break existing projects where we have multiple search pages with different controls and indexes for different purposes, such as DVPP:

https://dvpp.uvic.ca/search.html

and the Landscapes of Injustice archive:

https://loi.uvic.ca/archive/loiCollection.html

Since your project doesn't actually need this either, then I think it's probably not a good idea to introduce this behaviour.

On the question of the search page location, as a general principle, we tend to organize our own sites so that there is a base index file index.html which is the home page for the site, in the root folder of the site, and all the rest of the content also lives within the same folder (albeit perhaps nested in subfolders). We typically put the search page(s) alongside the home page, where there are usually other general information pages (About, Contact, etc.), so the search page is always creating links to pages which are either siblings of it or in nested folder structures from that location. If I'm understanding your use-case correctly, you would like to place the search page itself deeply nested within the site, and have it return links that traverse the tree upwards and then down again into other nested folders. My first question would be: why not put the search page at the root of the site?

That said, site organization is your business, and although links containing ../ always make me a bit nervous, there's no reason staticSearch shouldn't handle this. To make this work, we would have to separate out the location of the search file from the root of the collection to be indexed. We could add a separate configuration item for "collectionRoot", and then massage the index hit pointers to make them relative to the search page location based on the document root. @joeytakeda what do you think about this?
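
The "massaging" step proposed here could look roughly like the sketch below. It is not staticSearch code: `relativeHref` is a hypothetical helper, "collectionRoot" exists only as a proposal in this thread, and both arguments are assumed to be paths relative to that root with a file name at the end.

```javascript
// Hypothetical helper, not part of staticSearch.
// Both arguments are paths relative to the collection root and end in a file name.
function relativeHref(fromPage, toPage) {
  const fromDirs = fromPage.split("/").slice(0, -1); // folders holding the search page
  const toParts = toPage.split("/");

  // Count the leading folders the two paths have in common.
  let shared = 0;
  while (
    shared < fromDirs.length &&
    shared < toParts.length - 1 &&
    fromDirs[shared] === toParts[shared]
  ) {
    shared++;
  }

  // Climb out of the unshared folders, then descend to the target.
  const up = "../".repeat(fromDirs.length - shared);
  return up + toParts.slice(shared).join("/");
}

console.log(relativeHref("search.html", "html/transcripts/AUT/1r.html"));
// html/transcripts/AUT/1r.html
console.log(relativeHref("html/compare/1/1.html", "html/transcripts/AUT/1r.html"));
// ../../transcripts/AUT/1r.html
```

With the search page at the root, the hit pointer comes back unchanged, so the current behaviour would be the special case of a search page with no parent folders.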

peterrobinson commented 1 year ago

Yes, indeed, staticSearch does a wonderful job of finding all the html by recursing through the entire site. As for the question: why not put the search page at the root of the site? The answer is that EVERY PAGE on our site is a search page. Which is the way we want it. Maybe it would be possible to, somehow, make it look as if the search is coming from a transcription or collation page while in fact running it from a single page at the site root. I just can't figure how to do it (and anyway, we now do have the searching running from each page fine). And yes, separating the "location of the search file from the root of the collection to be indexed" seems to me the solution.

martindholmes commented 1 year ago

We also often have a little search box on every page, but it generally submits to the search page which is in the root, which is the page that shows the results. It doesn't matter where in the tree the page containing the search query submission is; it's the result page which processes the search. In your temporary site, that's this page:

http://inklesseditions.com/TeseidaStatic/

which is in the root. So I'm not sure if there really is a problem or not there.

peterrobinson commented 1 year ago

Yes, indeed -- I see now I could have done this quite straightforwardly by setting window.location.href to index.html?q= followed by the word(s) I am searching for. And that would indeed have given the same result (yes, I am going to a new page but it's likely that it will look to the user as if I am still on the same page -- and the back button would take me right back there too). However, I do think that it would be worth implementing relative addressing of search-hit pages. In fact, the case where the search comes from the document root is really just an instance of this relative addressing: the same algorithm should meet all cases, including that where the search-page source is at document root, as well as where it is not.

martindholmes commented 1 year ago

I think this might be a question of documentation, ultimately. It probably only makes sense for the vast majority of users to have a single centralized search page (which will contain many filters and so on), but also have a simple text search box in the header of many pages. We could easily provide an explanation of how to do that and have the search query submitted to the central search page. We could also model that in our own documentation.

martindholmes commented 1 year ago

Confirming again that this is a documentation issue, and we can solve it by adding documentation which shows how to create a simple HTML form in any page which sends a search to the search page.
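
As a rough sketch of what such documentation might show, the helper below builds the address a header search box would navigate to. `searchUrl` and the example.org address are hypothetical, and the sketch assumes the central search page reads the query from a `q` parameter, as the earlier comments in this thread suggest. A plain HTML form using the GET method with a text input named `q` would produce the same address without any script.

```javascript
// Hypothetical helper: builds the address of the central search page for a query.
// Assumes the search page reads the query from a "q" parameter.
function searchUrl(searchPage, query) {
  const url = new URL(searchPage);
  url.searchParams.set("q", query); // form-style encoding: spaces become "+"
  return url.href;
}

console.log(searchUrl("https://example.org/edition/index.html", "amore e morte"));
// https://example.org/edition/index.html?q=amore+e+morte
```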

peterrobinson commented 1 year ago

I disagree. This is assuming that everyone is happy with sending the search to a single search page at the root. There are multiple possible reasons why this might not be what people want to do -- in which case, my suggestion of adding relative addressing to the search hit references is relevant. This does not seem a big deal to implement: simply add '../' for every directory the search page is deep below the document root. If the search page is at the document root, no "../". If it is three directories down (e.g. /html/transcripts/Hg/) then it is "../../../". But hey. I'm only me. We did get around this one, though with a bit of a hack, rewriting the links in the hits page in a function called from this.searchFinishedHook.
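
A minimal sketch of the rule described above, one "../" per directory below the document root, might look like this. It is not the project's actual workaround: `rootPrefix` and `rewriteHitLinks` are hypothetical names, the page path is assumed to be relative to the document root and to end in a file name, and the way the hit links are selected from the results markup is left to the caller.

```javascript
// Hypothetical helper: one "../" per folder between the page and the document root.
// pagePath is relative to the document root and must end in a file name.
function rootPrefix(pagePath) {
  const depth = pagePath.split("/").filter(Boolean).length - 1; // exclude the file name
  return "../".repeat(Math.max(depth, 0));
}

// Rewrites root-relative hit links in place. `links` is any iterable of objects
// exposing getAttribute/setAttribute, such as anchors from document.querySelectorAll.
function rewriteHitLinks(links, pagePath) {
  const prefix = rootPrefix(pagePath);
  for (const link of links) {
    const href = link.getAttribute("href");
    // Skip missing hrefs, absolute URLs, root-absolute paths, "../" links and fragments.
    if (!href || /^([a-z][a-z0-9+.-]*:|\/|\.\.\/|#)/i.test(href)) continue;
    link.setAttribute("href", prefix + href);
  }
}

console.log(JSON.stringify(rootPrefix("index.html")));
// ""
console.log(JSON.stringify(rootPrefix("/html/transcripts/Hg/1r.html")));
// "../../../"
```

A function like `rewriteHitLinks` could then be called from whatever is assigned to the `searchFinishedHook` mentioned above. The exact wiring is an assumption here and would need checking against the staticSearch version in use.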

martindholmes commented 5 months ago

I think what we're seeing here is a very specialist use-case -- I haven't seen any other project so far that would want to retrieve individual search results all over the site, at different levels, rather than in a single search page, although I do have several projects which have multiple search pages for different purposes, using different indexes and filters. I think if we were to add documentation to explain how to do this, we'd have to be able to explain why someone would want to, and I honestly don't understand that myself. :-)

peterrobinson commented 5 months ago

To be clear: our search page is hardly ever "http://inklesseditions.com/TeseidaStatic/"; it is only that address when we are doing a search from http://inklesseditions.com/TeseidaStatic/index.html. Most times, we are launching the search from http://inklesseditions.com/TeseidaStatic/html/transcripts/AUT/3r.html or http://inklesseditions.com/TeseidaStatic/html/compare/1/1.html or similar. Why? Because we implement a uniform header on every page, regardless of context, and we want optimized navigation within each set of pages. Redirecting searches to a single page outside that set breaks the model.

martindholmes commented 5 months ago

I think this is the sort of thing that we should add if/when we have a substantial group of users who are asking for it; so far, it's only your project that wants to do this, so it's probably best treated as a project-level customization you carry out yourself.

martindholmes commented 5 months ago

Closing this in favour of #294.