statamic / docs

Statamic Documentation
https://statamic.dev

Meilisearch Implementation #1539

Closed JohnathonKoster closed 2 weeks ago

JohnathonKoster commented 2 weeks ago

This PR provides a docsearch implementation using Meilisearch, aiming to make it a bit simpler to adjust how content is crawled, indexed, and surfaced to the end user.

We got tags!

Getting started

Once you have this branch pulled down, you will need to update your Composer and npm dependencies. Before building the front-end assets, update the Meilisearch host and API key inside resources/js/app.js:

docsearch({
    container: "#docsearch",
    host: "http://localhost:7700",
    apiKey: "c0384ca771d144ba4c1e5101b7dfda260ccc1c761f2059a6a4155782b8a76c41",
    indexUid: "default",
});

This is a good time to add the MEILISEARCH_HOST and MEILISEARCH_KEY key/value pairs to the .env file.

Updating the search index

Once everything has been configured (and the front-end assets have been built), we can update the search index as usual:

php please search:update

This shouldn't take terribly long; if everything has worked, you should see output similar to the following:

Created 3017 search sections.
Index default updated.
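
To confirm that the index actually contains documents, it can also be queried directly. The following is a minimal sketch, assuming the host, key, and index name from the docsearch configuration above and Meilisearch's standard POST /indexes/{uid}/search endpoint. buildSearchRequest and searchDocs are hypothetical helpers written for illustration; they are not part of this PR.

```javascript
// Hypothetical helper (not part of this PR): builds the URL and fetch options
// for a Meilisearch search request using Bearer authentication.
function buildSearchRequest({ host, apiKey, indexUid }, query) {
    const base = host.replace(/\/+$/, ""); // drop any trailing slashes
    return {
        url: `${base}/indexes/${encodeURIComponent(indexUid)}/search`,
        options: {
            method: "POST",
            headers: {
                "Content-Type": "application/json",
                Authorization: `Bearer ${apiKey}`,
            },
            body: JSON.stringify({ q: query, limit: 5 }),
        },
    };
}

// Sends the request and returns the matching documents.
// Assumes a runtime with a global fetch (a browser, or Node 18+).
async function searchDocs(config, query) {
    const { url, options } = buildSearchRequest(config, query);
    const response = await fetch(url, options);
    if (!response.ok) {
        throw new Error(`Search failed with HTTP ${response.status}`);
    }
    const { hits } = await response.json();
    return hits;
}
```

For example, searchDocs({ host: "http://localhost:7700", apiKey: "<your key>", indexUid: "default" }, "cache").then(console.log) should print a handful of sections if the update succeeded.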

What's going on?

This PR swaps the collection: search provider for a docs: search provider (provided by https://github.com/stillat/documentation-search).

The docs: provider crawls each entry's content and breaks it up into multiple documents based on the headings found in the final rendered output. At least one search document is created per entry, with all sections included.
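
The heading-based splitting can be pictured with a small sketch. This is only an illustration of the idea, not the code used by stillat/documentation-search (which is a PHP package). The function name splitByHeadings, the shape of the returned objects, and the regex-based parsing are all assumptions made for brevity; a real crawler would use a proper HTML parser.

```javascript
// Illustrative only: splits rendered HTML into one section per heading.
// Regex parsing keeps the sketch short; it is not robust for arbitrary HTML.
function splitByHeadings(html, entryUrl) {
    const headingPattern = /<h([1-6])([^>]*)>([\s\S]*?)<\/h\1>/gi;
    const stripTags = (s) =>
        s.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();

    const sections = [];
    // Content before the first heading belongs to a heading-less section.
    let current = { heading: null, anchor: null, start: 0 };

    const pushSection = (end) => {
        const content = stripTags(html.slice(current.start, end));
        if (current.heading !== null || content !== "") {
            sections.push({
                url: current.anchor ? `${entryUrl}#${current.anchor}` : entryUrl,
                heading: current.heading,
                content,
            });
        }
    };

    let match;
    while ((match = headingPattern.exec(html)) !== null) {
        pushSection(match.index); // close the section that just ended
        const idMatch = /\bid="([^"]*)"/i.exec(match[2]);
        current = {
            heading: stripTags(match[3]),
            anchor: idMatch ? idMatch[1] : null,
            start: headingPattern.lastIndex,
        };
    }
    pushSection(html.length);

    // Always return at least one document per entry.
    if (sections.length === 0) {
        sections.push({ url: entryUrl, heading: null, content: "" });
    }
    return sections;
}
```

Each returned section links back to its heading's id, which is why the rendered output needs to carry the same ids as the live site.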

A new App\Search\DocTransformer was also created, which works with the docs: provider. The current implementation provides examples of how to add additional context, based on the crawled content, to help surface search results. Extra information added here is indexed, but is not visible to users while they search.

For example, if the transformer finds both 'cache' and 'clear' in a document being indexed, it adds 'delete cache' to the additional context, which helps surface relevant topics.
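
The 'cache' and 'clear' rule described above can be sketched as follows. The real transformer is the PHP class App\Search\DocTransformer; this JavaScript version only illustrates the rule, and the contextRules table and additionalContext function are hypothetical names, not taken from the PR.

```javascript
// Hypothetical rule table: when every trigger word appears in the text,
// the extra phrases are attached to the indexed document.
const contextRules = [
    { triggers: ["cache", "clear"], add: ["delete cache"] },
];

// Returns the extra phrases to index alongside the given section text.
function additionalContext(text, rules = contextRules) {
    // Compare whole lowercase words so "cached" does not count as "cache".
    const words = new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
    const extra = new Set();
    for (const rule of rules) {
        if (rule.triggers.every((trigger) => words.has(trigger))) {
            rule.add.forEach((phrase) => extra.add(phrase));
        }
    }
    return [...extra];
}
```

Keeping the rules in a plain table makes it cheap to add more synonyms later without touching the matching logic.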

All of the extra views!?

This PR adds a number of views within resources/views/documentation-search. These do not have to keep up with the regular front-end views, but they must capture the same structure and ids/links found on the live site. These search-specific views optimize the crawling phase by rendering only what the crawler requires. They are optional and can be deleted.

If they are removed, the App\Search\RequestContentRetriever class takes over and makes a GET request to each page when crawling, similar to the SSG process.

However, leaving the resources/views/documentation-search views in place dramatically reduces crawl time. They also work together with some changes inside app/Providers/AppServiceProvider.php that skip Torchlight when updating the search index.