kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Parent task: Improve search functionality in Kedro documentation #2798

Status: Closed (stichbury closed this 10 months ago)

stichbury commented 1 year ago

We must find a way to fix search: when users search for a term, the results should surface content from our Markdown documentation rather than our API docs. (Measured by: an increase in search usage; we might also be able to see whether people are following links from search results.)

astrojuanlu commented 1 year ago

Options:

DocSearch provides search for open source projects and technical blogs for free. You can apply and receive your credentials.

stichbury commented 1 year ago

This now has a child task, #2932, and we'll explore Algolia (option 2) through the sphinx-docsearch extension.

astrojuanlu commented 1 year ago

As an alternative to #2932, looks like RTD "search as you type" is on, see for example https://vizro.readthedocs.io/

[Screen recording attachment, 2023-09-26 08:45:41]

stichbury commented 1 year ago

The problem I have with search as we have it is that it shows all the API docs as well as the Markdown, and the API pages tend to flood the results with less useful links. The ideal would be to restrict the index to narrative content and let users opt in to searching API content only if they want that. I’m not convinced the RTD search is sufficiently sophisticated so this new feature will still return too many false positive results. Not to say Algolia will be better, of course!

astrojuanlu commented 1 year ago

> I’m not convinced the RTD search is sufficiently sophisticated so this new feature will still return too many false positive results.

Very good point. Indeed, I think Algolia will allow us to tweak the results in a much more fine-grained fashion.

astrojuanlu commented 1 year ago

We abandoned the Algolia effort for now, but found some bugs in the new RTD search-as-you-type functionality: https://github.com/readthedocs/addons/issues/165

astrojuanlu commented 1 year ago

To use the RTD "search as you type" functionality, we need a tiny bit of frontend code:

  1. Disable the Sphinx search keyboard shortcut.
  2. Add some JS code to trigger the event readthedocs-search-show when div[role=search] > form > input gets focus.

(Source: @humitos at https://github.com/readthedocs/addons/issues/165#issuecomment-1781287582.) At the moment the event is only triggered from the search box inside the modal, but we'd like it triggered here:

[Screenshot attachment, 2023-10-26 16:57:41]

cc @tynandebold: does this warrant its own ticket, since this is a parent one? It might make tracking easier for you.
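
Step 2 of the list above could be sketched roughly as follows. This is only an illustration, not tested against the Kedro docs theme: the selector and the event name are taken from the comment, while the function name `openRtdSearchOnFocus`, the choice to dispatch on `document`, and the `DOMContentLoaded` wiring are assumptions. Step 1 (disabling the Sphinx shortcut) depends on the theme and is not covered here.

```javascript
// Sketch only. Assumes the Read the Docs addon listens on `document` for the
// "readthedocs-search-show" event; `openRtdSearchOnFocus` is a hypothetical name.
function openRtdSearchOnFocus(doc) {
  // Selector quoted in the issue comment: the theme's own search input.
  const input = doc.querySelector("div[role=search] > form > input");
  if (!input) {
    return false; // no matching search box in this theme, nothing to wire up
  }
  input.addEventListener("focus", () => {
    // Ask the addon to open its search-as-you-type modal.
    doc.dispatchEvent(new Event("readthedocs-search-show"));
  });
  return true;
}

// In a browser, wire it up once the page has loaded (no-op outside a browser).
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => {
    openRtdSearchOnFocus(document);
  });
}
```

Passing the document in as a parameter keeps the function easy to exercise with a stub, which matters here because the real behaviour can only be confirmed on a Read the Docs build.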

stichbury commented 11 months ago

I've been wondering more about whether we just hand this off to Google (since we know a lot of people use it anyway rather than battle with site search).

Take a look at https://www.theguardian.com/uk, which simply passes search to Google. We could maybe find a way to replace the search box with a link to https://www.google.co.uk/advanced_search?q=site:docs.kedro.org (or take the input from the search box and pass it through to Google).

tynandebold commented 11 months ago

Wow, that's interesting. So they don't have their own search at all. Clicking their search "link" takes me here: https://www.google.co.uk/advanced_search?q=site:www.theguardian.com

We could do that very quickly.

stichbury commented 11 months ago

Yes and it's been that way for some years.

I don't think the experience is beautiful, because it takes you outside their site and dumps you in a big form to do a search (cue "All I wanted was to find a recipe for hot cross buns" wailing). On mobile and small screens, though, it doesn't open the Google search form; it lets you pop the text into the bar and then passes it to Google, which is reasonable. The results are still formatted as per Google rather than the Guardian, so it's a bit jarring.

Question is -- how easy is it to take the input from the search bar instead of it passing to Sphinx, and send over to Google?
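
For a sense of scale, here is a minimal sketch of intercepting the search form and handing the query to Google. It assumes the Sphinx search form submits an input named `q` and that a plain Google search URL with a `site:` filter is acceptable; the function names and the injected `navigate` callback are hypothetical.

```javascript
// Sketch only: build a Google search URL restricted to one site.
function buildSiteSearchUrl(query, site = "docs.kedro.org") {
  const url = new URL("https://www.google.com/search");
  // URLSearchParams takes care of escaping spaces and punctuation.
  url.searchParams.set("q", `site:${site} ${query.trim()}`);
  return url.toString();
}

// Sketch only: stop the form reaching Sphinx search and redirect instead.
// `navigate` is injected so the logic can be exercised without a browser.
function redirectSearchToGoogle(form, navigate) {
  form.addEventListener("submit", (event) => {
    const input = form.querySelector("input[name=q]"); // assumed input name
    if (!input || input.value.trim() === "") {
      return; // empty query: leave the default behaviour alone
    }
    event.preventDefault(); // do not load the Sphinx results page
    navigate(buildSiteSearchUrl(input.value));
  });
}

// Possible browser wiring (no-op outside a browser):
if (typeof document !== "undefined") {
  const form = document.querySelector("div[role=search] > form");
  if (form) {
    redirectSearchToGoogle(form, (target) => window.location.assign(target));
  }
}
```

So the mechanics look small; the open question is the user experience of leaving the docs site, not the amount of code.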

astrojuanlu commented 11 months ago

I think specialised search will yield better results, no?

Also I think it's a meh user experience to click on the search bar and suddenly land in another website.

> Question is -- how easy is it to take the input from the search bar instead of it passing to Sphinx, and send over to Google?

Are there any blockers to try https://github.com/kedro-org/kedro-viz/issues/1612 out?

stichbury commented 11 months ago

> Are there any blockers to try kedro-org/kedro-viz#1612 out?

None except time (and Tynan has offered to support Rashida if she needs extra hands).

I don't think the add-on offers the same level of filtering that Google advanced search would provide (to filter out API results), but I definitely agree that the UX is better if we stay on-site.

stichbury commented 10 months ago

I'm closing this for now. I still don't think search is particularly good (try searching for "node" and see what you get as the first result), but we have tried Algolia. My best guess for improving things is to move the Kedro Markdown docs into a separate repo and deploy them as a subproject, with the API docs kept apart, so that search doesn't run over the API docs. I think we have a separate issue for that, and for now, this is (kinda) done.

astrojuanlu commented 10 months ago

@stichbury We can tweak the search ranking (https://docs.readthedocs.io/en/stable/config-file/v2.html#search-ranking), for example to favour narrative docs over API docs. It's worth exploring that before we take on more dramatic changes.

Do you want to open a follow-up issue about it?
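
As a starting point for that follow-up, a ranking tweak in the Read the Docs config file might look something like the fragment below. The `search.ranking` key is the one described at the link above; the path patterns and rank values are purely illustrative assumptions and would need to match wherever the built API pages actually live.

```yaml
# .readthedocs.yaml (fragment). Sketch only: patterns and values are assumptions.
version: 2

search:
  ranking:
    # Demote generated API reference pages (hypothetical path pattern).
    api/*: -5
```

Ranks run from -10 to 10 with 0 as the default, so a negative value pushes matching pages down the results without removing them from the index.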