kubernetes / website

Kubernetes website and documentation repo:
https://kubernetes.io
Creative Commons Attribution 4.0 International

Implement cheaper search #44475

Closed: cjyabraham closed 1 week ago

cjyabraham commented 6 months ago

CNCF pays for the current search on this site. It uses both Google Custom Search (for requests outside of China) and the Bing Search API (for requests in China). Due to recent changes in Bing Search pricing, this now costs CNCF too much money, so we should implement a different search solution, such as those suggested here.

These other search-related issues could be addressed at the same time:

sftim commented 6 months ago

What prioritization would you put on this, @cjyabraham ?

/area web-development

cjyabraham commented 6 months ago

I would say important-soon

dims commented 6 months ago

@cjyabraham How much is it costing us? (maybe drop an email to steering-private if not public?)

cjyabraham commented 6 months ago

November cost $2,400 for the Bing API alone. I don't know the Google cost offhand.

natalisucks commented 6 months ago

@cjyabraham Thanks for opening this issue. With my SIG Docs co-chair hat on, I'd like to know what extra resources the CNCF plan to allocate to this work, given that SIG Docs as a whole at the moment doesn't have the people to work on this. Let's chat about this in the new year at one of our bi-weekly community meetings so that we can understand the scope and give a better estimate of effort in partnership with you.

sftim commented 6 months ago

/priority important-soon

cjyabraham commented 6 months ago

(@natalisucks I don't know about official allocations of resources from CNCF but @caniszczyk may be able to help...)

As it is, I did a bit of research and got quite excited by this relatively new Hugo search option: Pagefind. I've tried it out on the kubernetes.io site on this PR with preview here: https://deploy-preview-44530--kubernetes-io-main-staging.netlify.app/

You can play with it there on the homepage just as a POC. LMK how that performs from your POV.

FYI you can learn more about Pagefind by watching this vid presenting how it works in general and what features are in the v1.0 release.

sftim commented 6 months ago

The performance (latency) for https://github.com/kubernetes/website/pull/44530 looks fine to me. We'd want to tweak the UI. Also, the relevance of results is a lot worse than the Google search.

For example, try searching for “feature gates”.

sftim commented 6 months ago

I suggest we focus on getting to a UI that works and won't provoke issues (right now, if we merged #44530, people would file bug reports). Then we can merge that and iterate on the quality of the search results.

Maybe there is a search API service that is more affordable and that we can use in combination with Pagefind (or equivalent). That would be more work, but this is open source. Someone might actually relish the challenge.

cjyabraham commented 6 months ago

A few thoughts:

Open questions:

  1. Can someone verify whether Google Custom Search is, in fact, free for kubernetes.io? I don't have access to its control panel; however, I run it for several other CNCF sites, and it's free for all of them since we're a nonprofit, etc.
  2. If Pagefind results are "good enough", should we use it for China requests instead of Bing? If so, I can refine the PR to improve the UI before submitting it for review.
  3. Would it be worth investigating if Bing would be willing to donate credits for this since it's Kubernetes? Not sure if anyone has contacts there...

sftim commented 6 months ago

If Pagefind results are "good enough", should we use it for China requests instead of Bing? If so, I can refine the PR to improve the UI before submitting it for review.

Seems legit. Help is welcome from people whose access to Google Search is blocked through national policy - or from anyone else who'd like to help.

Most people (≈ 67%) in China speak Mandarin Chinese as their main language, so we should check the search results in that language more than superficially. I know Russia has state censorship too; their main language is Russian; again, the quality of indexing for English is not much of a guide.

sftim commented 6 months ago

On whether Kubernetes pays for the Google Search: I suspect it's easy to check, but hard to find contributors who know who to ask.

cjyabraham commented 6 months ago

Here's the status of support for the Chinese language in Pagefind.

sftim commented 6 months ago

Help is most definitely welcome.

/area localization

sftim commented 6 months ago

/triage accepted

cjyabraham commented 6 months ago

Algolia DocSearch is another option we could try. It's free; however, it would add a dependency on the Algolia crawler and search engine, and we'd need to display their branding.

natalisucks commented 6 months ago

@cjyabraham Our next SIG Docs community meeting is on January 9th at 18:30 UTC, and I'll be adding this to our agenda for discussion. It would be great to see you there, and joining the SIG Docs Google Group will give you access to the calendar invitation for that call. We'll also post the link to our Zoom meeting in the #sig-docs channel on Kubernetes Slack on the same day.

cjyabraham commented 6 months ago

Hi @natalisucks, unfortunately that meeting would be at 1:30 AM my time, so I don't think I'll be able to make it. If you have any questions for me in the run-up to the meeting, please let me know.

natalisucks commented 6 months ago

@cjyabraham Per my comment here, I'd really like a rep from the CNCF to come and chat with us so we can understand how we can work in tandem with you, again given resource constraints. Let me know who else I should reach out to. If @kubernetes/steering-committee members also wish to attend and discuss, alongside anyone else involved in the thread on Slack, that would be great.

idvoretskyi commented 6 months ago

Looping @jeefy @castrojo in here :)

jeefy commented 6 months ago

I unfortunately have a conflict during the next SIG Docs meeting; hopefully @onlydole will be able to join.

I think in terms of timing, we should try to target no later than KubeCon EU, but if we can all rally together sooner, dope.

Another thing to consider is potentially using a more local search engine. Example: Baidu's a member and also services the region. So... :) Those are my immediate two cents.

caniszczyk commented 6 months ago

You may want to chat with @nate-double-u about experience using https://lunrjs.com on CNCF projects

natalisucks commented 6 months ago

@onlydole Let's chat at the meeting next Tuesday then so we can do some planning and resource work. Thanks!

onlydole commented 6 months ago

You betcha! I'll read up on some of these options and will be prepared to chat with our group next Tuesday.

dipesh-rawat commented 6 months ago

There are a few open-source and commercial search alternatives listed in the Hugo documentation (https://gohugo.io/tools/search).

nate-double-u commented 6 months ago

You may want to chat with @nate-double-u about experience using https://lunrjs.com on CNCF projects

I'm just back from vacation now and am catching up on things, always happy to help tho, just need a bit to read the various discussions that have already happened.

pacoxu commented 6 months ago

In Slack thread https://kubernetes.slack.com/archives/CPNFRNLTS/p1703270890381669, we have some inputs.

Some possible choices:

  1. https://docsearch.algolia.com/ is another choice. It is free for open source projects. I am not sure if CNCF projects can use it.
    • Kevin: For https://docsearch.algolia.com/, some CNCF projects have already integrated it on their websites, but we may need more research on the search quality.
  2. Another choice is using a redirect to bing.com directly for free.
  3. For now, could we just support local search for kubernetes.io? I am not sure whether Hugo can support local search the way VitePress does, with fuzzy full-text search using an in-browser index.

One question raised there: "Are we okay with ads on the Chinese side of things?"

dylantientcheu commented 5 months ago

Hey there!

At Algolia we love K8s. Indeed, DocSearch is open, and we'll be happy to provide an awesome (and free) experience for your multi-language documentation website.

Here's a demo we've set up - https://7cnxmj.csb.app/

sftim commented 4 months ago

What's the next step for this issue? I can't tell.

natalisucks commented 4 months ago

@sftim Hi Tim, as per our SIG Docs meeting on January 9th and the feedback shared there, @onlydole attended and has been tasked with the next step: having the CNCF come up with a plan and do further research to improve search. This is a CNCF cost challenge, not a strictly user/contributor-facing issue that SIG Docs leads would prioritize and lead work on. Taylor will hopefully update us on this issue when that plan and/or research is ready to share.

onlydole commented 4 months ago

Howdy, folks - we will be prioritizing this on the CNCF side after KubeCon + CloudNativeCon EU in Paris wraps up, and @nate-double-u will lead on that effort.

nate-double-u commented 3 months ago

Hi @dylantientcheu, Thanks for building that demo. Could you contact me (natew@cncf.io)? I'd like to chat with you about it.

nate-double-u commented 1 month ago

Sorry for the late update here. I updated the sig-docs meeting notes, but not this issue.

I've done some research and think that Pagefind is the best candidate to replace Bing. Algolia was in the running, but between their fee structure and the fact that going with them would add another dependency, we're best off going with Pagefind. Pagefind is well used around the CNCF and we have some experience supporting it. It's designed to build its index as part of the regular site build process, making the search a part of the site and sidestepping any issues we may run into with firewalled locations.

We should change as little as possible here, so we should only replace Bing, affecting only the localizations with Bing as their search provider (i.e., I don't think we should remove or update any other search providers).

@cjyabraham has made progress based on @sftim's work to implement Pagefind for Chinese users and will soon open a PR for discussion/review.

cjyabraham commented 1 week ago

The Pagefind solution has been deployed.