cjyabraham closed this issue.
What prioritization would you put on this, @cjyabraham?
/area web-development
I would say important-soon
@cjyabraham How much is it costing us? (Maybe drop an email to steering-private if it's not public?)
In November, the Bing API alone cost $2400. I don't know the Google costs offhand.
@cjyabraham Thanks for opening this issue. With my SIG Docs co-chair hat on, I'd like to know what extra resources the CNCF plan to allocate to this work, given that SIG Docs as a whole at the moment doesn't have the people to work on this. Let's chat about this in the new year at one of our bi-weekly community meetings so that we can understand the scope and give a better estimate of effort in partnership with you.
/priority important-soon
(@natalisucks I don't know about official allocations of resources from CNCF but @caniszczyk may be able to help...)
As it is, I did a bit of research and got quite excited by a relatively new static search option that works with Hugo: Pagefind. I've tried it out on the kubernetes.io site in this PR, with a preview here: https://deploy-preview-44530--kubernetes-io-main-staging.netlify.app/
You can play with it there on the homepage just as a POC. LMK how that performs from your POV.
FYI you can learn more about Pagefind by watching this vid presenting how it works in general and what features are in the v1.0 release.
The performance (latency) for https://github.com/kubernetes/website/pull/44530 looks fine to me. We'd want to tweak the UI. Also, the relevance of results is a lot worse than the Google search.
For example, try searching for “feature gates”.
I suggest we focus on getting to a UI that works and won't provoke issues (right now, if we merged #44530, people would file bug reports). Then we can merge that and iterate on the quality of the search results.
Maybe there is a search API service that is more affordable and that we can use in combination with PageFind (or equivalent). That would be more work, but this is open source. Someone might actually relish the challenge.
A few thoughts:
Open questions:
If Pagefind results are "good enough", should we use it for China requests instead of Bing? If so, I can refine the PR to improve the UI before submitting it for review.
Seems legit. Help is welcome from people whose access to Google Search is blocked by national policy, or from anyone else who'd like to help.
Most people in China (≅ 67%) speak Mandarin Chinese as their main language, so we should check the search results in that language more than superficially. Russia also has state censorship, and its main language is Russian; again, the quality of indexing for English is not much of a guide.
On whether Kubernetes pays for the Google Search: I suspect it's easy to check, but hard to find contributors who know who to ask.
Help is most definitely welcome.
/area localization
/triage accepted
Algolia DocSearch is another option we could try. It's free; however, it would add a dependency on the Algolia crawler and search engine, and we'd need to display their branding.
@cjyabraham Our next SIG Docs community meeting is on January 9th at 18:30 UTC, and I'll be adding this to our agenda for discussion. It would be great to see you there, and joining the SIG Docs Google Group will give you access to the calendar invitation for that call. We'll also post the link to our Zoom meeting in the #sig-docs channel on Kubernetes Slack on the same day.
Hi @natalisucks, unfortunately that meeting would be at 1:30 AM my time, so I don't think I'll be able to make it. If you have any questions for me in the run-up to the meeting, please let me know.
@cjyabraham Per my comment here, I'd really like a rep from the CNCF to come and chat with us so we can understand how we can work in tandem with you, once again, given resources. Let me know who else I should reach out to. If @kubernetes/steering-committee members also wish to attend and discuss, alongside anyone else involved in the thread on Slack, that would be great.
Looping @jeefy @castrojo in here :)
I unfortunately have a conflict during the next SIG Docs meeting; hopefully @onlydole will be able to join.
I think in terms of timing, we should try to target no later than KubeCon EU, but if we can all rally together sooner, dope.
Another thing to consider is potentially using a more local search engine. For example, Baidu is a member and also serves the region. So... :) Those are my immediate two cents.
You may want to chat with @nate-double-u about experience using https://lunrjs.com on CNCF projects
@onlydole Let's chat at the meeting next Tuesday then so we can do some planning and resource work. Thanks!
You betcha! I'll read up on some of these options and will be prepared to chat with our group next Tuesday.
There are a few open-source and commercial search alternatives listed in the Hugo documentation (https://gohugo.io/tools/search).
I'm just back from vacation and catching up on things. Always happy to help, though; I just need a bit of time to read the various discussions that have already happened.
In Slack thread https://kubernetes.slack.com/archives/CPNFRNLTS/p1703270890381669, we have some input.
Some possible choices:
A question raised there: "Are we okay with ads on the Chinese side of things?"
Hey there!
At Algolia we love K8s. DocSearch is open, and we'd be happy to provide an awesome (and free) experience for your multi-language documentation website.
Here's a demo we've set up - https://7cnxmj.csb.app/
What's the next step for this issue? I can't tell.
@sftim Hi Tim, as per our SIG Docs meeting on January 9th and the feedback shared there, @onlydole attended and has been tasked with the next step: the CNCF will come up with a plan and do further research to improve search. This is because it is a CNCF cost challenge, not a strictly user/contributor-facing issue that SIG Docs leads would prioritize and lead work on. Taylor will hopefully update us on this issue when that plan and/or research is ready to share.
Howdy, folks - we will be prioritizing this on the CNCF side after KubeCon + CloudNativeCon EU in Paris wraps up, and @nate-double-u will lead on that effort.
Hi @dylantientcheu, Thanks for building that demo. Could you contact me (natew@cncf.io)? I'd like to chat with you about it.
Sorry for the late update here. I updated the sig-docs meeting notes, but not this issue.
I've done some research and think that PageFind is the best candidate to replace Bing. Algolia was in the running, but between their fee structure and the fact that going with them would add another dependency, we're best off going with PageFind. PageFind is widely used around the CNCF, and we have some experience supporting it. It's designed to build its index as part of the regular site build process, making search a part of the site itself and sidestepping any issues we might run into with firewalled locations.
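To make the "index at build time" idea concrete, here is a toy Python sketch of the general technique (a static inverted index built from rendered HTML and queried without any external API). It is not Pagefind's implementation or index format. The function names `index_pages`, `build_index` and `search` are hypothetical, and the sketch assumes the site has already been rendered to a directory such as Hugo's default `public/`.

```python
import json
import re
from collections import defaultdict
from html.parser import HTMLParser
from pathlib import Path

# Toy tokenizer: splits on non-word characters only (no CJK segmentation).
WORD = re.compile(r"\w+")


class TextExtractor(HTMLParser):
    """Collects a page's text content, skipping <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def index_pages(pages: dict[str, str]) -> dict[str, list[str]]:
    """Build an inverted index: lowercase word -> sorted URLs of pages containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, html in pages.items():
        extractor = TextExtractor()
        extractor.feed(html)
        extractor.close()  # flush any buffered text
        for word in WORD.findall(" ".join(extractor.parts).lower()):
            index[word].add(url)
    return {word: sorted(urls) for word, urls in index.items()}


def build_index(site_dir: str, out_file: str) -> None:
    """Hypothetical build step: index every rendered .html file and write JSON."""
    root = Path(site_dir)
    pages = {
        "/" + path.relative_to(root).as_posix(): path.read_text(encoding="utf-8")
        for path in root.rglob("*.html")
    }
    Path(out_file).write_text(json.dumps(index_pages(pages)), encoding="utf-8")


def search(index: dict[str, list[str]], query: str) -> list[str]:
    """Return URLs of pages that contain every query word (AND semantics)."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    matches = set(index.get(words[0], []))
    for word in words[1:]:
        matches &= set(index.get(word, []))
    return sorted(matches)
```

In a pipeline like this, something such as `build_index("public", "public/search-index.json")` would run right after `hugo`, and a client-side script would load the JSON, so no request ever leaves the site. Note that the `\w+` tokenizer treats a run of Chinese characters as a single token; real CJK search needs proper word segmentation.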
We should change as little as possible here, so we should only replace Bing, affecting only the localizations with Bing as their search provider (i.e., I don't think we should remove or update any other search providers).
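A minimal sketch of that scope rule, assuming a hypothetical mapping from localization code to search provider (the real site configuration may be structured quite differently): only entries set to Bing change, and every other provider is left untouched.

```python
def replace_bing(providers: dict[str, str], replacement: str = "pagefind") -> dict[str, str]:
    """Return a copy of the provider map in which only Bing entries are swapped out.

    `providers` is a hypothetical localization -> search-provider mapping;
    the input dict is not modified.
    """
    return {
        lang: replacement if provider == "bing" else provider
        for lang, provider in providers.items()
    }


# Hypothetical example data, not the site's actual configuration.
current = {"en": "google", "zh-cn": "bing", "ko": "google"}
print(replace_bing(current))
# {'en': 'google', 'zh-cn': 'pagefind', 'ko': 'google'}
```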
@cjyabraham has made progress based on @sftim's work to implement PageFind for Chinese users and will soon open a PR for discussion/review.
PageFind solution has been deployed.
CNCF pays for the current search on this site. It uses both Google Custom Search (for requests outside of China) and the Bing Search API (for requests in China). Due to recent changes in Bing Search pricing, this is now costing CNCF too much money, so we should implement a different search solution, such as those suggested here.
These other search-related issues could be addressed at the same time:
44416
43817
43291