tektoncd / website

Tekton Website
https://tekton.dev
Apache License 2.0

Improve Search #191

Open dfreilich opened 3 years ago

dfreilich commented 3 years ago

Expected Behavior

The search bar should display information from the site, and perhaps relevant tasks on Tekton Hub.

Actual Behavior

The search displayed ads, logs, and a few somewhat relevant links, but not the documentation I was looking for.

Steps to Reproduce the Problem

  1. Go to https://tekton.dev/
  2. Search for "Pipeline"
  3. Look at the results

Additional Info

I was hoping that search would let me surface the documentation for concepts I wanted to define precisely (e.g. `Pipeline`, `PersistentVolumeClaim`, ...), but I haven't seen any search results that link to the documentation.

bobcatfish commented 3 years ago

@afrittoli I have given you and the rest of the governing board access to https://programmablesearchengine.google.com/cse/setup/basic?cx=013756393218025596041:6eajntqsa6c (which I think is the link to the tekton.dev custom search?)

At a glance I can't tell what's wrong:

[screenshot]

[screenshot]

It doesn't seem like it should be searching the entire web 🤔

I'm not sure whether we can remove the ads, but we can try. Another approach might be to use a separate tool to index the site ourselves.

afrittoli commented 3 years ago

I changed the site filter from matching anything containing tekton.dev to the URL pattern `tekton.dev/*`, which should hopefully exclude all the unwanted subdomains.

I also found a way to submit a sitemap, which Hugo already produces at https://tekton.dev/sitemap.xml. However, the URLs in the sitemap are relative, which is not valid: when I try to submit it, Google Search reports 170+ errors:

[screenshot]
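A minimal sketch for spotting the offending entries locally before resubmitting, assuming the sitemap uses the standard sitemaps.org namespace (which Hugo's default template emits). It flags every `<loc>` that lacks a scheme and host. The function name `relative_locs` and the sample URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol (assumed to match Hugo's output)
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def relative_locs(sitemap_xml: str) -> List[str]:
    """Return every <loc> value that is not a fully qualified http(s) URL."""
    root = ET.fromstring(sitemap_xml)
    bad = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        parsed = urlparse(url)
        # A valid entry needs both a scheme and a host, e.g. https://tekton.dev/docs/
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            bad.append(url)
    return bad


# Hypothetical sample: one absolute entry, one relative entry
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://tekton.dev/docs/</loc></url>
  <url><loc>/docs/concepts/</loc></url>
</urlset>"""

if __name__ == "__main__":
    print(relative_locs(SAMPLE))  # prints ['/docs/concepts/']
```

Running it against the live sitemap (fetched separately) should return an empty list once the site generates absolute URLs.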

afrittoli commented 3 years ago

It looks like we have to change the `baseURL` to a valid absolute URL? https://discourse.gohugo.io/t/google-search-console-reports-sitemap-xml-as-invalid/10474/12

afrittoli commented 3 years ago

About ads, I found the following answer:

There is no paid version for Google Custom Search Engine (CSE). Please review the document below to understand current CSE offerings: https://support.google.com/customsearch//answer/9069107

By default CSE will display Ads in its results. In order to have the option to disable Ads, you can consider following the below options:

i) If your organisation is a non-profit, then you can have Ads disabled: https://support.google.com/customsearch/answer/4542102

ii) You can use the API (Key enabled from Cloud Developer Console) with your CSE engine and retrieve results using JSON API without Ads: https://developers.google.com/custom-search/v1/overview

iii) You can create an Adsense account and have it integrated with your CSE engine and can control to configure to not show competitors' ads on your website. Please review the document: https://support.google.com/customsearch/answer/4542011
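Option ii can be sketched as follows: a minimal, standard-library-only client for the Custom Search JSON API, assuming an API key created in the Cloud Console (the `YOUR_API_KEY` value is a placeholder) and reusing the `cx` ID from the CSE setup link. The helper names are hypothetical, and the actual HTTP request is left out so the sketch runs offline. The optional `sort` parameter (e.g. `date`) is included because ordering comes up later in this thread; omitting it keeps the default relevance ordering.

```python
import json
from typing import List, Optional, Tuple
from urllib.parse import urlencode

ENDPOINT = "https://www.googleapis.com/customsearch/v1"
# Search engine ID (cx) taken from the CSE setup link earlier in this thread
CX = "013756393218025596041:6eajntqsa6c"


def build_search_url(api_key: str, query: str, sort: Optional[str] = None) -> str:
    """Build a Custom Search JSON API request URL; api_key is a placeholder you supply."""
    params = {"key": api_key, "cx": CX, "q": query}
    if sort:
        # e.g. "date"; leaving it out keeps the default relevance ordering
        params["sort"] = sort
    return ENDPOINT + "?" + urlencode(params)


def parse_results(body: str) -> List[Tuple[str, str]]:
    """Extract (title, link) pairs from a JSON API response body."""
    data = json.loads(body)
    # "items" is missing entirely when a query returns no results
    return [(item["title"], item["link"]) for item in data.get("items", [])]


if __name__ == "__main__":
    print(build_search_url("YOUR_API_KEY", "Pipeline"))
    # The real request would be made with e.g. urllib.request.urlopen(url).read()
```

Because the site renders the results itself, nothing but the returned titles and links is shown, which is how this route avoids ads.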

afrittoli commented 3 years ago

Related issue on knative side: https://github.com/knative/website/issues/23

afrittoli commented 3 years ago

The sitemap is now generated correctly; if the Search Console and CSE configurations match, the indexed content should come from it.

afrittoli commented 3 years ago

However, there is still no indexing, because our homepage sets `<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">`.
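A minimal sketch of a check that could catch this before deploying: parse the generated HTML with Python's `html.parser` and report whether any robots meta tag contains `noindex`. The tag's `name` and `content` values are matched case-insensitively, so `ROBOTS`/`NOINDEX` are treated the same as `robots`/`noindex`. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        # Compare the name value case-insensitively ("ROBOTS" == "robots")
        if attributes.get("name", "").lower() == "robots":
            content = attributes.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def blocks_indexing(html: str) -> bool:
    """Return True if the page asks crawlers not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW"></head>'
    print(blocks_indexing(page))  # prints True
```

Running this over the built `public/` output in CI would be one way to keep the tag from slipping back in.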

afrittoli commented 3 years ago

OK, to summarise:

Current status:

afrittoli commented 3 years ago

Google has started crawling; it is not complete yet, but results are starting to show. The exclusion filter on the CSE was too eager: it looks like `*.tekton.dev` also matches `tekton.dev` itself, which is surprising, since the latter does not start with a ".". In any case, I changed the filter to list each of the subdomains that we do not want individually.

[screenshot]

It may also be possible to give search results from older versions of the docs lower relevance; that is worth looking into as well.

tekton-robot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

afrittoli commented 3 years ago

/remove-lifecycle stale

afrittoli commented 3 years ago

/lifecycle frozen

afrittoli commented 2 years ago

Search now works, so I lowered the priority of this item, but a few issues remain to be solved:

Sorting results by date instead of relevance brings up the most recent pages first, but that may not be ideal. 'Relevance' is the default sort order.

[screenshot]