Open dfreilich opened 4 years ago
@afrittoli I have added you and the rest of the governing board to https://programmablesearchengine.google.com/cse/setup/basic?cx=013756393218025596041:6eajntqsa6c (which I think is the link to the tekton.dev custom search).
At a glance I can't tell what's wrong:
It doesn't seem like it should be searching the entire web 🤔
I'm not sure if we can remove the ads but we can try. Another approach might be to use something to index the site ourselves.
I changed the URL pattern from anything containing `tekton.dev` to `tekton.dev/*`, which should hopefully remove all unwanted subdomains.
I also found a way to submit a sitemap, which Hugo already produces at https://tekton.dev/sitemap.xml. The URLs in the sitemap are relative, which is not valid; in fact, when I try to submit the sitemap, Google Search tells me that it finds 170+ errors:
It looks like we have to change the baseUrl to a valid one? https://discourse.gohugo.io/t/google-search-console-reports-sitemap-xml-as-invalid/10474/12
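A quick way to see which sitemap entries are affected is to parse the file and list every `<loc>` that is not an absolute URL. This is only a sketch: the function name `relative_locs` and the sample paths are made up for illustration, and the sample stands in for the real https://tekton.dev/sitemap.xml.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def relative_locs(sitemap_xml: str) -> list[str]:
    """Return every <loc> value that is not an absolute http(s) URL."""
    root = ET.fromstring(sitemap_xml)
    bad = []
    for loc in root.iter(f"{SITEMAP_NS}loc"):
        url = (loc.text or "").strip()
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            bad.append(url)
    return bad


# Hypothetical sample: one relative entry, one absolute entry.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>/docs/pipelines/</loc></url>
  <url><loc>https://tekton.dev/docs/triggers/</loc></url>
</urlset>"""

print(relative_locs(SAMPLE))
```

An empty list after the base URL fix would suggest the sitemap is ready to resubmit.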
About ads, I found the following answer:
There is no paid version for Google Custom Search Engine (CSE). Please review the document below to understand current CSE offerings: https://support.google.com/customsearch//answer/9069107
By default CSE will display Ads in its results. In order to have the option to disable Ads, you can consider following the below options:
i) If your organisation is a non-profit, then you can have Ads disabled: https://support.google.com/customsearch/answer/4542102
ii) You can use the API (Key enabled from Cloud Developer Console) with your CSE engine and retrieve results using JSON API without Ads: https://developers.google.com/custom-search/v1/overview
iii) You can create an Adsense account and have it integrated with your CSE engine and can control to configure to not show competitors' ads on your website. Please review the document: https://support.google.com/customsearch/answer/4542011
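For option ii, a rough sketch of what querying the Custom Search JSON API could look like. The endpoint and the `key`, `cx`, `q` and `start` parameters come from the API overview linked above; the helper names (`build_search_url`, `extract_results`, `search`) are made up here, and the API key is a placeholder that would have to be created in the Cloud console.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key: str, engine_id: str, query: str, start: int = 1) -> str:
    """Build the request URL for one page of results."""
    params = {"key": api_key, "cx": engine_id, "q": query, "start": start}
    return f"{API_ENDPOINT}?{urlencode(params)}"


def extract_results(payload: dict) -> list[tuple[str, str]]:
    """Pull (title, link) pairs out of a decoded JSON response."""
    return [(item["title"], item["link"]) for item in payload.get("items", [])]


def search(api_key: str, engine_id: str, query: str) -> list[tuple[str, str]]:
    """Perform the request (needs network access and a valid key)."""
    with urlopen(build_search_url(api_key, engine_id, query)) as resp:
        return extract_results(json.load(resp))


# Placeholder key; the engine ID is the cx value from the CSE link above.
print(build_search_url("YOUR_API_KEY", "013756393218025596041:6eajntqsa6c", "tekton pipelines"))
```

The site would then have to render the results itself instead of using the embedded CSE widget.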
Related issue on knative side: https://github.com/knative/website/issues/23
The sitemap is now generated correctly; if the Search Console and the CSE match, the search content should come from it.
However, there is still no indexing, because we set `<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">` on our homepage.
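To check whether a page still carries that tag after a deploy, something along these lines could work. It is a sketch using only the standard library; the class and function names are invented for illustration, and it only looks at the `robots` meta tag, not at `robots.txt` or HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            directive = part.strip().lower()
            if directive:
                self.directives.add(directive)


def blocks_indexing(html: str) -> bool:
    """True if the page asks crawlers not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in parser.directives or "none" in parser.directives


print(blocks_indexing('<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">'))
```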
OK, to summarise:
`https://tekton.dev/`
`*.tekton.dev/*` to exclude all subdomains (hub-preview, logs, prow, etc.)
`https://tekton.dev/`. The base URL matches the one configured in the CSE, so, according to the docs, the content in the CSE should come from the index built through the sitemap.
Current status:
Crawling by Google started, it is not yet complete, but results are starting to show.
The filter on the CSE was too eager: it looks like `*.tekton.dev` matches `tekton.dev` too, which is surprising, since the latter does not start with a ".". In any case, I changed the filtering to match each of the subdomains that we do not want.
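For comparison, here is how a plain shell-style glob treats the same pattern, next to the explicit-list approach. This is not how the CSE matches patterns (its behaviour evidently differs); it is Python's `fnmatch`, shown only to illustrate why the result was unexpected. The full hostnames are assumptions built from the subdomain names mentioned earlier in the thread.

```python
from fnmatch import fnmatchcase

# Assumed hostnames, derived from the subdomain names in this thread.
EXCLUDED_HOSTS = {"hub-preview.tekton.dev", "logs.tekton.dev", "prow.tekton.dev"}


def glob_matches(pattern: str, host: str) -> bool:
    """Shell-style glob match, as implemented by Python's fnmatch."""
    return fnmatchcase(host, pattern)


def is_excluded(host: str) -> bool:
    """Explicit per-subdomain exclusion, with no wildcard involved."""
    return host.lower() in EXCLUDED_HOSTS


# With a strict glob the bare domain is not matched, only subdomains are.
print(glob_matches("*.tekton.dev", "tekton.dev"))
print(glob_matches("*.tekton.dev", "logs.tekton.dev"))
print(is_excluded("tekton.dev"), is_excluded("prow.tekton.dev"))
```

Listing each unwanted subdomain avoids depending on how the wildcard is interpreted.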
It may be possible to give search results from older versions less relevance, something to look into as well.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale` with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close` with a justification.
If this issue should be exempted, mark the issue as frozen with `/lifecycle frozen` with a justification.
/lifecycle stale
Send feedback to tektoncd/plumbing.
/remove-lifecycle stale
/lifecycle frozen
Search now works, so I decreased the criticality of this item, but there are still a few issues to be solved:
Sorting results by date instead of relevance brings up the most recent pages first, but it may not be an ideal solution. 'Relevance' is the default sorting method.
Expected Behavior
The search bar to display information on the site, and perhaps relevant tasks on Tekton Hub.
Actual Behavior
The search displayed ads, logs, and some somewhat relevant links, but not the documentation I was looking for.
Steps to Reproduce the Problem
Additional Info
I was hoping that search would allow me to surface documentation for concepts I was looking to precisely define (e.g. `Pipelines`, `PersistentVolumeClaim`...), but I haven't seen any links shown in the search to the documentation.