kiwix / operations

Kiwix Kubernetes Cluster
http://charts.k8s.kiwix.org/

Should we change `robots.txt` at https://library.kiwix.org #232

Open kelson42 opened 2 months ago

kelson42 commented 2 months ago
curl https://library.kiwix.org/robots.txt
User-agent: *
Disallow: /

This forbids everything, and this may not be the best way to advertise our library?!

rgaudin commented 2 months ago

Do we want to bring search engines' attention to the library? It's basically a copy of multiple other sources (known to be disliked by search engines), and it doesn't bring people to Kiwix because there's no mention of Kiwix there, nor of the readers, the format, or anything else. It also pollutes search engines with outdated data, and finally it increases load on our machine for traffic we're not interested in. As far as I can remember, we've always wanted to avoid this (see https://github.com/kiwix/container-images/issues/13). What's changed?

kelson42 commented 2 months ago

At this stage, IMO, the catalog part of library.kiwix.org should be crawled, but not the demo part.
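To make the proposed split concrete, here is a minimal sketch comparing the robots.txt quoted at the top of this issue with a hypothetical variant that opens only `/catalog` to crawlers. It assumes the catalog lives under the `/catalog` prefix. The catalog URL used below is only an example path, and the content URL is the one from the spam email quoted further down. The check uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Rules currently served at https://library.kiwix.org/robots.txt (quoted above).
CURRENT_RULES = """\
User-agent: *
Disallow: /
"""

# Hypothetical split (assumption): let crawlers into /catalog and keep
# everything else, including the ZIM demo content, off-limits.
PROPOSED_RULES = """\
User-agent: *
Allow: /catalog
Disallow: /
"""


def is_crawlable(rules: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    urls = [
        # Example catalog path (assumed, for illustration only).
        "https://library.kiwix.org/catalog/v2/entries",
        # Content page from the spam email quoted in this thread.
        "https://library.kiwix.org/wikipedia_en_computer_2017-04/A/Android_(operating_system).html",
    ]
    for url in urls:
        print(url)
        print("  current: ", is_crawlable(CURRENT_RULES, url))
        print("  proposed:", is_crawlable(PROPOSED_RULES, url))
```

`urllib.robotparser` applies the first rule that matches, in file order, whereas RFC 9309 crawlers pick the longest matching rule. Listing `Allow: /catalog` before `Disallow: /` makes the file behave the same under both interpretations. In this sketch the homepage (`/`) stays blocked, which is exactly the "homepage or /catalog?" question raised in the reply.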

rgaudin commented 2 months ago

What do you mean by the catalog part? The homepage or /catalog?

If /catalog:

Anyway, why would we drive people towards library.kiwix.org if they are not told where they are, what Kiwix is, etc?

Popolechien commented 2 months ago

I know for a fact that the ZIM files already get crawled, as we regularly receive spam-like emails that pretty much always look like this:

Hi, I noticed that a broken link appears on this page: http://library.kiwix.org/wikipedia_en_computer_2017-04/A/Android_(operating_system).html Link text "Global mobile statistics 2014 Part A: Mobile subscribers; handset market share; mobile operators" The screenshot is attached below, points to https://mobiforge.com/mobile-marketing-tools/latest-mobile-stats which is not alive anymore. Also, you may consider replacing the broken link with this exact updated resource which points back to the right pages within that website: https://vivipins.com/mobile-marketing-statistics/ Let me know if there’s anything else I can help you with!

They are basically trying to replace a random link with theirs, I guess as a form of SEO. They never seem to realize that they're pointing at a Wikipedia page, and the text barely ever changes, so I suspect a fully automated operation.

Long story short, we could do without these, and I see no material advantage in driving traffic to content pages (as opposed to letting people access the source material or driving folks to the more generic library.kiwix.org landing page).