Closed bardiharborow closed 3 years ago
> There is very minimal benefit for clearnet users to run across three (WMF, WikiVisually and ipfs) different copies of the Wikipedia article every time they search for something.
The benefit is entirely for clearnet users. Tor users, for example, will (almost) always be able to access Wikipedia over Tor, so they'll see little benefit.
We should probably add a rel=canonical link pointing to Wikipedia to the head of each page, but I haven't thought through the possible ramifications/downsides of this approach.
> The benefit is entirely for clearnet users. Tor users, for example, will (almost) always be able to access Wikipedia over Tor, so they'll see little benefit.
If anonymity is the concern, then accessing IPFS through the ipfs.io endpoint is no more anonymous to large scale surveillance than accessing Wikipedia directly, and if anything I trust the Wikimedia Foundation to handle server logs better than ipfs.io. Users of the actual IPFS software will presumably discover the mirror through different means than Google, and will not be impacted by this change.
> We should probably add a rel=canonical link pointing to Wikipedia to the head of each page, but I haven't thought through the possible ramifications/downsides of this approach.
Doing so would have the intended effect of removing the mirror from Google search results, and it is actually the preferred way to implement this.
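For anyone who wants to see what that would look like, here is a minimal sketch of generating the tag and inserting it into a page's head. It assumes the English Wikipedia as the canonical target and that the article title is known when the page is built; `WIKIPEDIA_BASE`, `canonical_link` and `add_canonical` are hypothetical names for illustration, not anything the mirror or mwoffliner actually uses.

```python
from html import escape
from urllib.parse import quote

# Assumption: the canonical target is the English Wikipedia.
WIKIPEDIA_BASE = "https://en.wikipedia.org/wiki/"


def canonical_link(title: str) -> str:
    """Build a <link rel="canonical"> element for an article title."""
    # Wikipedia URLs use underscores in place of spaces.
    href = WIKIPEDIA_BASE + quote(title.replace(" ", "_"))
    return f'<link rel="canonical" href="{escape(href, quote=True)}">'


def add_canonical(page_html: str, title: str) -> str:
    """Insert the canonical link right after the first literal <head> tag.

    Naive string handling: pages without a plain "<head>" are returned unchanged.
    """
    marker = "<head>"
    idx = page_html.find(marker)
    if idx == -1:
        return page_html
    pos = idx + len(marker)
    return page_html[:pos] + canonical_link(title) + page_html[pos:]


if __name__ == "__main__":
    print(add_canonical("<html><head><title>Alan Turing</title></head></html>", "Alan Turing"))
```

A real generator would more likely emit the tag from its page template than patch finished HTML; the string insertion here just keeps the sketch short.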
@bardiharborow
> If anonymity is the concern, then accessing IPFS through the ipfs.io endpoint is no more anonymous to large scale surveillance than accessing Wikipedia directly, and if anything I trust the Wikimedia Foundation to handle server logs better than ipfs.io. Users of the actual IPFS software will presumably discover the mirror through different means than Google, and will not be impacted by this change.
Ah, I think the confusion may be around the definition of "clearnet". IPFS is a clearnet. That is, it's not a darknet (it provides no anonymity at the moment). Darknets get no benefit because the exit nodes tend to be in countries with strong free speech laws.
> Users of the actual IPFS software will presumably discover the mirror through different means than Google, and will not be impacted by this change.
Unlikely. We don't have any IPFS search mechanisms and rely entirely on web search engines. That's probably one of the reasons we don't use rel=canonical links.
+1 for setting rel='canonical' links. I'm starting to see the mirror pop up frequently on the first page of Google results just from normal everyday use. Canonical links should avoid this duplication and make the mirror a good web citizen.
I agree with adding rel="canonical": it's annoying to see search duplicates. By not indexing outdated content, you'll also alleviate the concerns raised in other issues such as https://github.com/ipfs/distributed-wikipedia-mirror/issues/55 and https://github.com/ipfs/distributed-wikipedia-mirror/issues/49.
Actually, what's the purpose of indexing all the pages at all? A noindex meta tag may be appropriate.
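For reference, the tag in question is `<meta name="robots" content="noindex">`. Below is a small sketch, using only the Python standard library, of how one could check whether a generated page already carries it. `RobotsMetaParser` and `is_noindex` are hypothetical helper names, and the sample HTML is made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() != "robots":
            return
        content = attr_map.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, follow".
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def is_noindex(page_html: str) -> bool:
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(page_html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    sample = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(is_noindex(sample))
```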
The lack of a canonical tag comes from the HTML files generated by Kiwix's mwoffliner. I opened an issue: https://github.com/openzim/mwoffliner/issues/564
> The lack of a canonical tag comes from the HTML files generated by Kiwix's mwoffliner.
I understand, but you can also add a canonical link in the webserver response headers.
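To illustrate the header variant: a canonical URL can be sent as an HTTP `Link` header with `rel="canonical"`, without touching the HTML. The sketch below is a toy WSGI app, not how the ipfs.io gateway is actually configured. It assumes the last path segment of the request is the article title and that English Wikipedia is the target; `canonical_link_header` and `app` are hypothetical names.

```python
# Assumption: the canonical target is the English Wikipedia.
WIKIPEDIA_BASE = "https://en.wikipedia.org/wiki/"


def canonical_link_header(url: str) -> tuple[str, str]:
    """Build a (name, value) pair for a Link header marking `url` as canonical."""
    if "<" in url or ">" in url:
        raise ValueError("URL must not contain angle brackets")
    return ("Link", f'<{url}>; rel="canonical"')


def app(environ, start_response):
    """Toy WSGI app: serve a placeholder body plus the canonical Link header."""
    path = environ.get("PATH_INFO", "/")
    # Assumption: the last path segment is the article title.
    title = path.rsplit("/", 1)[-1]
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        canonical_link_header(WIKIPEDIA_BASE + title),
    ]
    start_response("200 OK", headers)
    return [b"<html><body>mirror page</body></html>"]


if __name__ == "__main__":
    print(canonical_link_header(WIKIPEDIA_BASE + "Alan_Turing"))
```

In practice this would more likely be a rewrite rule in the gateway or web server configuration; the Python version only shows the header's shape.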
I fixed this upstream (https://github.com/openzim/mwoffliner/pull/963) :ok_hand:
Old snapshots are about to be excluded via /robots.txt (https://github.com/ipfs/website/pull/334)
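As a rough way to sanity-check such an exclusion, Python's standard `urllib.robotparser` can evaluate a robots.txt against sample URLs. The rules and snapshot paths below are made up for illustration (`QmOldSnapshotExample` and `QmNewSnapshotExample` are placeholders, not real CIDs) and are not the contents of the linked PR; `is_crawlable` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block one old snapshot root for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /ipfs/QmOldSnapshotExample/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    old = "https://ipfs.io/ipfs/QmOldSnapshotExample/wiki/Alan_Turing.html"
    new = "https://ipfs.io/ipfs/QmNewSnapshotExample/wiki/Alan_Turing.html"
    print(is_crawlable(ROBOTS_TXT, old))
    print(is_crawlable(ROBOTS_TXT, new))
```

Note that robots.txt only stops crawling. Pages that are already indexed can linger in results for a while, which is one reason the canonical link is still worth having.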
Remaining steps before this issue can be closed:
<link rel="canonical"
OR:
I will be checking on the mwoffliner/kiwix situation, but if someone has spare bandwidth and can speed things up, please contribute upstream & post updates here.
This has been fixed by https://github.com/ipfs/distributed-wikipedia-mirror/issues/65 and will be solved upstream when new snapshots are published as part of #60 #61.
If possible, are you able to make your mirror non-indexed by internet search engines? There is very minimal benefit for clearnet users to run across three (WMF, WikiVisually and ipfs) different copies of the Wikipedia article every time they search for something.