nodeSolidServer / node-solid-server

Solid server on top of the file-system in NodeJS
https://solidproject.org/for-developers/pod-server

Consider excluding crawlers by default #852

Open melvincarvalho opened 5 years ago

melvincarvalho commented 5 years ago

Anecdotally, we have had a number of users complain about their name and email address being made available to spiders. This could be done by adding a robots.txt file to a pod.
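For reference, a deny-all robots.txt — the kind of default being proposed here — is just two lines:

```text
User-agent: *
Disallow: /
```

Note that robots.txt is advisory only: well-behaved crawlers honor it, but it is not an access-control mechanism.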

Solid, being a privacy-first framework, should consider excluding crawlers by default.

It would be relatively easy subsequently to turn it on via a dashboard, or an app.
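As a rough sketch of how such a toggle could work (this is not node-solid-server's actual API — the function name and flag are illustrative), the server could generate the robots.txt body from a per-pod setting that a dashboard or app flips:

```javascript
// Hypothetical sketch: derive the robots.txt body from a per-pod
// "allow crawling" setting. Crawlers are excluded by default; an
// explicit opt-in re-enables indexing.
function robotsTxt (allowCrawling) {
  return allowCrawling
    ? 'User-agent: *\nDisallow:\n'   // empty Disallow = allow everything
    : 'User-agent: *\nDisallow: /\n' // deny-all default
}
```

A request handler would then serve `robotsTxt(pod.allowCrawling)` at `/robots.txt` with a `text/plain` content type.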

The downside of this approach is that an adoption vector — finding people via search engines — is lost, along with the ability to build a social graph.

I'm personally unsure about the trade-off here.

Good idea / Bad idea / Thoughts?

dmitrizagidulin commented 5 years ago

👍 from me on excluding crawlers by default. (or maybe like, just limiting them to the public/ folder.)
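The public-folder variant could be expressed in robots.txt along these lines (the path is illustrative, and `Allow` is an extension to the original robots.txt convention, though it is honored by the major crawlers):

```text
User-agent: *
Disallow: /
Allow: /public/
```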

kjetilk commented 5 years ago

We should have an acl:Crawlers agent class, and more sophisticated ways to reason about membership in that class, but yeah, perhaps we should exclude them by default.
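To sketch what that might look like: acl:Crawlers is not part of the current WAC vocabulary, and WAC has no deny rules, so expressing "everyone except crawlers" would itself need new vocabulary — something like a non-crawler agent class used in the public grant. A hypothetical ACL (all crawler-related terms invented for illustration):

```turtle
@prefix acl:  <http://www.w3.org/ns/auth/acl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<#owner>
    a acl:Authorization ;
    acl:agent </profile/card#me> ;
    acl:accessTo </profile/card> ;
    acl:mode acl:Read, acl:Write, acl:Control .

<#publicRead>
    a acl:Authorization ;
    # Hypothetical agent class: "all agents that are not crawlers".
    # WAC only grants access, so exclusion has to be phrased as a
    # narrower grant rather than a deny rule.
    acl:agentClass acl:NonCrawlerAgent ;
    acl:accessTo </profile/card> ;
    acl:mode acl:Read .
```

The harder part, as noted above, is reasoning about which agents are members of such a class in the first place.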

csarven commented 5 years ago

This issue is pretty much a duplicate of what was discussed at https://github.com/solid/node-solid-server/issues/694 and the initial PR that was made towards it https://github.com/solid/node-solid-server/pull/700 . Re public folder, note https://github.com/solid/node-solid-server/issues/694#issuecomment-393849118

melvincarvalho commented 5 years ago

I've made a page in solid hacks about interacting with crawlers

https://solid.gitbook.io/solid-hacks/server/interacting-with-crawlers

This could be a pointer to how the user can turn crawling on and off, make deletion requests, etc., as of 2018. Over time more robots will interact with Solid pods, so there is room to expand this into a larger spec.

If there was some sample text, I'd be happy to add it to the book.