w3c / activitypub

http://w3c.github.io/activitypub/

Publicly Accessible Checksums of Content #392

Closed SlashScreen closed 1 year ago

SlashScreen commented 1 year ago

I've been talking to the ATProto devs about possible ActivityPub interoperability on their end, and one of the major issues they see getting in the way is that the ATProto system needs content used within an app to be publicly indexable so that it can be hashed for validation. Is there any way to work around this, or to provide checksums of content, without making a post within an ActivityPub network publicly indexable by web spiders/data harvesters?

See my discussion with them here
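To make the checksum idea in the question concrete, here is a minimal sketch, assuming a post is available as a JSON-like object and that a SHA-256 digest over a sorted-key JSON serialization is an acceptable checksum. The function name `content_checksum`, the sample `note`, and the serialization rule are all hypothetical illustrations; this is not ATProto's actual encoding or hashing scheme, and nothing in ActivityPub defines it.

```python
import hashlib
import json


def content_checksum(obj: dict) -> str:
    """Return a hex SHA-256 digest of a JSON object.

    Assumption: sorting keys and using compact separators is a "good enough"
    canonical form for this illustration. It is not a standardized
    canonicalization and not what ATProto uses.
    """
    canonical = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical post; the id and content are made up for the example.
note = {
    "type": "Note",
    "id": "https://example.social/notes/1",
    "content": "Hello, fediverse",
}

# Same fields in a different order should hash to the same digest.
reordered = {
    "content": "Hello, fediverse",
    "id": "https://example.social/notes/1",
    "type": "Note",
}

digest = content_checksum(note)
print(digest)
```

Publishing only a digest like this would let someone who already holds the post confirm it is unmodified, without serving the post body itself. Whether that would satisfy ATProto's validation requirements is a separate question that this sketch does not answer.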

snarfed commented 1 year ago

This is probably more of a fediverse cultural norms question than a technical or protocol one. Afaik the ActivityPub protocol itself doesn't say anything about allowing or preventing "public indexing."

Many people believe that most fediverse servers block web crawlers via robots.txt, separate from AP. Looking at a few of them, some do, e.g. https://akko.wtf/robots.txt, https://misskey.social/robots.txt, but others don't, e.g. https://mastodon.social/robots.txt, https://lemmy.ml/robots.txt. Notably, it seems like Mastodon defaults to allowing web crawlers.
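For anyone who wants to run the same check over more servers, here is a rough sketch using Python's standard `urllib.robotparser`. The helper name `allows_crawling` and the two sample robots.txt bodies are hypothetical; they are not copies of any real server's file, and the sketch parses text you have already fetched rather than making network requests.

```python
from urllib.robotparser import RobotFileParser


def allows_crawling(robots_txt: str, user_agent: str = "*", path: str = "/") -> bool:
    """Report whether the given robots.txt text permits user_agent to fetch path."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no HTTP request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Hypothetical examples of the two policies discussed above.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"

print(allows_crawling(BLOCK_ALL))  # False: everything is disallowed
print(allows_crawling(ALLOW_ALL))  # True: an empty Disallow permits everything
```

To test a live server, fetch its `/robots.txt` yourself and pass the response text to `allows_crawling`. Note that this only reports what the file says; it says nothing about whether a given piece of software chooses to honor it.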

snarfed commented 1 year ago

Also, it's not easy to determine precisely which software is covered by robots.txt, but fortunately decentralized social software probably isn't. From https://www.robotstxt.org/faq/what.html :

A robot is a program that automatically traverses the Web's hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced.

This is not what any decentralized social software or protocols that I know of, whether acting as a user agent or otherwise (like an ATProto BGS), do.

SlashScreen commented 1 year ago

I see, alright.