jointakahe / takahe

An ActivityPub/Fediverse server
BSD 3-Clause "New" or "Revised" License

add support for enabling Mastodon 4.2 search indexing #656

Open osmaa opened 10 months ago

osmaa commented 10 months ago

adds the opt-in attribute which enables Mastodon 4.2 to index toots from an account

osmaa commented 10 months ago

I was considering that, too, but it seemed to me that it was conceptually a little different. search_enabled is a feature flag for (a partially implemented?) local search of local accounts, while indexable is an AP Actor level flag for opt-in to being indexed on remote servers.

Had it been the other way around, i.e. had indexable come first, it would have been obvious to implement search_enabled on top of that.
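To make the distinction concrete, here is a rough sketch of how the Actor-level flag might appear in the ActivityPub document that remote servers fetch. It assumes the property is published as `indexable` and mapped to `toot:indexable` in the JSON-LD context, which is my understanding of Mastodon 4.2. The `actor_document` helper and the example URI are made up for illustration and are not Takahē's serializer.

```python
import json

# Namespace Mastodon uses for its ActivityPub extensions.
TOOT_NS = "http://joinmastodon.org/ns#"


def actor_document(actor_uri: str, username: str, indexable: bool) -> dict:
    """Build a minimal Actor document carrying the indexable opt-in.

    Hypothetical helper for illustration only.
    """
    return {
        "@context": [
            "https://www.w3.org/ns/activitystreams",
            {
                "toot": TOOT_NS,
                "discoverable": "toot:discoverable",
                "indexable": "toot:indexable",
            },
        ],
        "id": actor_uri,
        "type": "Person",
        "preferredUsername": username,
        "discoverable": True,
        # Remote servers read this flag; it says nothing about local search.
        "indexable": indexable,
    }


if __name__ == "__main__":
    doc = actor_document("https://example.com/@alice/", "alice", True)
    print(json.dumps(doc, indent=2))
```

search_enabled, by contrast, never has to leave the local server, which is why the two felt like separate concepts to me.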

Heads up though! The migration appears to have created this as a non-nullable column, and I missed at least one code path which leaves the attribute null during fetch/create. Will review.

osmaa commented 10 months ago

Hmm, my lack of experience with Django shows again. As far as I can see, indexable is defined to default to false everywhere, but somehow it's still passed as null into a database insert here: https://github.com/osmaa/takahe/blob/883c607468252fdfbf107cfe5c35ca86d6afc70c/users/models/identity.py#L452
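One plausible cause, which I have not confirmed against the linked line: a Django field default only applies when the field is left out entirely, so explicitly passing `None` (for example, the result of `dict.get()` on a remote actor document that has no `indexable` key) overrides the default and hits the NOT NULL constraint. A minimal sketch of coercing the value before it reaches the model, using a hypothetical `parse_indexable` helper:

```python
from typing import Any, Mapping


def parse_indexable(actor_doc: Mapping[str, Any]) -> bool:
    """Return the actor's indexable flag, treating absent or null as False.

    Hypothetical helper; the name and placement are assumptions.
    """
    value = actor_doc.get("indexable")
    # dict.get returns None both when the key is missing and when the
    # remote server sent an explicit null. Only a real JSON true opts in.
    return value is True


if __name__ == "__main__":
    print(parse_indexable({}))                     # key missing
    print(parse_indexable({"indexable": None}))    # explicit null
    print(parse_indexable({"indexable": True}))    # opted in
```

Comparing with `is True` rather than calling `bool()` also keeps odd values such as the string "false" from being read as an opt-in.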

andrewgodwin commented 10 months ago

I agree they are different meanings conceptually, but I still would like to combine the meanings now - two search options just seems too many, and I don't see a lot of cases where people would enable it locally but not remotely and vice-versa. It's a little bit of an expectation-breaking change, but I am alright with it in this instance.

osmaa commented 10 months ago

Fully agree that the privacy-related settings in Mastodon are too many. I've been meaning to outline a matrix of all of the possible combinations to see which of them even make sense. I don't know what to make of the existence of these technically valid combos, for example:

- discoverable=false, indexable=true, toot=public (it's not listed on Mastodon's local timeline, but can be found by text search)
- discoverable=true, indexable=false, toot=public (is listed, but not search indexed)
- discoverable=true, indexable=true, toot=unlisted (not listed nor searchable)
- discoverable=true, indexable=true, noindex=true (opted in to be indexed by everyone but web search engines)
- discoverable=false, indexable=false, noindex=false (opted out of being found on Mastodon, while allowing web search crawlers)

It's a mess. Is it a mess that can be cleaned up? If it was just me, I'd just merge all three account level settings to one (values: promote, search, unlisted), and disallow use of "public" toot level on unlisted accounts.
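For what it's worth, the matrix is easy to generate mechanically. This is only a sketch: it covers the three account-level flags and the two toot visibilities named above (Mastodon has more visibility levels), and the `setting_matrix` name is made up.

```python
from itertools import product

ACCOUNT_FLAGS = ("discoverable", "indexable", "noindex")
TOOT_VISIBILITIES = ("public", "unlisted")


def setting_matrix() -> list[dict]:
    """Enumerate every combination of the account flags and toot visibility."""
    rows = []
    for flags in product((False, True), repeat=len(ACCOUNT_FLAGS)):
        for visibility in TOOT_VISIBILITIES:
            row = dict(zip(ACCOUNT_FLAGS, flags))
            row["toot"] = visibility
            rows.append(row)
    return rows


if __name__ == "__main__":
    matrix = setting_matrix()
    for row in matrix:
        print(row)
    print(len(matrix), "combinations")  # 2 * 2 * 2 * 2 = 16
```

Sixteen rows for four knobs, and deciding which of them make sense still has to be done by hand.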

andrewgodwin commented 10 months ago

Right, it being a bit of a mess was kind of the thing I wanted to avoid. I do think that in Takahē's case, with just two options - "discoverable" and "search_enabled" - we end up with only three sensible configurations:

I'm not sure how sensible it might be to make the UI switch search off if you flip discoverable off, but it feels like it should.

osmaa commented 10 months ago

I would argue that:

Discoverable but not searchable: Maybe you're trying to avoid harassment enabled via search

is superfluous and should be instead delivered by automatic pruning of old toots from both timelines and search indices. "Allow my toots to be discovered but only for X days/weeks".

While your:

Not discoverable but searchable: Should not be allowed, makes no sense

That would be someone who opts in to be found by explicit search, but doesn't want to be shown in trending lists or be algorithmically promoted.

I didn't even include that Mastodon further complicates this by having different logic for hashtags. Again, if it was just me, I'd say that hashtags should be restricted to public toots only. Yes, there are nuances like being generally unsearchable but opening tiny windows into discovery on very specific topics only, but the complexities around documenting that kind of behavior make it into a trap.

So the question really is, how much does it make sense to try to do things different to Mastodon, which has evolved into a weird legacy of incompatible layers but is the dominant source and consumer of ActivityPub content? Plus, if you still have plans to explore AT proto PDS functionality, that'll map differently. Mostly just 100% public with no control over third-party indexing, though.

andrewgodwin commented 10 months ago

Well, automatic pruning of local things from searches would be nice, but that's a separate feature so I'm not going to say we should do that now.

In general I want to keep Takahē relatively low on options and complexity - so I think just tying Mastodon's indexable property to "search enabled" and changing its help text to say that it enables you to be searched locally and remotely would be the way to go here.

alphatownsman commented 10 months ago

This seems like quite an important feature for users like mine, whether or not it ends up as a separate option. What's the best next step to get it merged?

andrewgodwin commented 10 months ago

I'm willing to accept a PR for this that just does this flag based off of our existing search_enabled and discoverable flags, where you get marked as having search indexing allowed if they're both true.
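Roughly, the rule would look like the sketch below. `IdentitySettings` is a hypothetical stand-in for the two existing per-identity flags, not the real model, and the defaults shown are assumptions.

```python
from dataclasses import dataclass


@dataclass
class IdentitySettings:
    """Hypothetical stand-in for the two existing per-identity flags."""

    discoverable: bool = True
    search_enabled: bool = True

    @property
    def indexable(self) -> bool:
        # Advertise the remote-indexing opt-in only when both flags are on.
        return self.discoverable and self.search_enabled


if __name__ == "__main__":
    for discoverable in (True, False):
        for search_enabled in (True, False):
            settings = IdentitySettings(discoverable, search_enabled)
            print(discoverable, search_enabled, "->", settings.indexable)
```

Deriving the value rather than storing it means there is no third setting to keep in sync and no new column to migrate.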

alphatownsman commented 10 months ago

does this flag based off of our existing search_enabled and discoverable flags

There's no perfect solution and I can totally live with this.

how much does it make sense to try to do things different to Mastodon

@osmaa I agree with you that this is a real concern if Mastodon exposes these searchable options separately via API, but right now they are only changeable in the UI, I guess, so I'm OK with Andrew's suggestion above.

alphatownsman commented 7 months ago

@osmaa @AstraLuma this is an absolutely great feature. Any chance of getting this updated / merged? Happy to do anything I can to help.