muni-town / weird

Weird web pages
https://weird.one

Source: Mastodon #174

Open erlend-sh opened 1 week ago

erlend-sh commented 1 week ago

Started here:

Good reference for broader backup coverage by @kensanata here: https://github.com/kensanata/mastodon-archive

Personally my most wanted feature for my Mastodon content on weird right now is an easy search through my post history.

zicklag commented 5 days ago

Search is maybe a little tricky, depending on how often we need to update the search index.

I think the easiest way to get started will be to try out tinysearch. The only limitation that will probably bug us much initially is that it only matches whole words, but that's probably fine for starters?
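To make the whole-word limitation concrete, here is a minimal Python sketch. It does not use tinysearch's real API: a plain Python set stands in for whatever per-post word filter the library builds, and the post ids, post bodies, and function names are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into whole words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


# Hypothetical sample posts; each post gets its own set of words.
posts = {
    "post-1": "Indexing my Mastodon archive",
    "post-2": "Searching old posts should be easy",
}
index = {post_id: tokenize(body) for post_id, body in posts.items()}


def search(query: str) -> list[str]:
    """Return ids of posts that contain every query word as a whole word."""
    words = tokenize(query)
    if not words:
        return []
    # Subset test: every query word must be present exactly, no prefixes.
    return sorted(pid for pid, vocab in index.items() if words <= vocab)
```

With this kind of index, `search("mastodon")` finds the first post, while `search("masto")` finds nothing, because a membership test can only answer "is this exact word present?".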

erlend-sh commented 4 days ago

> Search is maybe a little tricky, depending on how often we need to update the search index.

What about once a day?

Can the index be updated incrementally or would we have to do a complete re-indexing every time?

PageFind is another good candidate for this. I can’t quite tell if it has the same whole-word limitation. In any case, whole-word matching is pretty much on par with the default in-client search of Mastodon.

zicklag commented 4 days ago

> What about once a day?

Yeah, I think that's reasonable. We'll just have to keep an eye on what kind of processing power that's taking and whether the app keeps responding smoothly while building indexes, etc.

We have to do re-synchronization with Mastodon on a certain cadence anyway, so we should probably build the index whenever we sync.

> Can the index be updated incrementally or would we have to do a complete re-indexing every time?

Incremental updates aren't a built-in feature. But now that I think about it, because Leaf is content-addressed and tinysearch is super simple, it should actually be really easy to make indexing incremental: a post whose content hash we've already indexed never needs to be processed again. I'd have to test it to make sure, but I think that could work great.
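A rough sketch of that idea in Python, under stated assumptions: SHA-256 of the post body stands in for Leaf's content addresses, a plain set of words stands in for the tinysearch filter, and `IncrementalIndex`, `content_id`, and `indexed_count` are hypothetical names, not part of Leaf or tinysearch.

```python
import hashlib
import re


def content_id(body: str) -> str:
    """Stand-in for a content address: the SHA-256 hash of the post body."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into whole words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class IncrementalIndex:
    def __init__(self) -> None:
        self.words_by_id: dict[str, set[str]] = {}
        self.indexed_count = 0  # how many posts were actually (re)indexed

    def sync(self, bodies: list[str]) -> None:
        """Bring the index in line with the current set of posts."""
        current = {content_id(body): body for body in bodies}
        # Drop entries for posts that no longer exist (or were edited).
        for cid in list(self.words_by_id):
            if cid not in current:
                del self.words_by_id[cid]
        # Index only content we have not seen before.
        for cid, body in current.items():
            if cid not in self.words_by_id:
                self.words_by_id[cid] = tokenize(body)
                self.indexed_count += 1

    def search(self, query: str) -> list[str]:
        """Return content ids of posts containing every query word."""
        words = tokenize(query)
        if not words:
            return []
        return sorted(c for c, vocab in self.words_by_id.items() if words <= vocab)
```

Because an edited post gets a new content id, it shows up as one removal plus one addition, so a daily sync would only pay for posts that are new or changed since the last run.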

I already checked out PageFind, which looks solid for the typical static-site use case, but it doesn't seem to have a library mode that would let us index our own non-HTML content.

I opened a discussion to make sure: https://github.com/CloudCannon/pagefind/discussions/708