Integrate Lemmy Explorer for Community Search

Fmstrat commented 1 year ago

Is your feature request related to a problem? Please describe.
Related: https://github.com/tgxn/lemmy-explorer/issues/137

Lemmy Explorer (LE) is a web crawler for communities, and it works very well. Hopefully the above is accepted as a path forward, so we can directly leverage their great work via an API. If not, and if there is interest here, I can set up a containerized API service that pulls the Redis data dump nightly and gives Thunder users access to it through the app.
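The nightly-pull part of such a service could look roughly like the minimal Python sketch below. It assumes the service pulls the public JSON export (`community.full.json` on data.lemmyverse.net) instead of the Redis dump, whose format isn't described in this thread. `load_dump`, `download`, and `MAX_AGE_SECONDS` are hypothetical names, and the download function is injectable so the caching logic can be exercised offline.

```python
import json
import time
import urllib.request
from pathlib import Path
from typing import Callable, Optional

# Public JSON export mentioned in this thread; the sketch assumes the service
# pulls this file rather than the Redis dump.
DUMP_URL = "https://data.lemmyverse.net/data/community.full.json"
MAX_AGE_SECONDS = 24 * 60 * 60  # treat the cache as stale after one day


def download(url: str) -> bytes:
    """Fetch the raw bytes at `url`."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def load_dump(cache_path: Path,
              fetch: Callable[[str], bytes] = download,
              now: Optional[float] = None) -> list:
    """Return the cached community list, re-downloading it if it is stale."""
    now = time.time() if now is None else now
    stale = (not cache_path.exists()
             or now - cache_path.stat().st_mtime > MAX_AGE_SECONDS)
    if stale:
        # Write to a temp file first, then swap it in with a single rename.
        tmp = cache_path.with_suffix(".tmp")
        tmp.write_bytes(fetch(DUMP_URL))
        tmp.replace(cache_path)
    return json.loads(cache_path.read_text(encoding="utf-8"))
```

A cron job (or a loop inside the container) could call `load_dump` periodically, with the search endpoint reading from the returned list.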

Describe the solution you'd like
A search that finds communities no matter where they are, directly built into Thunder.

Describe alternatives you've considered
Using LE outside of Thunder.

Fmstrat commented 1 year ago

There is also the Data library file: https://data.lemmyverse.net/data/community.full.json

It contains ~18 MB of data on all communities. Outside of direct integration, two options present themselves:

1) Nightly pulls of the JSON, with an API that searches it.
2) If a user chooses "search all" in communities, show a modal letting them know community data is being pre-downloaded for a faster experience, then download the JSON locally for searching.
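Either option ends with searching a list of community records, either on a server or on the device. The minimal Python sketch below shows one way to do that search. The field names `name` and `title` are assumptions about the dump's schema, and `search_communities` is a hypothetical helper, not part of Thunder or Lemmy Explorer.

```python
from typing import Iterable, List


def search_communities(communities: Iterable[dict], query: str,
                       limit: int = 20) -> List[dict]:
    """Case-insensitive substring search over names and titles.

    Exact name matches are returned before partial matches.
    """
    q = query.strip().lower()
    if not q:
        return []
    exact, partial = [], []
    for c in communities:
        # "name" and "title" are assumed field names in the dump.
        name = (c.get("name") or "").lower()
        title = (c.get("title") or "").lower()
        if name == q:
            exact.append(c)
        elif q in name or q in title:
            partial.append(c)
    return (exact + partial)[:limit]
```

A linear scan like this is the simplest starting point; an index (for example, a prefix map on names) would only be worth adding if scan time turns out to matter at the dump's real size.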

hjiangsu commented 1 year ago

This is a duplicate of #14 - I'll close this to keep things clean but feel free to let me know if you think they are separate issues @Fmstrat!

tgxn commented 1 year ago

I would love for you to use the data dumps. I wanted to pre-compute the processing for my frontend so I wouldn't need any backend server infrastructure (other than the crawler); that's why the files are so big. I don't really plan on hosting a dedicated search API, but I could probably whip up something simple that people could host in Docker or something...

For my frontend, I split the data into chunks of ~150 items, and then load them all in parallel, with TanStack.
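On the producing side, splitting a community list into fixed-size chunks could be sketched as below. This is a hypothetical illustration, not the crawler's actual code: it writes `community/<n>.json` files plus a `community.json` index, mirroring the layout described for the data page, and it assumes the index file holds nothing but the bare chunk count.

```python
import json
from pathlib import Path
from typing import List

CHUNK_SIZE = 150  # roughly the chunk size mentioned in the thread


def write_chunks(communities: List[dict], out_dir: Path,
                 chunk_size: int = CHUNK_SIZE) -> int:
    """Write community/<n>.json chunks plus a community.json holding the count."""
    chunk_dir = out_dir / "community"
    chunk_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for start in range(0, len(communities), chunk_size):
        chunk = communities[start:start + chunk_size]
        # Compact separators keep each chunk file as small as plain JSON allows.
        (chunk_dir / f"{count}.json").write_text(
            json.dumps(chunk, separators=(",", ":")), encoding="utf-8")
        count += 1
    # Assumption: the index file is just the bare chunk count.
    (out_dir / "community.json").write_text(json.dumps(count), encoding="utf-8")
    return count
```

Small chunks let a client start rendering as soon as the first ones arrive and fetch the rest concurrently, at the cost of more HTTP requests.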

It's undocumented, but the chunked data is on the data page too: https://data.lemmyverse.net/data/community.json gives you the number of chunks, and the chunks themselves are numbered from https://data.lemmyverse.net/data/community/0.json.
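A client could consume those chunked URLs roughly as in the minimal Python sketch below, which reads the chunk count and then fetches every chunk in parallel. Two assumptions are worth flagging: that `community.json` contains only the bare chunk count (the format is undocumented), and that each chunk is a JSON array of community records. `load_chunked` and `fetch_json` are hypothetical names; the fetch function is injectable so the logic can be tested without the network.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List

BASE = "https://data.lemmyverse.net/data"


def fetch_json(url: str) -> Any:
    """Download and parse one JSON document."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def load_chunked(fetch: Callable[[str], Any] = fetch_json,
                 max_workers: int = 8) -> List[dict]:
    """Fetch the chunk count, then all chunks concurrently, and flatten them."""
    # Assumption: community.json is just the chunk count, e.g. 42.
    count = int(fetch(f"{BASE}/community.json"))
    urls = [f"{BASE}/community/{i}.json" for i in range(count)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        chunks = list(pool.map(fetch, urls))  # map keeps the chunk order
    return [c for chunk in chunks for c in chunk]
```

Capping `max_workers` keeps a phone from opening a large number of connections at once while still overlapping the downloads.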

Fmstrat commented 1 year ago

@hjiangsu Since we have the Lemmyverse Dev participating in this issue (in response to our discussion), should we reopen this and close #14 instead?

@tgxn First off, thanks for joining the conversation! My one concern with the loaded data dump is its size over time (especially in Flutter). I'm not sure how the static file will hold up over years of new instances and communities. It may be fine, but have you done any load testing in JS with hundreds of thousands of data points yet? (If not, we could generate some and see.)
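Generating test data for that kind of load test is straightforward. The Python sketch below creates synthetic community records with a made-up field layout and times a naive substring scan over them. It only gives a rough feel for payload size and scan cost at scale; it says nothing directly about JS or Flutter performance, which would need to be measured on those runtimes.

```python
import random
import string
import time
from typing import List, Tuple


def make_fake_communities(n: int, seed: int = 0) -> List[dict]:
    """Generate n synthetic community records (the field layout is made up)."""
    rng = random.Random(seed)

    def word(k: int) -> str:
        return "".join(rng.choices(string.ascii_lowercase, k=k))

    return [
        {
            "name": word(10),
            "title": word(12),
            "baseurl": f"{word(6)}.example",
            "desc": " ".join(word(7) for _ in range(20)),
        }
        for _ in range(n)
    ]


def time_scan(communities: List[dict], query: str) -> Tuple[int, float]:
    """Linear substring scan over names and descriptions; returns (hits, seconds)."""
    start = time.perf_counter()
    hits = sum(1 for c in communities if query in c["name"] or query in c["desc"])
    return hits, time.perf_counter() - start
```

For example, calling `make_fake_communities(300_000)`, then `time_scan(data, "abc")` and `len(json.dumps(data))`, shows roughly how long a full scan takes and how large the serialized payload gets at that scale.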

hjiangsu commented 1 year ago

Opened - feel free to continue discussion here!

tgxn commented 1 year ago

Hey, sorry, I'm not the best at replying or following things up in a reasonable amount of time 😂

I reckon it could be an issue, for communities at least. I found it's a lot of data, especially if you want to let people search by things in the descriptions. I have to split it apart and do as much processing on the crawler as I can, and even then it's ~14 MB or something.

I did a few load tests with different ways of splitting or compressing it, but I've still got work to do on that.

As for suggestions, I would only bundle a minified set of community names and instances (and think about whether you really need it), and maybe offer a way for users to "fetch new data", which could download the chunked version of the data from Lemmyverse.
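A minified names-and-instances bundle could be produced roughly as in the Python sketch below, which reduces each record to a `name@instance` string and serializes the deduplicated, sorted list as compact JSON. The field names `name` and `baseurl` are assumptions about the dump's schema, and `minify` is a hypothetical helper.

```python
import json
from typing import List


def minify(communities: List[dict]) -> str:
    """Reduce each community to 'name@instance' and emit compact JSON."""
    # "name" and "baseurl" are assumed field names in the dump.
    refs = sorted({f"{c['name']}@{c['baseurl']}" for c in communities})
    return json.dumps(refs, separators=(",", ":"))
```

Dropping descriptions and other metadata is what makes the bundle small; the trade-off is that search over it can only match on names and instances, so description search would still need the full (or chunked) data.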

thunder-app / thunder

Integrate Lemmy Explorer for Community Search #508