Open marsara9 opened 1 year ago
While I'd love to include comment data, I'm noticing some major blockers in the approach I'm currently using to crawl Lemmy.
Currently I call `/api/v3/post/list?type_=All&sort=Old&limit=50&page=...`. It appears that using `type_=All` is incredibly slow on larger instances. Lemmy.world, for example, takes around 8-9 seconds when set to `All` vs 1-2 seconds when set to `Local`.
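For reference, here is a minimal sketch of the paginated crawl described above, switched to `type_=Local`. The helper names (`post_list_url`, `crawl_posts`) and the injected `fetch_json` callable are hypothetical, and the assumption that the response carries its results under a `posts` key and returns an empty list past the last page is mine, not something confirmed in this thread.

```python
from urllib.parse import urlencode


def post_list_url(instance, page, listing_type="Local", limit=50):
    """Build the post/list URL for one page (query parameters as used in this issue)."""
    query = urlencode(
        {"type_": listing_type, "sort": "Old", "limit": limit, "page": page}
    )
    return f"https://{instance}/api/v3/post/list?{query}"


def crawl_posts(fetch_json, instance, listing_type="Local"):
    """Yield post entries page by page until an empty page comes back.

    fetch_json is a hypothetical callable: it takes a URL and returns the
    decoded JSON body as a dict. Assumption: results live under "posts".
    """
    page = 1
    while True:
        data = fetch_json(post_list_url(instance, page, listing_type))
        posts = data.get("posts", [])
        if not posts:
            break
        yield from posts
        page += 1
```

Passing the fetcher in keeps the pagination logic separate from the HTTP client, so the same loop can be timed against `All` and `Local` without changing anything else.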
Next, assuming comments have the same or similar issue, we can end up doubling or tripling the time requirements to index a site.
Luckily I already have a plan to fix this, but it hinges on posts having a universal id across the fediverse. That way I can index only a given instance's local posts and use that universal id to create links for other instances. I just have no idea when, or if, that will be resolved.
@marsara9 All posts and comments already have a globally unique id, which is the `ap_id` value (ActivityPub id). It's the URL where that object can be viewed on its home instance. It's preferable to use this link when opening a search result. The same link can also be used to fetch a remote post to your instance by pasting it into the search field.
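A rough sketch of how the index could be keyed by `ap_id`, so that the same post crawled from different instances collapses into one entry. The function name `index_local_posts` is hypothetical, and the response shape (each entry wrapping a `post` object with `ap_id`, `name`, and `body` fields) is an assumption about the v3 API rather than something stated here.

```python
def index_local_posts(post_views, index=None):
    """Add posts to a dict keyed by ap_id and return it.

    Assumption: each entry looks like {"post": {"ap_id": ..., "name": ..., "body": ...}}.
    Because ap_id is the URL of the post on its home instance, it can serve
    both as the deduplication key and as the link shown in search results.
    """
    index = {} if index is None else index
    for view in post_views:
        post = view["post"]
        index[post["ap_id"]] = {
            "title": post.get("name", ""),
            "body": post.get("body", ""),  # body may be absent for link posts
        }
    return index
```

Indexing a second instance into the same dict simply adds its own local posts, since no two home instances share an `ap_id`.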
Currently the crawler only indexes posts via `/api/v3/post/list`. This should be updated to also crawl comments for the posts that were fetched this way, via `/api/v3/comment/list`. This way, when a user searches for something, not only are the post title and body scanned but so are all of the comments.
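One possible shape for the comment crawl, as a sketch only. The helper names and the injected `fetch_json` callable are hypothetical. The `post_id`, `sort`, `limit`, and `page` query parameters, and the `comments` / `comment` / `content` response fields, are assumptions about the v3 API that should be checked against the instance being crawled.

```python
from urllib.parse import urlencode


def comment_list_url(instance, post_id, page, limit=50):
    """Build the comment/list URL for one page of a post's comments (assumed parameters)."""
    query = urlencode({"post_id": post_id, "sort": "Old", "limit": limit, "page": page})
    return f"https://{instance}/api/v3/comment/list?{query}"


def crawl_comments(fetch_json, instance, post_id):
    """Yield comment entries for one post until an empty page comes back.

    fetch_json is a hypothetical callable: URL in, decoded JSON dict out.
    """
    page = 1
    while True:
        data = fetch_json(comment_list_url(instance, post_id, page))
        comments = data.get("comments", [])
        if not comments:
            break
        yield from comments
        page += 1


def searchable_text(post_view, comment_views):
    """Join title, body, and comment text into one string for the search index."""
    post = post_view["post"]
    parts = [post.get("name", ""), post.get("body", "")]
    parts += [view["comment"].get("content", "") for view in comment_views]
    return "\n".join(part for part in parts if part)
```

Note that this issues at least one extra request per post, which is where the doubled or tripled crawl time mentioned above would come from.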