nanos / FediFetcher

FediFetcher is a tool for Mastodon that automatically fetches missing replies and posts from other fediverse instances, and adds them to your own Mastodon instance.
https://blog.thms.uk/fedifetcher?utm_source=github
MIT License

Slow, would multi-threading help? #163

Closed: smiba closed this issue 1 week ago

smiba commented 1 week ago

Hi,

I'm running FediFetcher for the first time, and I've noticed it takes a really long time, even though my CPU, IO, and bandwidth usage are nearly flat.

It seems that it only processes one request at a time. Would it make sense to have some kind of thread pool running to speed the process up? I think this would also benefit people running this on GitHub Actions, as they'd be less likely to hit runtime timeouts.
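
For readers unfamiliar with the idea, a thread pool of the kind suggested here could look roughly like the sketch below. It uses only Python's standard-library `concurrent.futures`. The names `fetch_all` and `fake_fetch` are hypothetical and are not part of FediFetcher; `fake_fetch` stands in for a real HTTP request so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=8):
    """Call fetch(url) for every URL using a small pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with urls.
        return list(pool.map(fetch, urls))


def fake_fetch(url):
    # Hypothetical stand-in for a real network request.
    return f"fetched {url}"


if __name__ == "__main__":
    for line in fetch_all(["https://example.com/a", "https://example.com/b"], fake_fetch):
        print(line)
```

Because the requests are I/O-bound, threads can overlap the time spent waiting on the network. Whether that translates into a shorter run depends on what else limits the requests.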

nanos commented 1 week ago

Yeah, this comes up from time to time.

There are a few reasons why I don't want to implement it myself:

  1. This is usually only an issue the first time (or first few times) FediFetcher runs. After that it's quick (mine currently takes under a minute per run on average).
  2. I don't think it'll make much difference: a major bottleneck (particularly during the first run) is the rate limit applied by your home instance. You'd just have all those threads spending a lot of time idling, waiting for your rate limit to reset.
  3. It would add a fair bit of complexity and/or additional dependencies for (as explained above) fairly little actual gain.
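
The rate-limit argument in point 2 can be illustrated with a rough back-of-the-envelope model. This is only a sketch under stated assumptions: the function name `estimated_run_seconds` is hypothetical, the 0.5 s latency is made up, and the limit of 300 requests per 5 minutes is an assumed figure (a commonly cited Mastodon default, which your instance may override). It also ignores requests to remote instances, which are not subject to the home instance's limit.

```python
import math


def estimated_run_seconds(requests, workers, latency_s=0.5, limit=300, window_s=300):
    """Rough lower bound on wall-clock time for `requests` home-instance calls."""
    # If only network latency mattered, workers would split the requests evenly.
    latency_bound = math.ceil(requests / workers) * latency_s
    # If only the rate limit mattered, each window allows `limit` calls, so
    # every window before the last one has to elapse in full.
    rate_bound = (math.ceil(requests / limit) - 1) * window_s
    # The slower of the two constraints dominates.
    return max(latency_bound, rate_bound)


if __name__ == "__main__":
    for total in (200, 3000):
        for workers in (1, 8):
            print(total, workers, estimated_run_seconds(total, workers))
```

Under these assumptions, a run that stays below the limit is bound by latency, so extra workers help. A run that needs many rate-limit windows is bound by the window length, and the worker count makes no difference to the estimate.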

If someone can convince me otherwise, or can implement it without additional dependencies or a lot of complexity, I'm happy to revisit this stance.