Possible performance issues after a recent update

nanos / FediFetcher

FediFetcher is a tool for Mastodon that automatically fetches missing replies and posts from other fediverse instances, and adds them to your own Mastodon instance.

https://blog.thms.uk/fedifetcher?utm_source=github

MIT License

Possible performance issues after a recent update #156

Closed stefanbohacek closed 3 months ago

stefanbohacek commented 3 months ago

After a recent update (https://github.com/stefanbohacek/FediFetcher/commit/df484696d98626295c8569e8804ad2b597a41da3), the CPU and bandwidth usage of my Mastodon server have increased, see attached screenshot from DigitalOcean.

A screenshot of three graphs - server bandwidth, CPU usage, and disk I/O. Both bandwidth and CPU usage increased after an update on August 11.

Here's my config.json file:

{
  "server": "stefanbohacek.online",
  "home-timeline-length": 200,
  "max-followings": 80,
  "from-notifications": 1
}

I have not made any changes other than fetching the latest code as shown above.

The droplet has 2 virtual CPUs, 4 GB RAM, and a 50 GB SSD.

nanos commented 3 months ago

It looks like you have updated from 7.1.1. The only significant change I can think of is that 7.1.2 introduced improved caching of robots.txt files, which uses sha256 to hash domain names over and over again.

Can you try #157 which uses xxhash instead of sha256 and see if that improves things for you?

(Don't forget to run pip install -r requirements.txt)
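
For illustration only, here is a minimal Python sketch of the kind of work described above: deriving a cache key for a domain's robots.txt by hashing the domain name with sha256. The function name `robots_cache_key` and the use of `functools.lru_cache` are assumptions made for this sketch, not FediFetcher's actual code. Memoizing is just one way to avoid hashing the same domain repeatedly.

```python
import hashlib
from functools import lru_cache


@lru_cache(maxsize=None)
def robots_cache_key(domain: str) -> str:
    """Return a stable cache key for a domain's robots.txt (hypothetical helper)."""
    # Normalise so "Example.COM" and "example.com" map to the same key.
    normalised = domain.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    for d in ["mastodon.social", "Mastodon.Social", "stefanbohacek.online"]:
        print(d, robots_cache_key(d)[:12])
    # cache_info() shows how many calls were answered from the memo table.
    print(robots_cache_key.cache_info())
```

Swapping in a faster non-cryptographic hash, as the linked pull request proposes, would only change the hashing line. The key only needs to be stable, not secure.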

stefanbohacek commented 3 months ago

@nanos Thank you! I should also add that I'm running FediFetcher as a GitHub Action. Either way, I pulled in the latest changes and will keep an eye on the server throughout the day.

nanos commented 3 months ago

Ah, in that case this won't make any difference whatsoever.

If you are running FediFetcher as GH Action, then what server are those graphs from?

stefanbohacek commented 3 months ago

Right. So that's from the actual stefanbohacek.online Mastodon server. I haven't made any changes to the server itself, and the time when the graph started to change matches exactly when I pulled in the update, so I figured maybe the frequency/amount of API calls might have changed? Would it be helpful to share any logs perhaps?

nanos commented 3 months ago

Oh, I see.

That, quite frankly, makes no sense whatsoever: changes since 7.1.1 have only served to reduce the load on the actual Mastodon server, rather than increase it, with FediFetcher increasingly making use of caching to avoid repeated requests for the same thing.

As such I'm tempted to conclude that this increase is very likely unrelated.

Did you see an uptick in Sidekiq jobs?

Did you make any other changes?

nanos commented 3 months ago

Hm. Looking through your GH Actions logs, there are a couple of things that stand out to me:

- Your actions run for a very long time: FediFetcher takes < 1 min most of the time for me, with a similar configuration.
- ~~Your replied_toot_server_ids file is empty.~~ (This is normal with your configuration.)

I wonder if these two are connected somehow?

nanos commented 3 months ago

Is it possible your mastodon server is just busier than usual? More notifications maybe?

I sometimes get this when I post something popular, and get dozens of likes: My FediFetcher run time explodes as it's backfilling all those profiles, and that of course has a knock-on effect on the mastodon instance.

stefanbohacek commented 3 months ago

> Your actions run for a very long time

Good catch! Looking at https://github.com/stefanbohacek/FediFetcher/actions/workflows/get_context.yml?page=17, this strongly suggests it has to do with the FediFetcher update.

9:24: 1m 52s
9:35: 1m 50s
9:45: 1m 51s
9:47: merged latest changes
9:55: 1h 54m 43s
10:15: 16m 29s
10:35: 8m 26s
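
To put those run times side by side, a small Python sketch can convert each duration to seconds and express it relative to the 9:45 run. The helper name `duration_to_seconds` is made up for this example, and the comparison only means something if the runs being compared did the same amount of work.

```python
import re

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def duration_to_seconds(text: str) -> int:
    """Convert a string such as '1h 54m 43s' into a number of seconds."""
    parts = re.findall(r"(\d+)\s*([hms])", text)
    if not parts:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(int(value) * _UNIT_SECONDS[unit] for value, unit in parts)


# Run times copied from the list above, keyed by start time.
runs = {
    "9:24": "1m 52s",
    "9:35": "1m 50s",
    "9:45": "1m 51s",
    "9:55": "1h 54m 43s",
    "10:15": "16m 29s",
    "10:35": "8m 26s",
}

if __name__ == "__main__":
    baseline = duration_to_seconds(runs["9:45"])
    for start, text in runs.items():
        seconds = duration_to_seconds(text)
        print(f"{start:>5}  {seconds:>5}s  {seconds / baseline:5.1f}x the 9:45 run")
```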

> Did you make any other changes?

Nope.

> Is it possible your mastodon server is just busier than usual? More notifications maybe?

Not particularly.

> Did you see an uptick in Sidekiq jobs?

I'll look into this a bit more. I also paused the workflow for now to see if this has any impact.

nanos commented 3 months ago

> Good catch! Looking at https://github.com/stefanbohacek/FediFetcher/actions/workflows/get_context.yml?page=17, this strongly suggests it has to do with the FediFetcher update.
>
> 9:24: 1m 52s
> 9:35: 1m 50s
> 9:45: 1m 51s
> 9:47: merged latest changes
> 9:55: 1h 54m 43s
> 10:15: 16m 29s
> 10:35: 8m 26s

You can't really compare these: The 09:45 one (and all previous runs) error out, so the run time is simply not the run time of a complete run.

I cannot find any previous successful run in your log, so I cannot find a comparison.

I must admit that at the moment I cannot think of any recent changes that could've caused this.

stefanbohacek commented 3 months ago

> The 09:45 one (and all previous runs) error out

Ah, sorry, I missed that.

Either way, still keeping an eye on the server, but so far things seem to be calming down after pausing the workflow.

A screenshot of three charts from DigitalOcean showing a decrease in CPU usage, server load, and memory usage over the past 30 minutes or so.

I'll try to poke around the logs a bit more and see if there's anything else going on.

nanos commented 3 months ago

Yeah, it doesn’t surprise me to see that it’s calmed down. With backfilling that many posts it’ll have an impact.

I’m just not sure why it would backfill that much. At a cursory glance I couldn’t see any obvious duplication, hence asking earlier whether your server was just very busy.

You could try turning off from-notifications, backfill-with-context, and/or backfill-mentioned-users. These three have by far the biggest impact (particularly backfill-with-context), but obviously turning them off also means you’re missing out on some functionality (see the readme for a description of each option). Up to you to decide whether it’s worth doing.
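
As a rough sketch of that suggestion, the following Python snippet takes a config.json-style dictionary and switches the three options off. The key is spelled `from-notifications` in the config.json posted earlier in this thread, so that spelling is used here. Using `0` as the "off" value and the helper name `disable_heavy_options` are assumptions for this example, so check the readme for the values each option accepts.

```python
import json

# Option names as mentioned above; 0 as the "off" value is an assumption
# based on the config style used elsewhere in this thread.
HEAVY_OPTIONS = (
    "from-notifications",
    "backfill-with-context",
    "backfill-mentioned-users",
)


def disable_heavy_options(config: dict) -> dict:
    """Return a copy of config with the three costly options set to 0."""
    updated = dict(config)
    for name in HEAVY_OPTIONS:
        updated[name] = 0
    return updated


if __name__ == "__main__":
    original = {
        "server": "stefanbohacek.online",
        "home-timeline-length": 200,
        "max-followings": 80,
        "from-notifications": 1,
    }
    print(json.dumps(disable_heavy_options(original), indent=2))
```

Turning the options off one at a time, rather than all three at once, would make it easier to see which one accounts for most of the run time.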

stefanbohacek commented 3 months ago

I think I'm going to close this ticket for now. You confirmed that there were no issues with the update, and gave me a few pointers to look at.

And looking at the past FediFetcher runs, I am wondering if it's the other way around: maybe this has always been an issue, but the recent update fixed whatever was causing the workflow to error out.

I will go over the settings and see if I can tweak them better. Thank you, I appreciate all your help!

nanos commented 3 months ago

OK, let me know if you need any more guidance at any time, or if you find out anything that would be helpful to share, please 👍

stefanbohacek commented 3 months ago

Just to follow up on this, adding a swap partition helped me a lot. Also, my updated settings:

{
  "server": "stefanbohacek.online",
  "home-timeline-length": 20,
  "backfill-with-context": 0,
  "max-followings": 80,
  "max-followers": 80,
  "max-follow-requests": 80,
  "max-bookmarks": 80,
  "from-notifications": 0
}

It might've been my home timeline taking up too much time to crawl. I'm switching to a more targeted approach with favorites and bookmarks.

Either way, things are looking good now!

https://github.com/stefanbohacek/FediFetcher/actions