Occasionally high resource usage - Githubissues

phindmarsh commented 12 years ago

I'm not sure if this is related to this plugin particularly or not, I hope you will be able to provide some insight as to whether it could be the cause or not.

Recently we moved all our sites (we are a small web development shop) into git repositories while migrating to a new production hosting environment. There were a few sites (notably old ones) with lots of images and PDF documents that make the repository pretty large (excess of 400mb).

Since those repos have been added into our Redmine environment (now operating with Redmine 1.3-stable from SVN) occasionally the server becomes unresponsive. I have noticed that there are (up to 3) ruby threads consuming 100% of the CPU for minutes or more. Then just afterwards mysqld spikes for a ~10-20 seconds, then everything drops back to idle.

I assume this is the fetch_changesets task going through each repository looking for changes to each repo, then adding them to the database. It appears mysqld only spikes right at the end and occasionally git spikes but only for a second or so.

We have around 35 repositories, only a few of them are >100mb, and the changes table has approx 90,000 rows in it. Does this sound like something that could happen with the plugin? It only seemed to manifest itself after adding all the new repos, although the large ones were added at the same time as the others so I can't separate if its a 'number of repositories' over a 'repository size' problem.

Thanks for taking a look, we love this plugin and it certainly does make our workflow and server management significantly easier, as always your work is much appreciated.

kubitron commented 12 years ago

Actually, I suspect that this has to do with the way that Redmine handles change-sets for git. It is very labor intensive, writing two entries to the database for every git-level commit in the log (one new entry to the changes table and a modifying of the repository_extras entry of the repository table). This huge amount of traffic is why the generic version of Eric's plugin doesn't work well, since it is triggered for every write to the repository table. It led to my "performance" branch of this plugin (last year), which changed things to trigger the plugin once per fetch_changesets, rather than per write to the repository table.

Thus, my guess is that this is simply a Redmine issue. It depends on the number of new git-level commits introduced by fetch_changesets, rather than by the number of repositories. Perhaps the plugin makes it slightly worse, since it (the plugin) registers an observer on the repository table, leading to slightly more code per change to that table. I'm assuming that things work fine now that you are past the setup time?

kubitron commented 12 years ago

Incidentally, the "performance" branch is fully integrated into the master branch, so you have all of that code currently working for you...

phindmarsh commented 12 years ago

Actually, we were noticing the lag even days after the migration occurred. We haven't had it happen again for a day or so, but I'll keep an eye on it and see if it pops up again. The times we noticed it we certainly weren't making changes (committing etc) to any repository, which first lead me to check we didn't have any cron jobs running that could be causing it (which we didn't).

Thanks for the insight, I'll keep an eye on it.

kubitron commented 12 years ago

What is a bit weird is the three threads with 100%. How frequently are you running fetch_changesets? If you run it too frequently, perhaps you have more than one running at a time...? (I wouldn't do so more frequently than 15 minutes or so). You should be able to see the total time for fetch_changesets operations in the log -- look for the fetch_changesets header on a log entry, then look at the end of the "paragraph" to see how long it took...

You should probably upgrade to version 0.4.5x of this plugin, incidentally...

phindmarsh commented 12 years ago

There was three threads totaling 100% (roughly 33% each).

AFAIK we do not run fetch_changesets manually (or via cron etc), so I imagine it only runs when instigated via the UI?

kubitron commented 12 years ago

Ah. You should try running fetch_changesets regularly (via a cron job). Is it possible that there were changes that had built up? One other benefit of running fetch_changsets regularly is that it tends to correct errors that might have built up (although that doesn't seem to be giving you a problem).

The plugin does attempt to "cache" its interactions with git. I'm not sure how many repos can be cached at a time. Perhaps you have folks looking at multiple repositories in rapid succession, causing a bunch of cache traffic to the database (cache entries are stored in the DB). Perhaps you could see if you could stimulate the problem by looking at a bunch of different repos in rapid succession (from different browsers?).

If you remember what date/time this problem was in, you should look in the log (i.e.

REDMINE_ROOT/log/production.log

Can you figure out what was happening then?

phindmarsh commented 12 years ago

Ok I'll get a cron job started.

There is only two of us using the repositories at any one time, and on a few instances we were coming back to Redmine from a period of inactivity. When we loaded the site it would take ages to load, if I SSH'd into the box (which would be slower than normal) and use top I could see the processes having a good time.

I did check the log when it was happening and couldn't see any activity from when it was being slow, but I'll take another look.

kubitron commented 12 years ago

Some clarification: when you say that the site took "ages to load", do you mean the whole Redmine site, or just the Repository page? (Different causes).

Are you using Apache+FCGID? That will timeout old threads after long periods of inactivity; they take time to restart (can be 10 seconds or more). The fetch_changesets running through the web (need to set a key) will keep at least one thread alive.

phindmarsh commented 12 years ago

Apologies, the entire site.

BUT I think it was any service that was running on that server, not just redmine (i.e. SSH took a approx 30 seconds to log in as opposed to the typical 2 seconds). This leads me to thing its just out of resources to cater for the additional requests once whatever process it is starts hogging the CPU.

We have Apache+Passenger to Redmine 1.3-stable with the latest copy of this plugin.

kubitron commented 12 years ago

So -- perhaps it needs a reboot? Are you thinking that you can close this issue? Or, do you want to keep it open?

phindmarsh commented 12 years ago

Yeah I rebooted it a few times when it was misbehaving with no effect.

I'll close the issue, I was just looking for some insight into how the plugin does things when it comes to large/many repositories.

I'll keep and eye on it and if it happens again I'll re-open the issue.

Thanks for your help!

kubitron commented 12 years ago

Ok. I would try to run fetch_changesets from a cron job (say through the web) and look in the log to see how long it takes to process. That would be an interesting insight to how the plugin is doing (as opposed to Redmine in general).

You would have the cron job execute something like:

*/15 * * * * apache curl -k -o/dev/null -s "https://yourserver.com/redmine_root/sys/fetch_changesets?key=xxxxxx"

Here, the key is the API key under general redmine settings (with the repository tab). It is there to avoid a denial of service attack.... (the above runs the command as user apache, executes a http request to the given URL....). You can also execute such a command from a browser to experiment with fetch_changesets as well (I do that a lot with a tail -f on the redmine log file).

phindmarsh commented 12 years ago

Is that the same as doing it via a rake task? I have a cron running every 15 minutes that does:

rake redmine_git_hosting:fetch_changesets RAILS_ENV=production

kubitron commented 12 years ago

Should be the same. With the rake task, it is absolutely important that you run as the same user that redmine normally uses!

kubitron commented 12 years ago

The rake task will be labeled in the log file with the fact that you are executing "from the command line". The one thing that the rake task won't give you is timing in the log file... (Redmine puts a line at the end of each request in the log to say how long it took).

phindmarsh commented 12 years ago

Certainly, the cron specifies www-data as the operating user and I've checked it does in fact run in that manner.

Thats a good point though, I'll change it to use curl.

kubitron / redmine_git_hosting

Occasionally high resource usage #43