kubitron / redmine_git_hosting

A ChiliProject/Redmine plugin which makes configuring your own git hosting easy.
78 stars 15 forks source link

ssh-keys disappearing #42

Closed simonsd closed 12 years ago

simonsd commented 12 years ago

Occasionally I'm seeing people complain about not being able to access repos. The keys are still in the database, the web interface shows the keys exist, yet they aren't in the keydir of either the main gitolite-admin repo nor the redmine clone thereof.

The problem is that I can't find a clear cause here. When it happens who knows how long it takes before someone actually notices? Scrolling a few thousand lines of logs not knowing what you're looking for is also a bit difficult.

As a workaround I can:

After this the keys get added to both keydirs again and all is well. Having to do this twice a week is however bothersome. Any idea on how I might be able to reproduce this?

kubitron commented 12 years ago

You shouldn't need to do anything quite so complex to recreate everything. Have you tried executing a fetch_changesets operation? (or update_repositories as a rake task). Either or these should recreate everything and resync problems....

Also, are you using the absolutely latest code (just to know what is up)?

simonsd commented 12 years ago

Oh right, forgot that existed. Next time it occurs I'll see if that fixes the issue. I'll also consider putting in a cron job afterwards to make sure everything is ok. I believe you already suggested that to me once, figured I'd first see what the implications were and then forgot about it.

We are not using the latest commits, last commit I have is SHA: 0758cbb026d980f5baa47b51a03b4c70a9050414 I'll consider upgrading if the chances of breaking something are low.

TIA for the help

kubitron commented 12 years ago

You might want to upgrade. Several things have been fixed/improved recently. Shouldn't change much, except you need to make sure to execute:

rake migrate_plugins RAILS_ENV=production

Executing a fetch_changesets operation regularly should prevent the type of problem you are experiencing (by continually fixing up the conf file and keydir).

However, I would still like to figure out what happened. Were all the keys missing from the keydir? Also, were they missing from the conf file? If you look into the log file, the removal of keys from the keydir are logged of the form:

Removing gitolite key: redmine_testuser_1326264707_259792.pub

Have you looked in the log to find out where the keys were removed? What else is happening around it? (or, if those types of statements are not in the log anywhere, that would be interesting as well)...

simonsd commented 12 years ago

Not all the keys were missing from the keydir, a few were still present. I am indeed seeing some messages stating 'Removing gitolite key: $user.pub', but not as many as I would expect. EDIT: on further inspection, these are from a user playing around with his ssh-keys due to this issue. So that means no messages are logged when the keys are deleted. What I find strangest is the fact that it only deletes part of the keys, rather than all of them.

kubitron commented 12 years ago

Sorry for the delay in response, got busy. A question: when the user played around with his keys (delete and re-add), did it fix things for him? (it should have). Also, when you deleted the repo and re-constructed it, did you see a bunch of messages about keys being added back?

Also -- since the administrative repo is an actual git repo, you should be able to go back in the history and figure out when the keys disappeared -- what time it was, what the commit message is at the time, and -- perhaps more important, what was happening in the Redmine log at the time. Anything interesting there? Note that there are a couple of "normal" messages in the git log for the gitolite-admin repo from my plugin. They are of the form "Updated by Redmine" and "Updated by Redmine: Corrections discovered during RESYNC_ALL". I'm curious whether there are other things in the log from a third-party...

kubitron commented 12 years ago

Ah! I think I know what is happening. The keydir files in /tmp/redmine_git_hosting/xxx/gitolite-admin are being toasted by cleanup cron jobs if they are not accessed frequently enough. In particular, keys that have not changed in a long time. Then, the next attempt to change keys does a "git pull" which merges the current gitolite admin repo into the version in /tmp, which deletes a bunch of keys (including the admin key). Only the project being modified is fixed. Now we have a keydir with many keys gone, including the admin key. Next time around, my automatic repair code fixes the admin key, so everything appears to keep working, except that you are missing keys.

Note that regular cron jobs doing fetch_changesets (or update_repositories) would fix this problem. I discovered a case in my logs where this failure mode occurred, but got fixed at the next "fetch_changesets" (actually, it took two of them, first one added in all the user keys, second one fixed the admin key). I think that I need to make the code ignore any merges that result in changes; the reason the merge code was there at all was to (1) recover from errors last time around that didn't get properly committed and (2) be faster than a complete delete and clone. Perhaps I will let (1) be covered by regular fetch_changesets, and reclone whenever a merge results in something that is not on the

Does this scenario seem plausible to you? (RedHat has something called "tmpwatch" that is run regularly to delete idle files in /tmp).

kubitron commented 12 years ago

I have verified that partial deletion of files in /tmp/redmine_git_hosting can cause loss of keys. I have uploaded a patch to fix this. Want to try?

Since the master is a bit ahead of your installation, make sure to execute:

rake db:migrate_plugins RAILS_ENV=production

Because there are new options in the settings page. With luck, you shouldn't see this problem again (although you will see an error in the log to indicate that it probably happened and was fixed. Error will look like):

Processing ProjectsController#create (for 128.32.168.227 at 2012-04-23 15:06:23) [POST]
  Parameters: {"controller"=>"projects", "action"=>"create", "project"=>{"identifier"=>"this-is-a-test", "description"=>"", "parent_id"=>"", "name"=>"Testing", "homepage"=>"", "enabled_module_names"=>["issue_tracking", "time_tracking", "news", "documents", "files", "wiki", "repository", "boards", "calendar", "gantt", ""], "is_public"=>"0", "tracker_ids"=>["1", "2", "3", ""]}, "authenticity_token"=>"cJObbV+j9HWCEJCqyW8FqryW2/It8tmfPgJfvRBoeSc=", "continue"=>"Create and continue"}
Redirected to https://tessellation.cs.berkeley.edu/test-site/redmine/projects/new?
Fetching changes from gitolite-admin repository to /tmp/redmine_git_hosting/git-tess/gitolite-admin
***> Seems to be unmerged changes!  Going to delete and reclone for safety.
***> May need to execute RESYNC_ALL to fix whatever caused these pending changes.
Cloning gitolite-admin repository to /tmp/redmine_git_hosting/git-tess/gitolite-admin
Our hook is already installed
Creating new entry 'this-is-a-test' in gitolite.conf
  Letting gitolite create empty repository: 'repositories/this-is-a-test.git'
Committing changes to gitolite-admin repository
Setting up hooks for project 'this-is-a-test' (in gitolite repository at 'repositories/this-is-a-test.git')
Completed in 2968ms (DB: 36) | 302 Found [https://tessellation.cs.berkeley.edu/test-site/redmine/projects]
kubitron commented 12 years ago

I'm going to close this issue. I think that we probably have it fixed. Incidentally, if you look at the keys that were added back in after your delete and recloning dance, you will probably see older ones (that haven't changed in a while). I suspect that it is your longest-running users that were affected?

simonsd commented 12 years ago

When the user played around with his keys, nothing changed. This is mostly due to the link being broken between the real gitolite-admin repo and the clone. Once the link is restored everything worked itself out again.

As you mentioned, when the gitolite-admin repo recovers, a bunch of keys are re-added. Confirmed this in the git log rather than the redmine log (easier on the eyes). Also, the first commit after recovery is always commented as 'start', and simply adds the gitolite admin key again. Other than that I'm only seeing 'Updated by Redmine' commit messages, so there is no interference from external sources.

The tmpwatch scenario does indeed seem like a plausible explanation. I confirmed that tmpwatch is present on the system and see the removal of keys that have been there longest according to the git log. Because of this, I presume that changing the path for the temp dir to something outside of /tmp should also do the trick?

I just updated the plugin to the latest version, to see if it works. I guess no news is good news in this case :) I'll try to remember this in a few weeks to confirm.

Thanks for the hard work again :)

kubitron commented 12 years ago

Hopefully this will fix the problem. Does this happen frequently enough that will be be able to confirm it went away? At any rate, I'm glad that I finally had enough confirmation to look at this further. There has always been this weird possibility in the background...

Just to make sure you are up-to-date, did you get the version in which README.mkd says "0.4.5x"? I have started taging releases and this seemed worth a new release. Check out the release notes link in the README as well.

Later, --KUBI--

kubitron commented 12 years ago

(p.s. You might look at the installation instructions in the README. They have been updated considerably.).

simonsd commented 12 years ago

Well it's happened a number of times now, looks like it's about every 2 weeks.

Just checked and I got the latest version alright. The README does seem to have changed quite a bit, though it should still be ok for our setup. Thanks for mentioning it though.

Btw, why the choice of putting the release notes on the wiki rather than inside the repo?

kubitron commented 12 years ago

One more thing -- if there is some reason that the new installation instructions (README) and settings has a conflict with your installation, let me know. It should be strictly upwardly compatible (as long as you do a "db:migrate_plugins" operation).

kubitron commented 12 years ago

Anything new? Does this problem seem to have gone away? (I realize that you said it happens every couple of weeks and that I may be jumping the gun here).

simonsd commented 12 years ago

Nothing so far, the keys are all still present now. I had one incident were the repo's were out of sync, even though i'm running fetch_changesets in a cron job now. But that's about it. So far so good I guess :)

kubitron commented 12 years ago

Um... What does "out of sync" mean? Is this a plugin issue?

simonsd commented 12 years ago

'out of sync' as in, the gitolite-admin repo and the redmine clone thereof not being in the same state. This would mean that the actual gitolite-admin repo is missing some commits that the redmine clone does have, which is of course not expected behavior.

kubitron commented 12 years ago

Was this behavior manifest solely in the Activity log, or in the actual repo page on Redmine? Note that Redmine (i.e. the plugin) doesn't actually clone the clone. What happens is that it gets log entries from the gitolite repo and caches them. The activity log depends on a couple of things, not the least of which that the receive hook from gitolite must call into Redmine with the right url to make this work; in turn, that URL depends on a key that is placed in the gitconfig options by the plugin....

Just to make sure -- your gitolite repo and Redmine are on the same physical server, right?

simonsd commented 12 years ago

Ok, haven't had any issues for the past weeks, until now. Out of the blue redmine decides to remove a single ssh-key, mine .. Small detail, I changed my username in our ldap server and was hence forced to also rename my account in redmine. Might that have somehow caused it?

When checking the git log I see this: Updated by Redmine: Corrections discovered during RESYNC_ALL

kubitron commented 12 years ago

This is a use case that I've never tested. What is the state now? Did the change initially work then get messed up by the fetch_changesets operation? (That is the RESYNC_ALL message).

Check in the Redmine log around the time that you see these "corrections" labeled in the log. There will be a corresponding message of the form "Committing changes to gitolite-admin repository." (the key word here is "changes"). Do you see messages about adding/removing keys? Can you find the spot in the log where the keyname changed?

You should probably start this as a new bug.

kubitron commented 12 years ago

Actually, I'm just going to reopen this existing bug.

simonsd commented 12 years ago

Initially nothing changed besides my username. Since the username is simply assiocated with an id, the id is used for most other parts of redmine. The key itself kept my old username, but it does of course still work. The RESYNC_ALL operation is indeed what messed things up.

I'll take a look at the logs once i get home.

kubitron commented 12 years ago

Can you tell me what you did, exactly, when you changed username? Since the username is used for the key, I'm not entirely sure why the key should keep your old username in it...

What is the current state of the gitolite.conf? Does it simply not have your key in it? Does another RESYNC_ALL restore the key? (i.e. fetch_changesets)?

kubitron commented 12 years ago

Also, can you send me the production.log chunk associated with (1) changing your username and (2) removing your key through RESYNC_ALL?

kubitron commented 12 years ago

Oops. I know what is wrong here. And -- it is because your key is still using the old username. I'm going to close this issue and post a new one. (Have to see if I can add you to the issue, however -- perhaps you will have to simply comment on it).

kubitron commented 12 years ago

Sorry for the delay. See if the patch at the head of master fixes your problem... (I've been traveling). More info on issue #54.

simonsd commented 12 years ago

Thanks, but i currently don't have an infra to test this on. In the mean while I've mostly solved the issue by recreating my ssh-key through the web-interface. Now it's mentioned by the right name.

simonsd commented 12 years ago

Also sorry for all the delays, I really should consolidate my email-accounts.

kubitron commented 12 years ago

No worries. I'm glad that you found that solution.

I'm going to close this out.