Update to 1.7.1

carlosmn commented 1 year ago

Unfortunately we segfault but I'm not sure if this is something in rugged or libgit2 or elsewhere. We still segfault even without the custom allocator setup so presumably it's not something inside rugged.

Deeper testing would have to go through building a custom ruby that uses the system allocator so valgrind or gdb can see what is going on. Running a single test doesn't trigger this so it seems to be something larger, and it fails with a NULL pointer dereference somewhere deep in fileutils during the helper, which is quite odd.

carlosmn commented 1 year ago

Well, some gdb paired with printf debugging shows that we seem to be double-deallocating the grafts. It looks like this might be because git_repository__cleanup doesn't cope well with being called twice where grafts are concerned. In rugged we call git_repository__cleanup in Repository::close, which you can call to free up resources before the GC gets to them.
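
For readers unfamiliar with this failure mode, here is a minimal sketch of how a cleanup routine can be made safe to call twice. It is not the actual libgit2 code or the fix that was applied upstream; `demo_repo`, `demo_repo_init` and `demo_repo_cleanup` are hypothetical names made up for illustration, and the `grafts` field is only a stand-in for the real grafts data. The idea is simply to clear the pointer after freeing it, so that a second call sees NULL and does nothing harmful.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a repository object; NOT libgit2's real struct. */
typedef struct {
    int *grafts;       /* placeholder for the grafts data */
    int cleanup_calls; /* counts cleanup calls, for illustration only */
} demo_repo;

/* Allocate the placeholder grafts buffer. Returns 0 on success, -1 on failure. */
int demo_repo_init(demo_repo *repo)
{
    assert(repo != NULL);
    repo->cleanup_calls = 0;
    repo->grafts = calloc(4, sizeof(*repo->grafts));
    return repo->grafts ? 0 : -1;
}

/*
 * Idempotent cleanup: free the buffer, then reset the pointer.
 * Without the reset, a second call would pass a dangling pointer to free(),
 * which is the double-deallocation pattern described above.
 */
void demo_repo_cleanup(demo_repo *repo)
{
    assert(repo != NULL);
    repo->cleanup_calls++;
    free(repo->grafts);  /* free(NULL) is defined to do nothing */
    repo->grafts = NULL; /* makes a repeated call harmless */
}
```

With this shape, an explicit close followed later by a GC-triggered free would both go through `demo_repo_cleanup` without touching freed memory.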

dvzrv commented 1 year ago

Hi! Since we're currently blocked by ruby-rugged in our libgit2 1.7.0 rebuild on Arch Linux: Do you know if applying https://github.com/libgit2/libgit2/commit/9d4c550564ee254dda9e2620c4c1e32ebb529728 on top of libgit2 1.7.0 is enough to make your changes to rugged work?

FWIW: applying a backported version to 1.6.3 seems to work fine for my use-case!

carlosmn commented 1 year ago

@dvzrv it was enough to get it to work on my machine. Updating to 1.7.1 works as well, and that release shouldn't contain anything else related to segfaults.

Unfortunately now that 1.7.1 is out and I'm trying to update it, all of the macOS stuff fails but I don't know how much even has to do with libgit2 vs how it's built on macOS. And I'm not where my mac machine is so I can't dig deeper at the moment.

ethomson commented 1 year ago

@carlosmn I have a Mac, how can I help?

carlosmn commented 1 year ago

I'm now close to my mac, but it's packed up somewhere... basically all the ssh tests fail in CI on macOS and I've no idea if it's a macOS issue or if something changed with GitHub or what.

carlosmn commented 1 year ago

Annoyingly it looks like the failing tests are those that test against the local ssh server. So that's more annoying to set up but at least it probably means it's something about how we set it up (but I only have old machines so I don't know how much I can update).

EDIT: it actually does fail locally for me on a mac but with a different error message, so I don't know if it's the same error presenting differently or a different issue on a different version.

carlosmn commented 1 year ago

The problems I had locally ended up being just because we don't actually kill sshd so it was looking at the wrong checksum. The tests run fine on my now-up-to-date machine that's a patch version away from the runner.

So it looks like the issue might be something related to how the runners work on Actions and I've no idea how to debug that.

carlosmn commented 1 year ago

Well as it happens, updating libssh2 to 1.11.0 from homebrew makes it fail locally like it does in CI. So maybe this should get merged as it doesn't seem like it's something that rugged has changed.

carlosmn commented 1 year ago

In the logs from sshd it looks like there are too many auth failures, and it has tried none and publickey repeatedly. I don't know if we really now need to use a different scheme from the deprecated RSA or if it's something else.

libgit2 / rugged

Update to 1.7.1 #964