libgit2 / pygit2

Python bindings for libgit2
https://www.pygit2.org/

Segmentation Fault when accessing a particular commit author #1205

Closed jfkelley closed 1 year ago

jfkelley commented 1 year ago

I happened across a particular commit (https://github.com/LJF2402901363/javaWeb-bookManagementSystem/commit/3e9d6b6f06d5abc25dd2a5b1b0f9fae10b09c20d) where accessing the author causes a crash:

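import pygit2

# Clone into the current directory (it must be empty for the clone to succeed).
repo = pygit2.clone_repository('https://github.com/LJF2402901363/javaWeb-bookManagementSystem.git', '.')
commit = repo.get('3e9d6b6f06d5abc25dd2a5b1b0f9fae10b09c20d')
commit.author  # the interpreter segfaults on this attribute access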

That code exits with "Segmentation Fault". The encoding/decoding going on is beyond my understanding, but it would be nice if whatever is special about that commit just raised a regular, recoverable error.
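Until a fix is available, one possible stopgap is to run the risky access in a child interpreter, so that a segfault there surfaces as an ordinary exception in the parent. This is only a sketch: run_isolated and ChildCrashed are hypothetical names, not part of pygit2, and the negative return code convention it relies on is POSIX-specific.

```python
import subprocess
import sys


class ChildCrashed(RuntimeError):
    """Raised when the helper process is killed by a signal (e.g. SIGSEGV)."""


def run_isolated(snippet: str, timeout: float = 60.0) -> str:
    """Run `snippet` in a fresh Python interpreter and return its stdout.

    Hypothetical helper: a crash in the child cannot take down the caller.
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code is the number of the fatal signal.
        raise ChildCrashed(f"child killed by signal {-proc.returncode}")
    if proc.returncode != 0:
        # The child raised a normal Python exception; pass its traceback on.
        raise RuntimeError(proc.stderr.strip())
    return proc.stdout
```

The snippet passed in would contain the pygit2 calls from the report (open the repository, fetch the commit, print the author name); catching ChildCrashed then lets a batch job skip the offending commit and continue.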

jorio commented 1 year ago

That repo contains a peculiar mix of encodings.

Try git cat-file -p 3e9d6b6 > 3e9d6b6.txt. The message and author name appear to be encoded as UTF-8, even though the header declares encoding GBK; opening the file as GBK just yields garbage.
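To check this kind of mismatch without eyeballing the file, the raw commit text can be split into headers and message while it is still bytes, and each field tested separately. The sketch below assumes the usual commit layout (header lines, a blank line, then the message). The SAMPLE bytes are made up for illustration and are not the contents of the commit above, and split_commit and is_valid_utf8 are hypothetical helper names.

```python
def split_commit(raw: bytes):
    """Split a raw commit object into (headers, message) without decoding values."""
    header_block, _, message = raw.partition(b"\n\n")
    headers = {}
    for line in header_block.split(b"\n"):
        if line.startswith(b" "):
            continue  # continuation of a multi-line header such as gpgsig
        key, _, value = line.partition(b" ")
        # Keep the first occurrence of a repeated header (e.g. several parents).
        headers.setdefault(key.decode("ascii", errors="replace"), value)
    return headers, message


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Made-up sample: the header declares GBK, but the name and message are UTF-8.
SAMPLE = b"".join([
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n",
    b"author ", "张三".encode("utf-8"), b" <zhang@example.com> 1600000000 +0800\n",
    b"committer ", "张三".encode("utf-8"), b" <zhang@example.com> 1600000000 +0800\n",
    b"encoding GBK\n",
    b"\n",
    "初始提交\n".encode("utf-8"),
])

headers, message = split_commit(SAMPLE)
declared = headers.get("encoding", b"UTF-8").decode("ascii")
print("declared encoding:", declared)
print("author is valid UTF-8:", is_valid_utf8(headers["author"]))
print("message is valid UTF-8:", is_valid_utf8(message))
```

A field that validates as UTF-8 while the header declares something else is a strong hint that the declared encoding is wrong for that field.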

In the same repo, commit 4d47b50 is even weirder (git cat-file -p 4d47b50 > 4d47b50.txt). Once again the header says encoding GBK. This time around, the commit message does appear to be correctly encoded as GBK. However, the author names are UTF-8, not GBK!
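Since the declared encoding cannot be trusted per field, a tolerant decoder has to choose for each field on its own. The following is a minimal sketch of one such strategy, not a description of what pygit2 or libgit2 actually do: try strict UTF-8 first (it rejects most non-UTF-8 byte sequences), then the declared encoding, then UTF-8 with replacement characters so that no input can raise. decode_field is a hypothetical name.

```python
from typing import Optional


def decode_field(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode one commit field (author name, message, ...) without ever raising."""
    # 1. Strict UTF-8 is self-validating, so try it before trusting the header.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # 2. Fall back to the encoding named in the commit header, if any.
    if declared:
        try:
            return raw.decode(declared)
        except (UnicodeDecodeError, LookupError):
            pass  # wrong bytes for that codec, or an unknown codec name
    # 3. Last resort: keep what is readable and mark the rest with U+FFFD.
    return raw.decode("utf-8", errors="replace")


# Both fields come out readable even though only one matches the header.
print(decode_field("张三".encode("utf-8"), "GBK"))
print(decode_field("张三".encode("gbk"), "GBK"))
```

The order matters: decoding UTF-8 bytes as GBK often succeeds silently and produces mojibake, so trying the declared encoding first would hide the mismatch instead of recovering from it.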

I don't know if such encoding mismatches are a common occurrence in Chinese repos, but one thing's for sure: we shouldn't crash when handling them.

Anyway, I'm suggesting PR #1210 to mitigate the crash.

jdavid commented 1 year ago

Fixed with PR #1210