jfkelley closed this issue 1 year ago.
That repo contains a peculiar mix of encodings.
Try `git cat-file -p 3e9d6b6 > 3e9d6b6.txt`. The message and author name are apparently encoded as UTF-8. Despite the header claiming `encoding GBK`, we just get garbage when opening the file as GBK.
In the same repo, commit 4d47b50 is even weirder (`git cat-file -p 4d47b50 > 4d47b50.txt`). Once again the header says `encoding GBK`. This time around, the commit message does appear to be correctly encoded as GBK. However, the author names are UTF-8, not GBK!
I don't know if such encoding mismatches are a common occurrence in Chinese repos, but one thing's for sure: we shouldn't crash when handling them.
Anyway, I'm suggesting PR #1210 to mitigate the crash.
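One way to avoid crashing on commits like these is to never trust the `encoding` header alone. The Python sketch below is not necessarily what PR #1210 does; `decode_commit_field` is a hypothetical helper. It tries strict UTF-8 first (UTF-8 is self-validating, so GBK text seldom passes by accident), then the declared encoding if Python knows it, and finally falls back to UTF-8 with replacement characters, so it always returns a string instead of raising.

```python
from __future__ import annotations


def decode_commit_field(data: bytes, declared: str | None) -> str:
    """Hypothetical helper: decode a commit field without ever raising."""
    # 1. Strict UTF-8: catches UTF-8 text hiding under an `encoding GBK` header.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # 2. The encoding named in the header, if it is a codec Python knows.
    if declared:
        try:
            return data.decode(declared)
        except (LookupError, UnicodeDecodeError):
            pass
    # 3. Last resort: keep going, marking undecodable bytes with U+FFFD.
    return data.decode("utf-8", errors="replace")


print(decode_commit_field("中文".encode("utf-8"), "GBK"))  # UTF-8 under a GBK header
print(decode_commit_field("中文".encode("gbk"), "GBK"))    # genuinely GBK
print(repr(decode_commit_field(b"\xff\xfe", "no-such-codec")))
```

The trade-off: a field really written in the declared encoding whose bytes also happen to form valid UTF-8 would be misread. For the commits described here, where UTF-8 slips in under a GBK header, preferring UTF-8 recovers the author names, and nothing ever crashes.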
Fixed with PR #1210
I happened across a particular commit (https://github.com/LJF2402901363/javaWeb-bookManagementSystem/commit/3e9d6b6f06d5abc25dd2a5b1b0f9fae10b09c20d) where accessing the author causes a crash:
That code exits with "Segmentation Fault". The encoding/decoding going on is beyond my understanding, but it would be nice if whatever is special about that commit raised a regular, recoverable error instead.