libgit2 / pygit2

Python bindings for libgit2
https://www.pygit2.org/

utf error when listing branches #1206

Open lb-ronyeh opened 1 year ago

lb-ronyeh commented 1 year ago

Hi, when listing branches, we get:

```
"/usr/local/lib/python3.8/site-packages/pygit2/repository.py", line 1526, in __iter__
    for branch_name in self._repository.listall_branches(self._flag):
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 7-8: invalid continuation byte
```

Is it possible to include the invalid branch name in the error message, or to have an option to skip invalid names?
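On the first question: a Python UnicodeDecodeError already carries the input it failed on in its object attribute, together with start, end, reason and encoding. Below is a minimal sketch of a hypothetical helper, describe_decode_error, that turns such an exception into a message naming the offending input. It assumes the exception raised by listall_branches() is a standard UnicodeDecodeError whose object is the single branch name being decoded. The "position 7-8" in the traceback above suggests per-name decoding, but this has not been verified against pygit2's C code.

```python
def describe_decode_error(err: UnicodeDecodeError) -> str:
    """Hypothetical helper: build a message that names the undecodable input."""
    # err.object holds the bytes being decoded; err.start is the first bad byte.
    return (
        f"cannot decode {err.object!r} as {err.encoding}: "
        f"{err.reason} at position {err.start}"
    )


# Simulate the failure with the branch name from the reproducer in this thread.
try:
    b"feature\xc0\xc1".decode("utf-8")
except UnicodeDecodeError as e:
    message = describe_decode_error(e)
    print(message)
```

In application code this would wrap the pygit2 call: catch UnicodeDecodeError around repo.listall_branches() and print describe_decode_error(e). If the assumption about object does not hold, the message still shows whatever input the decoder was given.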

long2ice commented 1 year ago

Same here. How can this be fixed?

hramrach commented 2 months ago

Don't use Python.

It's not designed to deal with strings in different encodings.

hramrach commented 2 months ago

There is raw_listall_branches(flag: BranchType = BranchType.LOCAL) → list[bytes], which returns the branch names as raw bytes. That makes it possible to retrieve any name, however malformed, and to handle the decoding error in your application instead of inside pygit2, where it is not handled.
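A minimal sketch of that application-side handling, which also covers the "skip the invalid ones" request. It assumes only that raw_listall_branches() returns a list of bytes as its signature states. The helper name split_branch_names is hypothetical, and the example input is the two names printed by the reproducer later in this thread.

```python
from typing import Iterable, List, Tuple


def split_branch_names(raw_names: Iterable[bytes]) -> Tuple[List[str], List[bytes]]:
    """Hypothetical helper: decode branch names as UTF-8, setting aside failures.

    Intended input: the list[bytes] returned by Repository.raw_listall_branches().
    """
    valid: List[str] = []
    invalid: List[bytes] = []
    for raw in raw_names:
        try:
            valid.append(raw.decode("utf-8"))
        except UnicodeDecodeError:
            # Keep the exact bytes so the caller can report or rename the branch.
            invalid.append(raw)
    return valid, invalid


valid, invalid = split_branch_names([b"feature\xc0\xc1", b"master"])
print(valid)    # ['master']
print(invalid)  # [b'feature\xc0\xc1']
```

With a real repository the call would be split_branch_names(repo.raw_listall_branches()). Invalid names are returned as the original bytes rather than decoded with errors="replace", because a lossy replacement could no longer be mapped back to the actual ref.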

hramrach commented 1 month ago
```python
import shutil
import subprocess
import sys
import tempfile

import pygit2

print(f"python: {sys.version}")
print(f"libgit2: {pygit2.LIBGIT2_VERSION}")
print(f"pygit2: {pygit2.__version__}")

repodir = tempfile.mkdtemp()
try:
    repo = pygit2.init_repository(repodir, bare=True)
    sig = pygit2.Signature('Test User', 'testuser@nowhere.net')

    # First commit on the default branch (assumed to be named master).
    data = 'blah blah master'
    tree = repo.TreeBuilder()
    tree.insert('file', repo.create_blob(data.encode()), pygit2.GIT_FILEMODE_BLOB)
    master_commit_oid = repo.create_commit('HEAD', sig, sig, 'master commit', tree.write(), [])

    repo.lookup_branch('master').set_target(master_commit_oid)

    # Second commit, child of the first.
    data = 'blah blah feature'
    tree = repo.TreeBuilder()
    tree.insert('file', repo.create_blob(data.encode()), pygit2.GIT_FILEMODE_BLOB)
    feature_commit_oid = repo.create_commit('HEAD', sig, sig, 'feature commit', tree.write(), [master_commit_oid])

    # Use the git CLI to create a branch whose name is not valid UTF-8 (bytes 0xC0 0xC1).
    # str(oid) gives the hex SHA and is used instead of Oid.hex, which newer pygit2 releases deprecate.
    subprocess.run(
        [b'git', b'--git-dir', repodir.encode(), b'branch', b'feature\xc0\xc1', str(feature_commit_oid).encode()],
        check=True,
    )

    # The bytes API returns every branch name, including the invalid one.
    for branch_name in repo.raw_listall_branches():
        print(branch_name)

    # The str API fails when it reaches the invalid name.
    try:
        for branch_name in repo.listall_branches():
            print(branch_name)
    except UnicodeDecodeError as e:
        print(e)
finally:
    shutil.rmtree(repodir)
```

Output:

```
python: 3.11.8 (main, Feb 29 2024, 12:19:47) [GCC]
libgit2: 1.8.0
pygit2: 1.14.1
b'feature\xc0\xc1'
b'master'
'utf-8' codec can't decode byte 0xc0 in position 7: invalid start byte
```