jelmer / dulwich

Pure-Python Git implementation
https://www.dulwich.io/

Unicode file names do not check out correctly on Windows #203

Open garyvdm opened 10 years ago

garyvdm commented 10 years ago

Steps to reproduce:

dulwich clone https://github.com/garyvdm/git_unicode_files.git
dir git_unicode_files

Expected: 1 file named À (which is u'\u00c0').
Actual: the file is named Ã€ (which is u'\u00c3\u20ac').

The file name is what you get if you do u'\u00c0'.encode('utf8').decode('mbcs'). mbcs is the default filesystem character encoding used on Windows.
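A minimal sketch of that round trip, for readers who are not on Windows. The `mbcs` codec only exists on Windows, so this example assumes a Western European system where `mbcs` maps to code page 1252 and uses `cp1252` in its place; other locales would produce different garbage.

```python
# Assumption: "mbcs" resolves to code page 1252 (Western European Windows).
# The "mbcs" codec itself is only available on Windows, so cp1252 stands in.
expected = u"\u00c0"  # LATIN CAPITAL LETTER A WITH GRAVE

# Git stores the path as UTF-8 bytes.
utf8_bytes = expected.encode("utf-8")

# Interpreting those bytes in the ANSI code page yields two wrong characters.
mangled = utf8_bytes.decode("cp1252")

print(repr(utf8_bytes))
print(repr(mangled))

# The damage is reversible as long as every byte maps to a character.
recovered = mangled.encode("cp1252").decode("utf-8")
print(recovered == expected)
```

The two bytes of the UTF-8 encoding each become a separate character, which matches the two-character file name reported above.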

The git client handles this correctly. I'll take a look at their source code in the future to try to figure out how they handle this.

garyvdm commented 9 years ago

This is what msysgit does: https://github.com/msysgit/git/commit/19d1e75d58d772329372d453ead964c813bbc6b6

jelmer commented 9 years ago

Has this been resolved?

jelmer commented 9 years ago

@garyvdm Has this been resolved?

garyvdm commented 9 years ago

No, not yet.

guyskk commented 9 years ago

I also encountered this problem. See https://github.com/FriendCode/gittle/issues/72

UnicodeDecodeError when the filename is "article/python2编码问题.md" or otherwise contains non-ASCII characters

dulwich/index.py(423) build_index_from_tree()
-> full_path = os.path.join(prefix, entry.path)
(Pdb) pp prefix
u'E:/work/py/kkblog/article_repo/\u54c8\u54c8\\guyskk\\webhooks_test'
(Pdb) pp entry.path
'article/python2\xe7\xbc\x96\xe7\xa0\x81\xe9\x97\xae\xe9\xa2\x98.md'
(Pdb) os.path.join(prefix, entry.path)
*** UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 16: ordinal not in range(128)
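The traceback shows a text prefix being joined with a byte-string entry path. Python 2 tries to decode the bytes implicitly as ASCII, which fails; Python 3 refuses to mix the two types at all. One possible way around this, sketched below, is to decode the entry path explicitly before joining. `join_tree_path` is a hypothetical helper for illustration, not a Dulwich function, and it assumes the tree entry paths are UTF-8 encoded.

```python
import os


def join_tree_path(prefix, entry_path, encoding="utf-8"):
    """Hypothetical helper: join a text prefix with a Git tree entry path.

    Assumes entry_path holds bytes in the given encoding (UTF-8 here).
    """
    if isinstance(entry_path, bytes):
        # Decode explicitly instead of relying on an implicit ASCII decode.
        entry_path = entry_path.decode(encoding)
    return os.path.join(prefix, entry_path)


# Shortened, illustrative prefix; the entry path bytes are from the traceback.
prefix = u"article_repo/\u54c8\u54c8"
entry_path = b"article/python2\xe7\xbc\x96\xe7\xa0\x81\xe9\x97\xae\xe9\xa2\x98.md"

full_path = join_tree_path(prefix, entry_path)
print(repr(full_path))
```

Whether decoding to text or encoding the prefix to bytes is the better choice depends on how the rest of the code represents paths, so treat this only as a demonstration of the type mismatch.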

My script:

# coding: utf-8
import os

from giturlparse import parse
from gittle import Gittle


def pull_or_clone(dest, repo_url):
    """Pull the repository if it already exists under dest, otherwise clone it."""
    p = parse(repo_url)
    # The local checkout lives at <dest>/<owner>/<repo>.
    user_repo_path = os.path.join(dest, p.owner, p.repo)
    if os.path.exists(user_repo_path):
        repo = Gittle(user_repo_path, origin_uri=repo_url)
        repo.pull()
    else:
        repo = Gittle.clone(repo_url, user_repo_path)


if __name__ == '__main__':
    # Unicode destination path containing non-ASCII characters.
    dest = u"E:/work/py/kkblog/article_repo/哈哈"
    repo_url = u"https://github.com/guyskk/webhooks_test.git"
    pull_or_clone(dest, repo_url)

jelmer commented 4 years ago

It would be great if somebody could verify whether this still happens with Dulwich 0.20.3. The test suite now passes on Windows, so if it still happens we can probably add a test and a fix for it.