msysgit / git

msysGit-based Git for Windows 1.x is now superseded by Git for Windows 2.x
http://github.com/git-for-windows/git

git gui No working directory #302

Closed cloudchou closed 9 years ago

cloudchou commented 9 years ago

If I run git init in a directory named "新建文件夹" and then open Git GUI, it reports "No working directory". (screenshot) But with a directory named "新建文件", it works. It's really strange.

cloudchou commented 9 years ago

git-gui.tcl uses the command "git rev-parse --show-toplevel" to find the current working directory, and that is where it fails. (screenshot) But running "git rev-parse --show-toplevel" in Git Bash works without problems. (screenshot)

dscho commented 9 years ago

From your screenshot (please copy-paste text whenever possible), it appears as if the 夹 is a problem: the error in Git GUI misinterprets it, and the Bash output skips it.

Or maybe it does not really skip it: the prompt also does not show it, which makes me believe that this is a different working directory than the one specified at the very beginning of this bug report.

Also, looking at the exact error message, it looks as if it was not the git rev-parse --show-toplevel call that failed, but rather changing the working directory to the value returned by the rev-parse call.

Maybe the core.worktree config setting points to a non-existing directory?

cloudchou commented 9 years ago

Those two comments describe two separate tests. Maybe some particular Chinese characters cause this problem. Here is another test: if I run git init in a directory named "技术文档" and then open Git GUI, it reports "No working directory". (screenshot) But running "git rev-parse --show-toplevel" in Git Bash works without problems.

Cloud@CLOUD-PC /E/git/技术文档 (master)
$ git rev-parse --show-toplevel
e:/git/技术文档
Cloud@CLOUD-PC /E/git/技术文档 (master)
$

Maybe it's an encoding problem. But if we change line 1299 of git-gui.tcl to: set _gitworktree [pwd] then it works. So is there a problem with "git rev-parse --show-toplevel"?

cloudchou commented 9 years ago

I found the cause of the bug. git-gui.tcl has a proc called git, which contains this line: set result [encoding convertfrom utf-8 [encoding convertto $result]] It converts the string from UTF-8 to Unicode and then to the system code page. My system's code page is cp936. The UTF-8 bytes of '技术文档' are E68A80 E69CAF E69687 E6A1A3. Converted from UTF-8 to cp936, they become: BCBC 3F3F3F CEC4 B5B5. So 术 is encoded incorrectly; it should be CAF5. Maybe Tcl's encoding support is not perfect and it cannot encode some characters.

dscho commented 9 years ago

@cloudchou great that you figured it out! @patthoyts any idea what's the best way to get this sorted out in Tcl?

cloudchou commented 9 years ago

I have checked that the Unicode mapping for '术' in cp936.enc is correct. The cp936 code of '术' is CAF5, so we have to look at page CA, position F5. It is 672F. So there seems to be no problem with cp936.enc. Maybe the implementation of the encoding conversion has a problem.
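The table lookup described above can be cross-checked outside Tcl. The following is a minimal sketch in Python, assuming that Python's "gbk" codec agrees with Tcl's cp936.enc for this particular character; it is not part of git-gui.

```python
# Assumption: Python's "gbk" codec stands in for code page 936 here.
# Decode the two bytes CA F5 and report the resulting code point.
ch = bytes.fromhex("caf5").decode("gbk")
code_point = f"U+{ord(ch):04X}"

# Encode the character back to confirm the mapping works in both directions.
round_trip = ch.encode("gbk").hex()

print(code_point)
print(round_trip)
```

If the mapping matches the table lookup, the first line printed is U+672F and the second is caf5.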

patthoyts commented 9 years ago

I can't identify the line of git-gui code you are referring to. Please include the version number of git-gui and of tcl (they are given on the Help, About dialog).

[encoding convertto $result] will take the string in $result and convert it from Unicode into bytes in the system code page. If the bash shell is not using the system codepage this will result in a misencoding. So we need to know what codepage your shell is using and what Tcl thinks the system encoding is. If you hit Control-F2 in git-gui it will show the Tk console on Windows, and entering encoding system will report the system encoding name.

On my English Windows system, encoding system returns cp1252 and entering getcp in the bash prompt also returns 1252. It may be that git-gui needs to select the encoding in use by bash in some cases, if it can be set to something other than the system encoding. Note that getcp is specific to Git for Windows.

As a side note: when reporting encoding problems it is always useful to refer to the glyphs by their unicode identities. For instance the first one mentioned '夹' is U+5939 and from that we can find lots of information about the glyph (eg: http://codepoints.net/U+5939). All characters in the Basic Multilingual Plane should be supported by Tcl and Tk.
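As an illustration of the side note above, here is a small Python sketch that lists the Unicode identities of the characters in a string. The helper name code_points is hypothetical and only used for this example.

```python
def code_points(text: str) -> list:
    """Return the U+XXXX identity of every character in text."""
    return [f"U+{ord(ch):04X}" for ch in text]


# The character mentioned above, then the two from the later test directory.
print(code_points("夹"))
print(code_points("技术"))
```

Quoting these identities in a bug report avoids any ambiguity introduced by the encoding of the report itself.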

patthoyts commented 9 years ago

It appears the line of code mentioned was added to msysGit's tree in commit b7d7da5763dcfa286d0455f2b7f0b93f268224c0 "Unicode file name support (gitk and git-gui)" which is why it's not found in git-gui's repository.

The change there looks correct to me provided the encoding being used by the git executable really is the same as the system encoding. Creating an alias using git config --global alias.encoding "!getcp" is one way to try and find the encoding in use. If you declare that alias then run git encoding from the git-gui console as described previously it should print the git encoding.

cloudchou commented 9 years ago

encoding system returns cp936, and entering getcp in the bash prompt also returns 936. (screenshot)

cloudchou commented 9 years ago

I have tried: git config --global alias.encoding "!getcp". But the problem is still the same.

cloudchou commented 9 years ago

If we open the Tk console from Git GUI and then type the command encoding system, an error dialog appears. (screenshot) The exception:

error flushing "stdout": broken pipe
error flushing "stdout": broken pipe
    while executing
"flush stdout"
    (procedure "ConsolePrompt" line 20)
    invoked from within
"ConsolePrompt"
    (procedure "tk::ConsoleInvoke" line 23)
    invoked from within
"tk::ConsoleInvoke"
    (command bound to event)

cloudchou commented 9 years ago

@dscho @patthoyts Now I have found the reason why Git GUI reports "No working directory". The output of the command "git-rev-parse.exe --show-toplevel" is encoded as UTF-8. If we execute "git-rev-parse.exe --show-toplevel" in "e:/git/技术", the hex bytes of the command's output are: 453a2f6769742fe68a80e69caf. So the bytes for 技术 are e68a80e69caf, which is a UTF-8 string.

git-gui.tcl contains these lines:

proc git {args} {
  ...
    set result [eval exec $opt $cmdp $args]
    if {[encoding system] != "utf-8"} {
        set result [encoding convertfrom utf-8 [encoding convertto $result]]
    }
    ...
    return $result
}

[eval exec $opt $cmdp $args] opens a console to execute the command. The command's output is treated as a string encoded in the system code page; my system's code page is cp936. So [eval exec $opt $cmdp $args] first converts the output to Unicode using the cp936-to-Unicode rules, and then converts the Unicode string to UTF-8.

But the command's output is actually encoded as UTF-8. If we treat it as cp936 and convert it to Unicode using the cp936 rules, errors occur. The bytes of 技术 are e68a80e69caf. Treated as cp936, e68a is converted to 93b6, 80 is converted to 20ac, and e69c is converted to 93c8. The final af fails to convert: cp936 is a multibyte encoding, and af needs one more byte to form a character. So after the conversion, the hex code of 技术 is 93b620ac93c800. When this Unicode string is then converted to UTF-8, the hex code is e98eb6e282ace98f88af.

So after executing the following command: set result [eval exec $opt $cmdp $args] the hex code of the result is 453a2f6769742fe98eb6e282ace98f88af. The conversion order is: cp936 (really UTF-8) -> Unicode -> UTF-8.

proc git contains these lines:

    if {[encoding system] != "utf-8"} {
        set result [encoding convertfrom utf-8 [encoding convertto $result]]
    }

Because my system's encoding is cp936, the following line is executed:

set result [encoding convertfrom utf-8 [encoding convertto $result]]

First it converts the result to a cp936 string (Unicode to cp936), then it converts that to a Unicode string (UTF-8 to Unicode). Conversion order: Unicode -> cp936 (really UTF-8) -> Unicode. After the conversion to cp936, the hex code of 技术 is BCBC 3F3F3F; it should be BCBC CAF5. After the conversion to Unicode, there are more errors. So the root cause is that [eval exec $opt $cmdp $args] should not always assume that the command's output is encoded in the system code page.
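The lossy round trip described above can be reproduced outside Tcl. The following Python sketch is only an illustration: it assumes that Python's "gbk" codec is a reasonable stand-in for Windows code page 936 (the two may differ for individual bytes such as 0x80), and the helper name misdecode_round_trip is hypothetical. It decodes UTF-8 bytes with the wrong codec, encodes them back, and checks whether the original bytes survive.

```python
# Assumption: "gbk" approximates Windows code page 936; details may differ.
SYSTEM_CODEC = "gbk"


def misdecode_round_trip(raw: bytes) -> bytes:
    """Decode bytes with the wrong codec, then encode them back.

    errors="replace" mimics a converter that substitutes a placeholder
    for sequences it cannot map instead of raising an error.
    """
    text = raw.decode(SYSTEM_CODEC, errors="replace")
    return text.encode(SYSTEM_CODEC, errors="replace")


original = "技术".encode("utf-8")  # the UTF-8 bytes e6 8a 80 e6 9c af
recovered = misdecode_round_trip(original)

print(original.hex())
print(recovered.hex())
print(recovered == original)  # False means the round trip lost information
```

Pure ASCII paths pass through such a round trip unchanged, which would explain why the problem only shows up for some directory names.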

cloudchou commented 9 years ago

@dscho @patthoyts

solution:

We can change the output of the following command so that it is encoded in the system code page: git-rev-parse.exe --show-toplevel

To do that, we modify the code in rev-parse.c:

...
#include "split-index.h"
#include "utf8.h"
#include "winnt.h"
...
int cmd_rev_parse(int argc, const char **argv, const char *prefix)
{
	...
	if (!strcmp(arg, "--show-toplevel")) {
		const char *work_tree = get_git_work_tree();
		if (work_tree) {
			/* Buffer for the code page name, e.g. "cp936". */
			char system_cp[32];
			sprintf(system_cp, "cp%u", GetACP());
			/* Re-encode the UTF-8 path into the system code page. */
			work_tree = reencode_string(work_tree, system_cp, "UTF-8");
			puts(work_tree);
		}
		continue;
	}
	...
}

Then we modify the code in git-gui.tcl (git/git-gui/git-gui.sh):

proc git {args} {
    set opt [list] 
    while {1} {
        switch -- [lindex $args 0] {
        --nice {
            _lappend_nice opt
        }

        default {
            break
        }

        }

        set args [lrange $args 1 end]
    }
    set cmdname  [lindex $args 0]
    set cmdp [_git_cmd $cmdname]
    set args [lrange $args 1 end]

    _trace_exec [concat $opt $cmdp $args]
    set result [eval exec $opt $cmdp $args] 
    if {[encoding system] != "utf-8"} {   
      if { ! ($cmdname == "rev-parse" && [lindex $args 0] == "--show-toplevel") }  {        
        set result [encoding convertfrom utf-8 [encoding convertto $result]]
      } 
    }
    if {$::_trace} {
        puts stderr "< $result"
    }
    return $result
}

kblees commented 9 years ago

Your solution breaks git rev-parse in the console. All git output (except file content) is expected to be UTF-8, you shouldn't change that.

The problem seems to be that the cp936 -> unicode conversion implicitly done by exec is not reversible (i.e. [encoding convertto $result] does not produce the same UTF-8 byte stream originally printed by git rev-parse).

The proper solution would be to use the stream-based variants of running git instead of exec, something like:

set fd [eval [list git_read] $args]
fconfigure $fd -translation binary -encoding utf-8
set result [string trimright [read $fd] "\n"]
close $fd
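
The idea in the Tcl snippet above, reading the child's output as a raw byte stream and decoding it explicitly as UTF-8, can be sketched in Python as well. This is an illustration under stated assumptions, not git-gui code: the helper name run_utf8 is hypothetical, and a small Python child process stands in for git rev-parse --show-toplevel so that the example runs without a repository.

```python
import subprocess
import sys


def run_utf8(argv: list) -> str:
    """Run a command, capture stdout as raw bytes, and decode it as UTF-8."""
    completed = subprocess.run(argv, stdout=subprocess.PIPE, check=True)
    # Decode explicitly instead of relying on the system code page.
    return completed.stdout.decode("utf-8").rstrip("\n")


# Hypothetical stand-in child: writes the UTF-8 bytes of "e:/git/技术"
# followed by a newline, imitating the output reported in this thread.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write("
    "b'e:/git/\\xe6\\x8a\\x80\\xe6\\x9c\\xaf\\n')",
]

toplevel = run_utf8(child)
print(toplevel == "e:/git/技术")
```

Because the bytes are never interpreted in the system code page, the result does not depend on whether that code page can represent every byte sequence.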

dscho commented 9 years ago

So I assume that things are fixed by our patches in Git for Windows 2.x. Correct?