jedbrown / git-fat

Simple way to handle fat files without committing them to git, supports synchronization using rsync
BSD 2-Clause "Simplified" License

Cannot execute "git fat init" on Windows #42

Open drauch opened 10 years ago

drauch commented 10 years ago

After following the installation guide, I tried to run "git fat init" on my repository, but I only get the following error:

user@pc ~/Desktop/git-fat/BareTestRepoClone1 (master)
$ git fat init
  File "c:\Users\user\Desktop\git-fat\git-fat-master\git-fat", line 526
    for path, sizes in sorted(pathsizes.items(), cmp=lambda (p1,s1),(p2,s2): cmp(max(s1),max(s2)), reverse=True):
                                                            ^
SyntaxError: invalid syntax

What's wrong? Do I have to use Python 2.x instead of 3.x?

drauch commented 10 years ago

Yup, works with Python 2.7.x. Unfortunately I'm no Python programmer, but probably somebody should look into this issue :-)

jedbrown commented 10 years ago

There are encoding issues on Windows due to conflicting choices made by Git and Python-3, so this is not currently supported.

jjardon commented 10 years ago

I have this problem in Linux too (Arch). In Arch python is python3, not python2.

Easy fix: #!/usr/bin/env python -> #!/usr/bin/env python2

bilderbuchi commented 10 years ago

I'm pretty sure that is a lambda syntax change, see here: "Using parentheses to unpack the arguments in a lambda is not allowed in Python3". Also, this PEP

jedbrown commented 10 years ago

The main stumbling block for Python-3 is unicode. Git stores paths unencoded and we can pass it directly to the file system on Linux, but Windows requires encoding to create a huge string that is later parsed. I'm not wild about an intrusive unicode change that won't work on Windows, so I've been delaying. I don't know what is recommended here because the Python community has chosen a convention distinctly different from Git. Being a Git tool that is only incidentally written in Python, I would rather use the Git conventions.

bilderbuchi commented 10 years ago

I have to admit I don't see where unicode comes into play in the present issue, but wouldn't a comparison function similar to cmp=lambda item1, item2: cmp(max(item1[1]),max(item2[1])) be an easy and py2/py3 compatible fix for this problem?

jedbrown commented 10 years ago

The unicode issue has to be resolved to support python3. The sorted call should just use key=lambda p,s: max(s), but it's just the tip of the iceberg.
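For readers following along, here is a minimal sketch of the key-based sort. One caveat: `sorted` calls `key` with a single argument, so the function has to accept the whole `(path, sizes)` item rather than two parameters. The `pathsizes` contents are made-up sample data, not taken from git-fat.

```python
# Hypothetical sample data: path -> list of blob sizes recorded for that path.
pathsizes = {
    "small.txt": [10, 12],
    "big.bin": [4096, 8192],
    "medium.dat": [300],
}

# key receives one (path, sizes) tuple per item, so index into it instead of
# unpacking in the lambda signature (the form Python 3 rejects).
by_largest = sorted(pathsizes.items(), key=lambda item: max(item[1]), reverse=True)

for path, sizes in by_largest:
    print(path, max(sizes))
```

The `sorted(...)` expression itself avoids both `cmp=` and tuple-unpacking parameters, so it should behave the same on Python 2.4+ and Python 3.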

bilderbuchi commented 10 years ago

The sorted call should just use key=lambda p,s: max(s), but it's just the tip of the iceberg.

yeah, I just realized that, was hunting for the minimum supported python version of git-fat to find out if key would be a viable alternative.

jedbrown commented 10 years ago

sorted appeared in python-2.4, including the key argument.

bilderbuchi commented 10 years ago

yes, I know. the question was: what is the minimum python version that git-fat expects/needs? btw, might be good to have a python version checker in the code that bails and prints an error message (or just a warning) if python3 is used, as long as it's not supported?

jedbrown commented 10 years ago

Christoph Buchner writes:

yes, I know. the question was: what is the minimum python version that git-fat expects/needs?

Most of my testing is with 2.7. 2.6 has worked, but beware the Python issue12786 problem in issue #46 that may or may not bite you. (I intend to fix this, but a portable fix is not trivial due to significantly different semantics depending on the version and platform.) I haven't tested earlier versions.

btw, might be good to have a python version checker in the code that bails and prints an error message (or just a warning) if python3 is used, as long as it's not supported?

Done.
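A guard of the kind requested might look roughly like the sketch below. This is not the check that was actually added to git-fat; the function name and the message text are assumptions made for illustration.

```python
import sys


def unsupported_python_message(version_info=sys.version_info):
    """Return an error message if the interpreter is Python 3, else None.

    Hypothetical helper: the name and wording are illustrative only.
    """
    if version_info[0] >= 3:
        return "git-fat does not support Python 3 yet; please run it with Python 2.7"
    return None
```

A script could call this at startup, write any returned message to `sys.stderr`, and exit with a non-zero status.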

TimMensch commented 9 years ago

As of Python 3.4.1, the win32/Python 3 branch is entirely and profoundly broken. I spent most of an hour playing whack-a-mole with issues before I gave up.

In addition to the basic issues with 3.4.1, it also uses os.path.join to create Windows paths, but I'm using MSYS/MinGW, and so the backslashes actually break things. Would be better to use posixpath.join directly, since Windows has long supported using forward slashes in pretty much all paths.
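The difference can be seen on any platform by calling the two path modules directly, since `os.path` is `ntpath` on Windows and `posixpath` elsewhere. The path components below are only illustrative.

```python
import ntpath
import posixpath

parts = (".git", "fat", "objects")  # illustrative path components

# ntpath is what os.path resolves to on Windows; it joins with backslashes.
windows_style = ntpath.join(*parts)

# posixpath always joins with forward slashes, whatever the host platform.
posix_style = posixpath.join(*parts)

print(windows_style)
print(posix_style)
```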

Giving up on git-fat for now. I like the idea of a simpler Python-based git-media, but I don't really want to install multiple versions of Python on my system.

+1 to update to latest Python and actually support Windows. Until then, it doesn't work for me.

jedbrown commented 9 years ago

Thanks for the update, Tim. I have not used MS products in 20 years and find the various failure modes and Python POSIX-Windows inconsistencies confusing. Is anyone aware of a best practices guide that does not involve reading the fine print for every function to learn which ones behave subtly yet profoundly differently on Windows, knowing Windows well enough to recognize when the side-effects will be a problem, and adding conditionals to the code?

TimMensch commented 9 years ago

Hmm.... I don't know of such a guide. The basics, as I'm familiar with them, include:

Aside from that, mostly things Just Work, at least when you're using Python. At least in my experience (I just wrote a huge git tool for my last employer that worked on OS X with only a couple of minor tweaks). There do exist a few tools (Perforce, maybe?) that have trouble with forward-slashes, and if you're trying to complete paths from the CMD prompt, only backslash works, but nothing that you're doing in git-fat should have a problem with forward slash.

MOST of the problems I saw were buffer-vs-string issues in Python 3.4. Some places you were doing everything in buffers and then doing a string op on them, and it would error out. Other places you were sending a buffer AND a string to, for example, os.rename(), and it would complain that both parameters need to be the same.
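The following Python 3 sketch shows the general class of error, using `os.path.join` because it rejects mixed types on every platform. Normalizing with `os.fsencode` is just one possible approach, not necessarily what git-fat should do, and the file names are invented.

```python
import os


def as_bytes(path):
    """Return path as bytes; str input is encoded with the filesystem encoding."""
    return os.fsencode(path)


src = b"old-name"  # e.g. bytes read from a Git subprocess
dst = "new-name"   # e.g. a str literal in the script

try:
    os.path.join(src, dst)  # mixing bytes and str in one call
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Converting both arguments to the same type first avoids the error.
joined = os.path.join(as_bytes(src), as_bytes(dst))
```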

So updating to the latest 3.4.x version of Python and making it run there should get you 95% of the way past the problems I saw.

One thing that did worry me is that sys.getfilesystemencoding() was returning "mbcs", i.e., multi-byte characters, which I think is the wrong thing.

Looking at this link, Python 3 is supposed to support UTF-8 file names, even on Windows, so I think converting all path names to and from UTF-8 is a better practice than sys.getfilesystemencoding(). Internally Windows uses UTF-16 for path names; assuming Python is doing the "right thing", it should be taking any UTF-8 string and just converting straight to UTF-16, using the "Unicode" APIs (if you don't know, it's two encodings for the same set of code points, so the conversion between the two is trivial). In fact, this link implies that Python 3.2+ is doing the right thing, so you should always just hand it UTF-8 (or otherwise Unicode) file names. (On Windows, the APIs that end in "W" are the wide-char, i.e., Unicode APIs. The ones that end in "A" are ASCII or mbcs APIs.)
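As a small illustration of the "two encodings for the same set of code points" remark, the sketch below encodes one made-up file name both ways and checks that each decodes back to the same text. The value printed by `sys.getfilesystemencoding()` depends on the platform and Python version, so no particular output is assumed.

```python
import sys

# Made-up file name containing non-ASCII characters (written with escapes).
name = "donn\u00e9es/\u30d5\u30a1\u30a4\u30eb.bin"

utf8_bytes = name.encode("utf-8")
utf16_bytes = name.encode("utf-16-le")  # the form Windows wide-char APIs use

# The byte strings differ, but both decode to the same code points.
same_text = utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-le") == name

print(sys.getfilesystemencoding())
```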

So in short: Use UTF-8 encoding for file names and forward slash, and get it running on 3.4.x, and it should all Just Work.

Looking at what I did accomplish, I'd forgotten that I actually got things to the point where I could try to PUSH files, but then the rsync command was failing for other reasons. I have no idea whether the "pull" side works at all, for obvious reasons. Here's what I had: https://gist.github.com/TimMensch/86064e34d8c901dbb5c3

I guess what it still needs is a different back-end than calling command-line rsync. Probably out of scope for your project, but if you ever did get s3 working, that would be awesome.

jedbrown commented 9 years ago

Thanks for the run-down. With regard to file names, my first priority is to be exactly compatible with Git, which stores file names with unspecified encoding (the encoding must agree with ASCII for '.' and '/'). You can debate whether that was the best choice, but it's how Git works. This creates some impedance mismatch with Python (especially version 3, which wants to be strict about filename encodings).

TimMensch commented 9 years ago

UTF-8 agrees with ASCII for the full first 128 bytes, so it's safe to use for '.' and '/'. It's only above 128 that it diverges from any of the Windows code pages or mbcs encodings.
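The claim is easy to check in Python 3. The sketch below uses cp1252 as an example Windows code page (an assumption; any ASCII-compatible code page would serve) and one byte above the ASCII range to show where the encodings part ways.

```python
ascii_range = bytes(range(128))

# Every byte below 128 decodes identically under ASCII, UTF-8 and cp1252.
low_bytes_agree = (
    ascii_range.decode("ascii")
    == ascii_range.decode("utf-8")
    == ascii_range.decode("cp1252")
)

# A single byte above the ASCII range is a valid character in cp1252 ...
high_in_cp1252 = b"\xe9".decode("cp1252")

# ... but on its own it is not valid UTF-8.
try:
    b"\xe9".decode("utf-8")
    high_valid_utf8 = True
except UnicodeDecodeError:
    high_valid_utf8 = False
```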

Tim

On 6/3/2015 3:55 PM, Jed Brown wrote:

Thanks for the run-down. With regard to file names, my first priority is to be exactly compatible with Git, which stores file names with unspecified encoding (the encoding must agree with ASCII for '.' and '/'). You can debate whether that was the best choice, but it's how Git works. This creates some impedance mismatch with Python (especially version 3, which wants to be strict about filename encodings).


jedbrown commented 9 years ago

Right, but I get path names in unspecified encoding from Git and I want to hand those bytes directly to the operating system without insisting on guessing the encoding so that I can convert them to UTF-8 or whatever. Python makes that difficult and Windows cannot accept argv[] without encoding it (and concatenating into a massive escaped string), so I think the only solution is to insist that paths never appear in command line arguments. Fortunately, Git plumbing supports this, albeit slightly unnaturally at times (and with some complication due to close_fds=True having completely different behavior on Windows).
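A rough sketch of keeping paths out of argv: the paths travel as raw bytes on the subprocess's stdin, NUL-terminated, and only fixed ASCII options appear on the command line. `git check-attr --stdin -z` is used here purely as an example of a plumbing command that reads paths this way; the thread does not say which commands git-fat relies on, and both function names are hypothetical.

```python
import subprocess


def nul_join(paths):
    """Pack byte paths into the NUL-terminated form that -z input expects."""
    return b"".join(p + b"\0" for p in paths)


def check_filter_attr(paths):
    """Ask Git for the 'filter' attribute of each path, sending paths on stdin.

    Must be run inside a Git work tree; returns Git's raw NUL-delimited output.
    """
    proc = subprocess.Popen(
        ["git", "check-attr", "--stdin", "-z", "filter"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    out, _ = proc.communicate(nul_join(paths))
    return out
```

Because the paths never pass through the command line, they are never decoded or re-escaped on the way to Git; the bytes received from Git are handed back unchanged.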