Closed martinseener closed 6 years ago
I agree with this. I would love to give this a spin, but I'm not willing to switch to Windows for it.
The announcement (and GitHub readme) emphasize "GVFS relies on a protocol extension that any service can implement" (https://github.com/Microsoft/gvfs/blob/master/Protocol.md). Any client tooling can support this as long as the protocol extension is added.
@nasserd That protocol extension is just for lazy fetching of objects. That is the minority of the work in this repo. The vfs allows file change tracking to prevent working tree scans. Both are valuable but just the protocol extension doesn't provide much.
+1 for non Windows support
Yes we definitely want to support Mac and Linux, and we are looking for people with file systems expertise on those platforms.
I guess it does not work like an ordinary file system: you need additional permissions to load a file system driver (like loading a kernel module on Linux) or to install FUSE. So please just make git itself support the new GVFS protocol.
There are two kinds of git server transport: HTTP/HTTPS and SSH. It seems that implementing GVFS for them will take more and more work.
Dokan can serve as a conceptual basis for how a VFS can be bridged between Windows and FUSE-capable systems. Also, with WSL evolving, it could conceivably be taken the opposite way, and have raw support for mounting FUSE-based filesystems directly within Windows. I for one would love to see that day.
GVFS.FltWrapper could be abstracted so that people could substitute it with their own implementation for their OSes, or make a portable tool to de-virtualize files without dealing with a custom FS.
We actually built our first internal version of GVFS on something that looks a lot like FUSE. The challenge is that performance is absolutely critical on the file system on which you're running your builds, and context switching from kernel to user mode for every bit of IO can never be as fast as just doing it down in the kernel. What GvFlt does for us is that we only do the kernel to user mode context switch the first time the file is opened, and after that it just becomes a normal NTFS file. I'll be very curious to see how we can replicate that on Mac and Linux, but I don't yet know enough about those systems. If anyone has ideas, I'd be very happy to discuss them!
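To make the "only pay on first open" idea concrete, here is a minimal Python sketch. It is not how GvFlt works internally; `LazyWorkingTree`, `fetch` and `slow_path_hits` are hypothetical names for illustration, and the "slow path" is just a callback standing in for the kernel-to-user-mode round trip.

```python
import os
import tempfile
from typing import Callable


class LazyWorkingTree:
    """Toy model of 'pay the expensive path once, then read a normal file'."""

    def __init__(self, root: str, fetch: Callable[[str], bytes]) -> None:
        self.root = root
        self.fetch = fetch        # hypothetical stand-in for the user-mode provider
        self.slow_path_hits = 0   # how often the provider had to be called

    def read(self, relpath: str) -> bytes:
        local = os.path.join(self.root, relpath)
        if not os.path.exists(local):
            # First access: ask the provider, then lay the file down on disk.
            self.slow_path_hits += 1
            os.makedirs(os.path.dirname(local), exist_ok=True)
            with open(local, "wb") as f:
                f.write(self.fetch(relpath))
        # Every access, including the first, ends with an ordinary file read.
        with open(local, "rb") as f:
            return f.read()


def demo() -> int:
    """Read the same file three times; return how often the provider ran."""
    with tempfile.TemporaryDirectory() as root:
        tree = LazyWorkingTree(root, fetch=lambda p: f"contents of {p}".encode())
        for _ in range(3):
            tree.read("src/main.c")
        return tree.slow_path_hits
```

The open question for Mac and Linux is where the `os.path.exists` check lives: in a FUSE daemon it costs a context switch on every access, whereas in a filter driver it happens in the kernel.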
How about using git-annex, which is already packaged for most GNU/Linux distributions? https://git-annex.branchable.com/
I took a look at porting the code to Mac with Mono.
I ran into a number of NuGet packages that I could not restore on the Mac, where can I get these from?
bash$ nuget restore
Unable to find version '1.9.4' of package 'ManagedEsent'.
Unable to find version '1.9.4' of package 'Microsoft.Database.Collections.Generic'.
Unable to find version '1.9.4' of package 'Microsoft.Database.Isam'.
Unable to find version '1.1.28' of package 'Microsoft.Diagnostics.Tracing.EventRegister'.
Unable to find version '1.1.28' of package 'Microsoft.Diagnostics.Tracing.EventSource'.
Unable to find version '1.1.28' of package 'Microsoft.Diagnostics.Tracing.EventSource.Redist'.
Unable to find version '1.0.0' of package 'StyleCop.Error.MSBuild'.
Unable to find version '4.7.54.0' of package 'StyleCop.MSBuild'.
Unable to find version '0.17131.2-preview' of package 'Microsoft.GVFS.GVFlt'.
Unable to find version '2.0.275-beta' of package 'CommandLineParser'.
Unable to find version '3.5.0' of package 'NUnitLite'.
@joudinet annex, LFS, and similar solutions help (though only partially in our case) with the size problem, but they do nothing to help with the issue that it takes a long time for git to operate on a large number of files. When you have 3.5 million files, a basic "git status" takes 8 minutes, and that's because it has to enumerate every single file, at the very least compare its timestamp with the index, and worst case open the file and calculate its hash. By virtualizing, we are tackling both of those problems.
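As a rough illustration of why that scan is expensive, here is a simplified Python sketch of the loop described above. It is not git's actual implementation: the index layout (`path -> (mtime_ns, size, blob_id)`) and the helper names are assumptions for this example, and deletions are ignored. The blob hashing follows git's object naming, sha1 over `"blob <size>\0"` plus the content.

```python
import hashlib
import os
import tempfile
from typing import Dict, List, Tuple

# Hypothetical index layout for this sketch: path -> (mtime_ns, size, blob_id)
Index = Dict[str, Tuple[int, int, str]]


def git_blob_id(data: bytes) -> str:
    """Name content the way git names a blob object."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def build_index(root: str) -> Index:
    index: Index = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            with open(path, "rb") as f:
                blob = git_blob_id(f.read())
            index[os.path.relpath(path, root)] = (st.st_mtime_ns, st.st_size, blob)
    return index


def scan_status(root: str, index: Index) -> List[str]:
    """Return changed or untracked paths. Cost grows with every file on disk."""
    changed = []
    for dirpath, _dirs, files in os.walk(root):       # enumerate every file
        for name in files:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            entry = index.get(rel)
            if entry is None:
                changed.append(rel)                    # untracked file
                continue
            st = os.stat(path)
            if (st.st_mtime_ns, st.st_size) == entry[:2]:
                continue                               # cheap case: stat matches
            with open(path, "rb") as f:                # worst case: open and hash
                if git_blob_id(f.read()) != entry[2]:
                    changed.append(rel)
    return sorted(changed)


def demo() -> List[str]:
    with tempfile.TemporaryDirectory() as root:
        for name in ("a.txt", "b.txt"):
            with open(os.path.join(root, name), "wb") as f:
                f.write(b"hello\n")
        index = build_index(root)
        with open(os.path.join(root, "b.txt"), "wb") as f:
            f.write(b"hello, world\n")   # size changes, so the cheap check fails
        return scan_status(root, index)
```

Even in the best case the loop performs one `stat` per file, which is the part a virtualized working directory avoids by knowing which files were touched.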
@migueldeicaza All of those packages are available on nuget.org.
However, I don't think you can do just a straight port of this to Mac, because one of the key components is the GvFlt filter driver that only works on Windows. That's the key piece that we need to figure out how to develop on Mac in an efficient way.
Once I get this building, I will try the next step.
What I would do is plug the code from GVFS into FUSE on macOS, and that is the area that will need some code changes.
@migueldeicaza if you get this working on mac w/ mono is there any chance it will work on linux w/ mono?
Some other challenges to keep in mind (Mikayla is helping me get my packages sorted out). The Microsoft.Database.Isam and some of the other Isam libraries contain P/Invokes into native code that will not work on Unix.
@DwordPtr if we get a unix port, it should work with little effort on Linux
For those following at home:
Start by updating Xamarin Studio to the beta channel (you will need the upgraded Mono).
Then you will need a newer Nuget on your system:
$ curl https://dist.nuget.org/win-x86-commandline/latest/nuget.exe -o nuget.exe; mono nuget.exe restore
With that, you can get your chainsaw and start cutting.
@migueldeicaza That sounds great, let me know if you have any questions about the code as you're going through it!
The filter driver of Windows => the fanotify/inotify on Linux? (On-Access Scanning)
fanotify: "fscking all notification and file access system". Its event definitions:
Definition | Meaning |
---|---|
FAN_ACCESS | File was accessed |
FAN_MODIFY | File was modified |
FAN_CLOSE_WRITE | Writable file was closed |
FAN_CLOSE_NOWRITE | Read-only file was closed |
FAN_OPEN | File was opened |
FAN_OPEN_PERM | Permission to open a file was requested |
FAN_ACCESS_PERM | Permission to read a file was requested |
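For anyone experimenting with this, a small Python helper for decoding an event mask may be handy. Python's standard library has no fanotify binding, so this only decodes a mask you obtained some other way. The numeric values are what I believe `linux/fanotify.h` defines; treat them as an assumption and check them against your own kernel headers. `decode_mask` is a hypothetical helper name.

```python
from typing import List

# Assumed values from linux/fanotify.h; verify against your kernel headers.
FAN_EVENTS = {
    "FAN_ACCESS": 0x00000001,
    "FAN_MODIFY": 0x00000002,
    "FAN_CLOSE_WRITE": 0x00000008,
    "FAN_CLOSE_NOWRITE": 0x00000010,
    "FAN_OPEN": 0x00000020,
    "FAN_OPEN_PERM": 0x00010000,
    "FAN_ACCESS_PERM": 0x00020000,
}


def decode_mask(mask: int) -> List[str]:
    """Return the names of all known event bits set in a fanotify mask."""
    return [name for name, bit in FAN_EVENTS.items() if mask & bit]
```

The two `*_PERM` events are the interesting ones for a GVFS-like design, since they let a listener decide before the open or read proceeds.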
I wonder how much looking at gitfs aka SlothFS could be of help for getting started with the port: "SlothFS is a FUSE filesystem that provides light-weight, lazily downloaded, read-only checkouts of manifest-based Git projects. It is intended for use with Android".
I'm heavily over-simplifying here, but there have been two /really hard/ problems to solve in the file system portions of GVFS. 1) Enabling writes and 2) Making the file system fast, ideally as fast as a local disk for the second access. A read-only FUSE filesystem doesn't solve either of those very well :-). We actually had GVFS up and running almost a year and a half ago using a FUSE-like solution, and it worked great if you just wanted to read files and have them download on demand. Everything we've done since then is to allow you to also do things like run a build on top of GVFS, modify any file you want, have "git status" and "git checkout" do the right thing but do it fast, etc.
I'd love to be proven wrong on this, but as we go to port this to Mac and Linux, I'd be very surprised if FUSE alone ends up being the correct solution. We had started with a similar solution for our Windows implementation, and what we learned is that there's just no way to make it fast enough if every single bit of IO has to transition from kernel mode to user mode. The way we've solved this on Windows is using the GvFlt driver, which only has to transition to user mode GVFS for the first file access, and after that lays it down on disk as a normal NTFS file, which enables your second access to be as fast as normal.
I still haven't gone very deep on this, but I'm currently thinking that the solution for Mac/Linux will potentially be some combination of a FUSE read-only filesystem, combined with OverlayFS, combined with some sort of on-disk caching. But that perf requirement is a hard one, and it may drive us to build a custom driver for those platforms too.
And I've also completely skipped talking about all the challenges of ensuring that git operations do the right things, even after you've made the file system writable and fast. That raises a whole set of other challenges.
Thanks for the writeup. I wonder if an extension to FUSE would be suited here. This way you can maintain the userspace implementation. It would provide something similar to what you described, on first access it could return a file handle to a file on a different filesystem, thereon in the kernel would proxy all read/write operations to that handle directly.
However it seems that this idea has been proposed before without too much buy in from the FUSE devs:
There was some interest each time, but the replies often said that the performance was "good enough" or that there weren't clear wins. So maybe with such a clear use case and some good benchmarks this could be supported.
@sanoursa wrote:
I'd love to be proven wrong on this, but as we go to port this to Mac and Linux, I'd be very surprised if FUSE alone ends up being the correct solution. We had started with a similar solution for our Windows implementation, and what we learned is that there's just no way to make it fast enough if every single bit of IO has to transition from kernel mode to user mode. The way we've solved this on Windows is using the GvFlt driver, which only has to transition to user mode GVFS for the first file access, and after that lays it down on disk as a normal NTFS file, which enables your second access to be as fast as normal.
I am the author of WinFsp which is a FUSE solution for Windows. What I have found is that a user mode file system that enables caching on Windows (i.e. uses the NTOS Cache Manager), can be almost as fast as NTFS. This is because the cache manager satisfies a lot of the I/O and the context switches are minimized. [The reason that NTFS is fast is because of the cache manager and not because disk accesses are fast; besides context switches are faster than disk accesses.]
I link to some performance tests that show that a user mode file system can be very fast: https://github.com/billziss-gh/winfsp/wiki/WinFsp-Performance-Testing. NTFS has a slight edge on cached reads/writes in these tests, but this is because WinFsp does not implement FastIO (yet).
If I was doing this (and I am tempted) I would actually start with a cross-platform FUSE implementation, so the whole git world can benefit. I would then port to Windows using the WinFsp-FUSE layer (or its native API for maximum Windows compatibility).
+1
We actually had GVFS up and running almost a year and a half ago using a FUSE-like solution, and it worked great if you just wanted to read files and have them download on demand.
@sanoursa This might still be interesting work for a few folks with read-only patterns. icsfs is one example: https://sourceforge.net/projects/icfs/files/ Another one could be something similar to OpenAFS with its local file cache.
The filter driver of Windows => the fanotify/inotify on Linux? (On-Access Scanning)
That would probably be a bad design decision. I exceed inotify limits (upward of 1M) on a daily basis due to software like JetBrains *, Bazel, ...
But that perf requirement is a hard one, and it may drive us to build a custom driver for those platforms too.
Due to the above this seems the most reliable way.
I still haven't gone very deep on this, but I'm currently thinking that the solution for Mac/Linux will potentially be some combination of a FUSE read-only filesystem, combined with OverlayFS, combined with some sort of on-disk caching. But that perf requirement is a hard one, and it may drive us to build a custom driver for those platforms too.
Reading into OverlayFS, it seems to be a good choice. Currently I would imagine 2 problems:
I think the problem that we will face on Linux is that git is really fast there. And any solution "feeling" not as swift will probably not be accepted.
Do we want to move the discussion somewhere else with actual topic replies? Like Reddit, Slack, ...
I think discussion makes more sense here.
On 22/08/17 10:14, Andreas Bergmeier wrote:
- OverlayFS seems to also return a file handle directly from one of the underlying FS. While this is great and all, it seems to force us to copy large files from the lower to the upper FS as soon as we modify the file in any way (even attributes). This /may/ be ReallySlow™️ and "waste" space.
In practice it turns out that files in source repositories are rarely modified, almost always entirely rewritten. I don't think this should be considered a blocker.
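The difference between modifying and rewriting can be shown with a toy user-space model in Python. This is only a sketch of the copy-up behaviour discussed above, not of OverlayFS itself: `ToyOverlay` and `bytes_copied_up` are hypothetical names, it handles flat directories only, and it ignores attribute changes and whiteouts.

```python
import os
import shutil
import tempfile


class ToyOverlay:
    """Toy model of an overlay: read-only lower dir, writable upper dir."""

    def __init__(self, lower: str, upper: str) -> None:
        self.lower = lower
        self.upper = upper
        self.bytes_copied_up = 0   # how much data copy-up had to move

    def _upper(self, rel: str) -> str:
        return os.path.join(self.upper, rel)

    def read(self, rel: str) -> bytes:
        # The upper layer wins; otherwise fall through to the lower layer.
        upper = self._upper(rel)
        path = upper if os.path.exists(upper) else os.path.join(self.lower, rel)
        with open(path, "rb") as f:
            return f.read()

    def append(self, rel: str, data: bytes) -> None:
        # In-place modification: the whole file must be copied up first.
        upper = self._upper(rel)
        if not os.path.exists(upper):
            shutil.copyfile(os.path.join(self.lower, rel), upper)
            self.bytes_copied_up += os.path.getsize(upper)
        with open(upper, "ab") as f:
            f.write(data)

    def rewrite(self, rel: str, data: bytes) -> None:
        # Full rewrite: the old content is never needed, so nothing is copied.
        with open(self._upper(rel), "wb") as f:
            f.write(data)


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as lower, tempfile.TemporaryDirectory() as upper:
        for name in ("a.txt", "b.txt"):
            with open(os.path.join(lower, name), "wb") as f:
                f.write(b"x" * 1000)
        fs = ToyOverlay(lower, upper)
        fs.rewrite("a.txt", b"new")          # editor-style save
        after_rewrite = fs.bytes_copied_up
        fs.append("b.txt", b"!")             # in-place modification
        return after_rewrite, fs.bytes_copied_up, fs.read("a.txt"), len(fs.read("b.txt"))
```

In this model a full rewrite costs nothing extra, while a one-byte append pays for copying the whole file, which is the case that would hurt for large files.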
In practice it turns out that files in source repositories are rarely modified, almost always entirely rewritten.
The way git works, git objects are not diffs but full copies of the modified files, so in any case the full file would need to be transferred.
On 22/08/17 10:31, Jesús Leganés-Combarro wrote:
The way git works, git objects are not diffs but full copies of the modified files, so in any case the full file would need to be transferred.
I should have been more clear. I wasn't talking about the files in .git, but rather the working tree that the user is editing.
IIUC the tools modifying .git wouldn't have to go through the caching overlay filesystem in most cases, although I guess that depends on the implementation.
@gitster Do you have any opinion about these questions?
For those of you who are interested you may also want to watch Eden. It is a low-level FUSE file system that (I believe) supports sparse checkout for mercurial and git. It currently does not build, but my understanding is that the Facebook folks have great plans for it.
@billziss-gh Is there any further information about Eden? Like mentioned earlier - a pure FUSE fs probably will not scale.
Is there any further information about Eden?
My understanding is that they plan to open things up more as time passes. You may want to reach out to @wez for further information.
a pure FUSE fs probably will not scale.
As I mentioned earlier in the thread context switches are slow, but not as slow as disk accesses. A user mode file system that uses the OS file cache (cache manager, page cache, etc.) can be fast. This is not idle speculation, but it is something that I have done successfully on Windows.
Just wanting to point out that Jonathan Tan from Google is working on adding a native sparse checkout feature to git which might render gvfs and others obsolete: https://public-inbox.org/git/20170915134343.3814dc38@twelve2.svl.corp.google.com/T/#u
@est31 that's the hope :-). We're trying to push as much of this functionality into core git as we can. GVFS is currently providing two main features: 1) partial clone + on demand object download (which could be done in a platform-agnostic way in core git) and 2) working directory virtualization and dynamic expansion of sparse-checkout, which requires a file system driver and must be platform-specific.
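Feature 1 can be sketched without any file system involvement. The Python below is a hypothetical illustration, not the GVFS protocol or git's partial clone code: `LazyObjectStore` and `fetch_remote` are made-up names, and the "remote" is just a callback. It shows the core idea that an object is downloaded only when first requested and is verified against its id.

```python
import hashlib
from typing import Callable, Dict


def blob_id(data: bytes) -> str:
    """Name content the way git names a blob object."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


class LazyObjectStore:
    """Local object store that downloads missing objects on demand."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._local: Dict[str, bytes] = {}
        self._fetch_remote = fetch_remote   # hypothetical network call
        self.remote_fetches = 0

    def get(self, oid: str) -> bytes:
        if oid not in self._local:
            data = self._fetch_remote(oid)
            if blob_id(data) != oid:
                raise ValueError(f"remote returned wrong content for {oid}")
            self._local[oid] = data
            self.remote_fetches += 1
        return self._local[oid]


def demo() -> int:
    """Request the same object twice; return how many downloads happened."""
    content = b"int main(void) { return 0; }\n"
    remote = {blob_id(content): content}
    store = LazyObjectStore(fetch_remote=remote.__getitem__)
    oid = blob_id(content)
    store.get(oid)
    store.get(oid)
    return store.remote_fetches
```

Nothing here depends on the platform, which matches the point that this half could live in core git, while the working directory virtualization cannot.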
@abergmeier-dsfishlabs: we don't have any public news or timeline for Eden today. What I said in https://github.com/facebookexperimental/eden/issues/4#issuecomment-306616014 is still broadly true. The grand vision is that it will be cross platform (linux, macos, windows) but it will take a bit of time to get there for all systems. We're prioritizing supporting Mercurial as that is what we're using for our largest repositories.
I want to report that Jonathan Tan from Google is already releasing patches for partial clone: https://public-inbox.org/git/cover.1506714999.git.jonathantanmy@google.com/T/#mef6995ce5dad36660fea0a2f3e2255276e40f623 https://public-inbox.org/git/20171102202052.58762-1-git@jeffhostetler.com/T/#m172864eb51e7241562e04f8197b87a853440b909 https://public-inbox.org/git/de392bf3-bd53-1c17-3a43-c2e1604cbd59@jeffhostetler.com/T/#m40585b2b5ee2108bf5eed5afb3f300b1d6c0bb8f https://public-inbox.org/git/20171102203129.59417-1-git@jeffhostetler.com/T/#m375771d0e72ceb4b77ee48d435790091c146d409
GVFS is currently providing two main features: [...] working directory virtualization and dynamic expansion of sparse-checkout
Great summary. For Linux I would think:
Should be fairly "easy" to implement, whether in FUSE or in the kernel. Perhaps it only needs to overlay a fs with fake paths for promised objects. Accessing a faked path may then need to trigger a direct expansion. Since this part needs to access a fs, I would assume it is probably much faster to implement it as a kernel fs (to get rid of context switches).
This probably can be split into
This probably is done best by a daemon process, which listens on a directory (on the kernel fs). This directory would then have a list of files, which trigger a fetch of promised objects.
So perhaps have working directory virtualization in the kernel and dynamic expansion in userspace? Or am I completely wrong?
What about integrating the working directory virtualization part directly in the kernel as this is not the only project which could profit from such features: https://lkml.org/lkml/2017/12/13/669
@darkdragon-001 @abergmeier @abergmeier-dsfishlabs is there any more work done to support linux and mac?
Please see https://blogs.msdn.microsoft.com/devops/2018/03/15/gvfs-for-mac/ for the latest
It looks like this is on GitHub's radar folks.
MS developed it for their use case, which was managing the Windows source on Windows machines.
It's like asking why so many open-source Linux projects don't have Windows versions.
I would definitely suggest taking a look at WinFSP.
Also, maybe the easiest way to do this would be to use libfuse on linux systems.
Is there any intent to port GVFS over to Linux or macOS?