callegar commented 13 years ago

Currently sparkleshare inherits all the features/limitations/quirks of its backend.

Consequently, it does not sync empty dirs (issue 326 - https://github.com/hbons/SparkleShare/issues/326) and it does not sync directories named .git. Furthermore, I expect it to get confused if you have a file named .gitignore or .gitattributes, or simply to ignore it (haven't tried).

Conversely, I think that sparkleshare should hide the backend specific characteristics.

Particularly, when the backend is git, it should

1) on the way from the client to the server, put a placeholder in empty dirs and remove it on the way from the server to the client;
2) on the way from the client to the server, rename all .git files to .git-random-key and remove the random-key on the way back.

Most likely the above issues exist also for the mercurial backend, with sparkleshare being unable to deal with .hg directories (or whatever mercurial has).

In conclusion, any backend should probably be able to specify push and pull rules to hide the peculiarities it brings in:

E.g. (for git)

On empty directory:

PUSH create file placeholder_
PULL remove file placeholder_

On file/dir with name matching pattern .git*

PUSH rename file/dir to name_
PULL rename file/dir named name_ into name

callegar commented 13 years ago

Argh... the issue tracker has removed parts of my text.

on point 2) above you should have read

"on the way from the client to the server rename all .git files to .git-random-key and rename to remove the random-key on the way back."

Then at the end you should read

On file/dir with name matching pattern .git*

PUSH rename file/dir to name-randomkey
PULL rename file/dir named name-randomkey into name
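A minimal Python sketch of these push/pull rules, working on relative paths only so the mapping is easy to see. The suffix `-ss7f3a` (standing in for the random key), the placeholder name `.ss-empty-dir` and all function names are hypothetical; SparkleShare has no such functions.

```python
from pathlib import PurePosixPath

# Hypothetical values: a per-installation random key and a placeholder name.
KEY = "-ss7f3a"
PLACEHOLDER = ".ss-empty-dir"


def _reserved(part: str) -> bool:
    """Path components the git backend treats specially (.git, .gitignore, ...)."""
    return part.startswith(".git")


def push_path(path: str, key: str = KEY) -> str:
    """PUSH: rename every .git* component to <name><key>."""
    parts = PurePosixPath(path).parts
    return str(PurePosixPath(*(p + key if _reserved(p) else p for p in parts)))


def pull_path(path: str, key: str = KEY) -> str:
    """PULL: strip the key again from .git*<key> components."""
    parts = PurePosixPath(path).parts
    return str(PurePosixPath(*(
        p[: -len(key)] if _reserved(p) and p.endswith(key) else p
        for p in parts
    )))


def push_tree(files: set[str], dirs: set[str], key: str = KEY) -> set[str]:
    """Map a local tree to the set of file paths the backend would store."""
    everything = files | dirs
    stored = {push_path(f, key) for f in files}
    for d in dirs:
        prefix = d.rstrip("/") + "/"
        # Empty directory: nothing lives below it, so store a placeholder file.
        if not any(p.startswith(prefix) for p in everything):
            stored.add(push_path(d, key) + "/" + PLACEHOLDER)
    return stored


def pull_tree(stored: set[str], key: str = KEY) -> tuple[set[str], set[str]]:
    """Return (files, empty_dirs) to recreate locally from the stored paths."""
    files, empty_dirs = set(), set()
    for p in stored:
        local = pull_path(p, key)
        if PurePosixPath(local).name == PLACEHOLDER:
            empty_dirs.add(str(PurePosixPath(local).parent))
        else:
            files.add(local)
    return files, empty_dirs
```

A user file that already ends in the key, or is named like the placeholder, would be mangled on pull; a random per-installation key makes that unlikely but not impossible.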

wimh commented 13 years ago

see also #223

What I am personally interested in is the reason why you want to sync a git repo itself. It would give you two levels of revision control. Why would you want that?

callegar commented 13 years ago

Obviously, I am not very interested in the 2 levels of revision control, but in the sync.

Scenario: I work at 2 different offices, each with a desktop. Furthermore, I have one laptop I usually work on and another laptop that I only use at conferences or on vacation, since it is very light but not powerful enough to do anything serious on. Whatever machine I am on, I want to have all my most recent stuff to work on. I cannot rely on a disk exported via the internet, since the laptop can often be disconnected.

Using git alone for this does not help. If one day you work on 5 different projects on one machine, you need to remember to push all of them before changing machines. Additionally, if some file in some working tree remains uncommitted or unstashed, it will not propagate.

Currently I use unison. I have an active documents folder that I sync to one of the desktops that is always online and that acts as a central repo. Each project managed by git that is in the active documents folder gets synced with both its worktree and its history. Git does not complain about this 'abuse'. So far so good. However, every time I start working on a different machine and (most important) every time I am ready to leave it, I need to remember to invoke unison. Since every now and then I forget, this can be a pain. Having an automatic synchronization (a la dropbox or wuala) would probably be an improvement.

Unfortunately, sparkleshare still cannot be used for the task. True, using it would give you two levels of revision control, but that would not be expensive. Provided that you do not repack (too often) in the git repos on the clients, what ends up in the sparkleshare git repo are objects that do not change (a git object can only be added, never mutated), so you really do not waste space.
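Why git objects never change can be shown with a short Python sketch of how git names a file's contents: the object name is the SHA-1 of a `blob <size>` header, a NUL byte and the data, and a loose object is stored under a path derived from that name. The function names are illustrative; this assumes a default SHA-1 repository (newer git can also create SHA-256 repositories).

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Object name git assigns to file contents: SHA-1 of header + NUL + data."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def loose_object_path(oid: str) -> str:
    """Where git keeps that object as a loose (zlib-compressed) file."""
    return f".git/objects/{oid[:2]}/{oid[2:]}"
```

Because the path is a function of the content, editing a file produces a new object at a new path instead of rewriting an existing one. Repacking is the exception mentioned above: it moves many loose objects into a single packfile, which would then be new, large data for the outer repository to store.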

And actually, the double level of revision control can even be useful. In my current scenario (based on unison) if a hard disk breaks down and starts corrupting data, the corrupted data propagates. Unless I can see the problem before all the machines have received the corrupted data, I can experience data loss. With the double level of revision control, I can be safer on this side, particularly if I put the sparkleshare git repo on RAID1.

lakeman commented 13 years ago

Workaround: use sparkle share to automatically commit / push / pull to your own branch or mirror. Then squash commits / rebase when your changes are complete. Wrapping a whole source control repository in another repository is not ideal. Neither is committing other container formats like zip files, but that's a gripe for another time...

callegar commented 13 years ago

Can you please provide better details?

What exactly do you mean by "work around, use sparkle share to automatically commit / push / pull to your own branch or mirror"? Should I set up a different sparkleshare repo for each project I have that is under git? How would I deal with projects that have many different branches, so that each branch gets pushed to a corresponding branch on the other side?

If there is some workflow that I am currently overlooking, I would be interested in knowing more about it.

wrt the rest of the doc:

Wrapping a whole source control repository in another repository is not ideal.

There was a time when having nested directories was considered not ideal :-) Apart from this, the only thing that looks critical to me (at least with the git backend) is pack files, which might easily get too big for git. But as you mention, the issue is no different from what one might encounter with large PDFs, ZIP files (or any format based on ZIP, such as openoffice, docx and jar files), large tar.gz archives, etc. Note that this includes almost all recent document formats for productivity suites.

This is why I advocate that sparkleshare be given the ability to apply transformations to the files on the fly as they are transferred to and from the server:

1) To avoid problems with files/dirs that are named in a way that some specific backend does not like (e.g. .git), or to work around specific backend peculiarities (e.g. git does not store empty dirs).
2) To split too-big files into smaller chunks. One option is a rolling checksum mechanism a la bup, but note that for some file types, recognizing and exploding them (as pristine-tar can do) could be a superior choice.

IMHO, all this is quite important for two reasons:

1) One wants to hide the peculiarities of the backend. There is an obvious advantage if the user does not need to know whether sparkleshare is using a git, mercurial, unison or sftp backend, and does not need to modify his/her behavior accordingly.
2) One wants to manage any file, particularly large ones. One of the typical uses of solutions like dropbox or wuala is sharing, collaborating on, or keeping a safe store of large documents and photos, not little text files.
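To illustrate point 2), here is a minimal Python sketch of content-defined chunking. A rolling hash over the last 48 bytes decides where chunks end, so an insertion near the start of a large file only changes the chunk around the edit and the remaining chunks could be deduplicated. It is in the spirit of bup but uses a plain Rabin-Karp polynomial hash, not bup's actual rollsum. The window size, base, modulus and 12-bit mask (roughly 4 KiB average chunks) are arbitrary choices, and `split_chunks` is a hypothetical name.

```python
BASE = 257
MOD = (1 << 31) - 1


def split_chunks(data: bytes, window: int = 48, mask_bits: int = 12) -> list[bytes]:
    """Cut `data` wherever the hash of the last `window` bytes has its low bits all set."""
    mask = (1 << mask_bits) - 1
    drop = pow(BASE, window - 1, MOD)  # weight of the byte about to leave the window
    chunks: list[bytes] = []
    start = 0  # first byte of the current chunk
    h = 0      # hash of the current window, restarted at each boundary
    for i, byte in enumerate(data):
        if i - start >= window:
            # Window is full: remove the oldest byte before adding the new one.
            h = (h - data[i - window] * drop) % MOD
        h = (h * BASE + byte) % MOD
        # Only cut after a full window, so every chunk is at least `window` bytes.
        if i - start + 1 >= window and (h & mask) == mask:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # tail without a natural boundary
    return chunks
```

With the git backend, each chunk would then be stored as its own blob, so an edited large file costs roughly one new chunk instead of a full new copy.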

lakeman commented 13 years ago

Yes, basically.

Add more than one remote to your local git repo. Set the default upstream of the branch to your own server / mirror so that Sparkleshare will sync changes there.

Then when you want to push to the "real" source repo, rebase (or reset --soft, recommit) to squash all the automatic commits into a more sane history. And manually push this history to the "real" remote.
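The workflow above can be sketched in Python as helpers that only build the git command lines, plus a small runner. The remote names `mirror` and `origin`, the base ref (for example `origin/master`) and the function names are assumptions for illustration; the git subcommands and flags themselves are standard.

```python
import subprocess


def setup_commands(branch: str, mirror_url: str, mirror: str = "mirror") -> list[list[str]]:
    """Add a second remote and make it the branch's default upstream (what SparkleShare syncs to)."""
    return [
        ["git", "remote", "add", mirror, mirror_url],
        ["git", "push", "--set-upstream", mirror, branch],
    ]


def squash_commands(base: str, message: str, branch: str, upstream: str = "origin") -> list[list[str]]:
    """Collapse the automatic commits made since `base` into one and publish it."""
    return [
        ["git", "reset", "--soft", base],  # drop the commits, keep index and working tree
        ["git", "commit", "-m", message],
        ["git", "push", upstream, f"HEAD:{branch}"],
    ]


def run(commands: list[list[str]], repo: str) -> None:
    """Execute the commands in order inside `repo`, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=repo, check=True)
```

After the squash, the local branch no longer contains the automatic commits already pushed to the mirror, so the next push there would presumably need to be forced; how SparkleShare copes with that rewritten history is not covered here.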

On Fri, Sep 16, 2011 at 5:20 PM, callegar reply@reply.github.com wrote:

Reply to this email directly or view it on GitHub: https://github.com/hbons/SparkleShare/issues/335#issuecomment-2112628

dabrahams commented 13 years ago

I don't think I understand how @lakeman's answer addresses what I perceive to be the issue. I have a similar situation. Maybe it would help to have another example.

My Emacs configuration contains a great many files from different sources, some of which are Git repositories. I want to move from machine to machine and always see the same configuration, so I want my entire .emacs.d folder synced (recursively) via SparkleShare. Many subfolders of .emacs.d happen to be Git repos themselves, and (I presume, at least) that's not going to work any better than it does to check one Git repo into another one.

dabrahams commented 13 years ago

Interestingly, SS seems to create a Git submodule in such cases... which is fine if your submodule is up-to-date and checked in somewhere, but otherwise, yikes. Ah, but it doesn't add a submodule mapping for this project, so... yikes.

lakeman commented 13 years ago

You might be interested in this then; https://github.com/apenwarr/git-subtree

dabrahams commented 13 years ago

Well, that is very interesting, but I think for other reasons. I am using an emacs package manager that, among other things, clones a bunch of git repositories under my .emacs.d directory. I just want to capture and sync the state of my .emacs.d directory without having to jump through any hoops. That's why I was trying to use the now-defunct Hg backend, which wouldn't have had this particular problem (with git repos, anyway).

callegar commented 13 years ago

Behavior noticed by dabrahams is IMHO quite understandable. This is exactly what git does when you try adding a dir that includes a .git dir.

Which brings us back to the initial point. Sparkleshare does not do anything to hide the peculiarities of its backends:

With the git backend, you cannot have files named .git* in what you sync. If, say, there were a backend named Foobar using foo and bar files for its own stuff, it would be forbidden to have files named foo and bar in what you sync.

IMHO Sparkleshare should conversely do its best to operate seamlessly whatever its backend, without imposing filename rules that the user might not even be aware of.

sferris commented 13 years ago

I was wondering if there were any developer thoughts regarding this? While I totally understand why it would ignore .git folders, it also really caught me off guard. I had the (perhaps false) assumption that to the user, this would look/work like any other filesystem and I wouldn't have to understand the implementation details. Anything I throw in, would come out the other end, intact.

Then I get to work and my revision controlled projects are empty folders. Again, I totally get it and I didn't lose anything in reality, but I also don't work on massive projects.. I don't always want to check in works in progress (and I'm unreliable to do so, which is probably the bigger problem ;D), so I was hoping this would scratch my itch. Give me a chance to revert experiments, without having to branch AND let me commit things that really should be committed.

I read up on 'git submodules', which sound like a good solution, and I'll have to do some experimenting to see how sparkleshare handles them. I'm guessing it's not seamless though? Will SS (ever?) automatically clone submodules?

Anyway.. just curious. Awesome product either way!

wimh commented 13 years ago

I can share some of my thoughts.

It depends very much what you consider the goal of SparkleShare. It can be just a tool to sync any files. But it can also be seen as a gui around git or another backend. Everything SparkleShare does, would not be too hard to enter manually on the commandline. So if you can't run SparkleShare for any reason, you still can simply access the files, and push and update. (although other clients won't get a notification).

On the other hand, if Sparkleshare needs to rename files before a commit, and rename them back afterwards, things get complicated soon. It would be very awkward to update manually. Also, because it is more complicated, more things can go wrong if you do try it.

And from the technical point of view, I don't know if it is a good idea to have a git repo inside another git repo. The .git directory contains a kind of database, which is kept consistent by the git executable. I don't know any internals of git, but it would be difficult to automatically handle a merge conflict. But of course, if you never update simultaneously on two clients, there should not be any problem. Users who use git will probably understand that.

To summarize, I do see advantages in syncing git repos inside a SparkleShare folder, but I am afraid things will get too complicated. At least when you use git as the backend.

sferris commented 13 years ago

Hrm -- I hadn't considered conflict resolution of the .git folder. That certainly would make things messy. I'm not even sure I'd trust another back-end to handle that correctly for git. Though, it's probably a negligible risk since I am basically the sole developer. (so much for collaboration though!) I think I'll just have to leave it unmanaged under SS and add some Makefile rules to push/apply patches to the real repository. That should be a suitable solution for me.

Out of curiosity though: in my case, it's more of an organization thing. I keep all my projects in the same area: /some/path/projects. If the goal is more or less a GUI around git, how hard would it be to make the drop spots customizable per repo? For instance, have a Projects sub-category under SparkleShare where I can consolidate all my code projects?

dabrahams commented 13 years ago

I think this is a red herring. These issues don't present a bigger risk for Git folders than they do for many other things you might store and sync with SS, e.g. databases, Mac sparse bundles, Application bundles and frameworks, virtual machine disk images, etc...

callegar commented 13 years ago

On 14/11/2011 22:17, wimh wrote:

Reply to this email directly or view it on GitHub: https://github.com/hbons/SparkleShare/issues/335#issuecomment-2736703

I think that by definition there should be no conflict in the git object database. The names of the files in the git object store are hashes of their content, so you cannot have 2 different files with the same name unless there is a hash clash. There can be conflicts on references, but these should be easy to solve, since references are text files.
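The argument can be sketched in Python by modelling two clients' .git directories as dicts from relative path to bytes. Loose objects are named by their hash, so the same name means the same object (barring a collision) and either copy can be kept, even if different zlib settings produced different bytes on disk; every other file, such as a ref, is compared byte for byte. `merge_git_dirs` is a hypothetical name, and the sketch deliberately ignores packfiles, the index and other state files, which is where real trouble would more likely come from.

```python
import re

# Loose objects live at objects/<2 hex>/<38 hex> in a SHA-1 repository.
LOOSE_OBJECT = re.compile(r"objects/[0-9a-f]{2}/[0-9a-f]{38}")


def merge_git_dirs(a: dict[str, bytes], b: dict[str, bytes]) -> tuple[dict[str, bytes], list[str]]:
    """Union two clients' .git trees and report the paths that genuinely conflict."""
    merged = dict(a)
    conflicts = []
    for path, data in b.items():
        if path not in merged:
            merged[path] = data  # only one side has it: nothing to decide
        elif LOOSE_OBJECT.fullmatch(path):
            # Same name means same object content, so keep the existing copy.
            continue
        elif merged[path] != data:
            conflicts.append(path)  # e.g. refs/heads/master moved on both sides
    return merged, sorted(conflicts)
```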

But this is not really the point IMHO. In fact, if you use sparkleshare with the git backend you can have mercurial/cvs/subversion repositories in it.

Sergio

sferris commented 13 years ago

I started thinking about this last night again, and I wondered if git couldn't be modified to make the '.git' folder a configurable option, if it wasn't already? That way sparkleshare could use say '.ss-git' and everyone else would be none the wiser and continue to use '.git'. Of course, the conflict resolution on '.git' would still need to be simple. (I'm new to git and have lots to learn) Perhaps that's opening another can of worms too..

Hrm.. would the server care? I suppose it would..

callegar commented 13 years ago

On 15/11/2011 18:29, sferris wrote:

Reply to this email directly or view it on GitHub: https://github.com/hbons/SparkleShare/issues/335#issuecomment-2748023

I agree that this sounds like a good option. And it would be particularly good if other 'backends' could follow the same approach.

A few notes on it:

1) I think that the git community needs some motivation for this change, particularly if the .git name happens to be hardwired in multiple parts of the code.

2) Also related to 1). Probably trying to propose the problem on the git mailing list can be a good starting point. They may actually even propose alternative schemes or ideas. I think that the git ML is a rather fruitful place for discussion and insight.

3) Probably it is not just the .git directory, but the whole .git prefix that should be configurable. I think of other files (.gitignore, .gitmodules) that could interfere with ss.

Sergio

sferris commented 13 years ago

I actually looked at the source code for git and got really excited to see that it's a define:

$ grep DEFAULT_GIT_DIR_ENVIRONMENT ~/src/git-1.7.7.3/cache.h

#define DEFAULT_GIT_DIR_ENVIRONMENT ".git"

And then I got downright giddy to see the --git-dir option. I even tested a repo, renaming .git to .foo and was able to push it to a gitorious server without issue. Unfortunately, it still excluded a folder named '.git' when I got that far.. However, it had no problem adding .foo, even though it was the git-dir for that repo.. so, you could rename .git in your repositories and then dump them into sparkleshare. I would presume the fact that it ignored .git, with the --git-dir option set, might be considered a bug? I can't imagine you'd want to version control the git-dir of the working git repository, in the working repository.

Tomorrow, when I get to work, I'll see about adding a bug report to git (if not already there) and see what they say.

slmingol commented 12 years ago

This would be a nice improvement to SparkleShare. I just recently copied a directory that included a .git dir in it, a cloned copy of a git repo that includes some code examples which I wanted to keep with their pdf and have them all synced on my various systems. When I logged onto my laptop I couldn't figure out why so many "/usr/bin/git status --porcelain" processes were tanking my system (load of >50 and climbing). It took me a few minutes to realize it was related to SS and a few more to realize that it was due to the .git dir being in my SS dir. Luckily I found this thread. At a minimum SS should guard against this type of problem arising, or perhaps notify the user of this particular problem.
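A guard like the one suggested could be a pre-sync check. Here is a minimal Python sketch, with hypothetical function names, that lists folders below the sync root containing their own .git (a directory, or a .git file as used by linked worktrees and submodules) and prints a warning for each one.

```python
import os


def find_nested_repos(root: str) -> list[str]:
    """Return folders below `root` (relative paths) that are git repositories of their own."""
    root = os.path.abspath(root)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath != root and (".git" in dirnames or ".git" in filenames):
            found.append(os.path.relpath(dirpath, root))
        if ".git" in dirnames:
            dirnames.remove(".git")  # never descend into repository internals
    return sorted(found)


def warn_about_nested_repos(root: str) -> list[str]:
    """Print one warning per nested repository and return the list."""
    nested = find_nested_repos(root)
    for rel in nested:
        print(f"warning: '{rel}' contains its own .git and may not sync as expected")
    return nested
```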

hbons commented 12 years ago

@slmingol do you have a log (or can you reproduce one?) by any chance?

slmingol commented 12 years ago

@hbons I have 2 logs. One where I add a folder that includes a .git directory and another where sometime later I've restarted SS. The 2nd one exhibits the problem that I described above, where SS keeps reporting the following msg. and the load keeps climbing:


SparkleShare stale pid file found, starting a new instance.
Starting SparkleShare... Done.
Identity added: /home/rogerdodger/.config/sparkleshare/sparkleshare.sparkleshare@jke.org.key (/home/rogerdodger/.config/sparkleshare/sparkleshare.sparkleshare@jke.org.key)
19:38:00 [Cmd] /usr/bin/git log -1 --format=%H
19:38:00 [Cmd] /usr/bin/git rev-list --reverse HEAD
19:38:00 [ListenerFactory] Issued new listener for 204.62.14.135
19:38:00 [ListenerIrc] Connecting to 204.62.14.135
19:38:00 [Cmd] /usr/bin/git status --porcelain
19:38:00 [Cmd] /usr/bin/git status --porcelain
...
...
19:38:00 [Listener] Connected to 204.62.14.135
19:38:00 [Git][personal_repo] Checking for remote changes...
19:38:00 [Cmd] /usr/bin/git ls-remote origin master
19:38:00 [Cmd] /usr/bin/git status --porcelain
19:38:00 [Cmd] /usr/bin/git status --porcelain
...
...
19:38:01 [Cmd] /usr/bin/git status --porcelain
19:38:01 [Cmd] /usr/bin/git status --porcelain
19:38:01 [Cmd] /usr/bin/git status --porcelain
19:38:01 [Cmd] /usr/bin/git status --porcelain
19:38:01 [Cmd] /usr/bin/git status --porcelain
19:38:01 [Cmd] /usr/bin/git status --porcelain
19:38:01 [Cmd] /usr/bin/git status --porcelain
19:38:01 [Cmd] /usr/bin/git status --porcelain
19:38:01 [Cmd] /usr/bin/git status --porcelain
...
...
....KEEPS GOING ON AND ON UNTIL SS IS KILLED...

Let me know if you want to see the log where SS is initially indexing the newly added dir. w/ the .git repo within it.

UPDATE: I've written this up on my blog, the 2 log files can be read over there, I didn't want to pollute this thread with a big log file.

hbons commented 12 years ago

I'm figuring out what to do with this right now. To wrap up from the discussion above, there are two options:

Ignore all VCSs, as SparkleShare is a kind of VCS itself;
Treat any kind of file equally.

I'm not quite sure about this yet, but I'm leaning towards the first (SparkleShare already ignores CVS and SVN specific files), at least until we have a better backend.

@slmingol I've tested adding a git repo to SparkleShare, and although it creates a submodule and won't sync the actual files inside, it doesn't freeze or go nuts or anything. I saw from your blogpost that you're still on 0.2, so I suggest trying the latest version.

hbons commented 12 years ago

Right, so git sees something as a submodule if it has the .git/HEAD file, so I'm going to try and rename that whenever it gets added.
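A minimal Python sketch of this idea, assuming that "rename" means moving .git/HEAD aside in every nested repository; the HEAD.backup name matches what Minthos reports further down. SparkleShare itself is written in C#, so this is an illustration rather than its code, and the function names are hypothetical. Without HEAD, git presumably no longer recognises the folder as a repository, so the outer repository stops recording it as a submodule. The paired restore function makes the cost visible: the nested repository is unusable until HEAD is put back.

```python
import os

SUFFIX = ".backup"  # HEAD -> HEAD.backup


def _nested_git_dirs(root: str):
    """Yield the .git directories of repositories nested below `root`."""
    root = os.path.abspath(root)
    for dirpath, dirnames, _files in os.walk(root):
        if ".git" in dirnames:
            dirnames.remove(".git")  # don't walk into repository internals
            if dirpath != root:      # leave the sync folder's own .git alone
                yield os.path.join(dirpath, ".git")


def hide_nested_repos(root: str) -> int:
    """Move HEAD aside in every nested repo; return how many were changed."""
    changed = 0
    for git_dir in _nested_git_dirs(root):
        head = os.path.join(git_dir, "HEAD")
        if os.path.isfile(head):
            os.replace(head, head + SUFFIX)
            changed += 1
    return changed


def restore_nested_repos(root: str) -> int:
    """Undo hide_nested_repos so the nested repos work again."""
    restored = 0
    for git_dir in _nested_git_dirs(root):
        backup = os.path.join(git_dir, "HEAD" + SUFFIX)
        if os.path.isfile(backup):
            os.replace(backup, os.path.join(git_dir, "HEAD"))
            restored += 1
    return restored
```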

dabrahams commented 12 years ago

@hbons: it's not entirely clear what the difference between the two bullets is, but I'm guessing the first one means "if you see a .git folder, pretend it's not there." FWIW, if that's the way you end up going, I won't have much use for SS.

hbons commented 12 years ago

@dabrahams yep, that's what it will be for now.

duggi commented 12 years ago

i just added a bunch of stuff and SS kept crashing. the logs didn't mention anything.

i realized that a lot of folders in the stuff i was trying to sparkleshare were already git repos, and this was crashing SS.

removed them all and SS stopped crashing/quitting

slmingol commented 12 years ago

@duggi, what version of SS are you using?

duggi commented 12 years ago

i am using 0.8.2

-- client:
osx 10.6.8
git 1.7.4.4

-- server
Linux version 2.6.32-5-xen-amd64 (Debian 2.6.32-41)

definitely keep experiencing this, again as early as this morning

alexeymorozov commented 12 years ago

So SparkleShare doesn't sync the .git folder for now, but just ignores it?

oderwat commented 11 years ago

So why is that feature request "closed" now?

So SparkleShare cannot sync anything that contains a ".git" folder, and it is also influenced by the system-wide global git config?

This feels to me like abusing "git" to pretend to have made something cool without paying back to the original developers of git!

What is the problem in changing the behaviour to use ".sgit" and a global ".sgit" config folder? This should be possible and would solve all those problems with ease.

hbons commented 11 years ago

@oderwat i'm not sure why you are accusing me of hijacking people's work. it's widely known that SparkleShare leverages Git to do its job. it's even advertised on the home page.

the current behaviour is to ignore the .git directory and only sync "checked out" files.

i don't understand the solution you're proposing. can you elaborate a bit more?

oderwat commented 11 years ago

Well, it's just how I feel about this request being ignored (closed). You use git but leave "alone" the git users who want to use SparkleShare with their projects, which would be cool for various reasons.

What I meant is that I wonder why you cannot simply use something like "git --git-dir=.ssgit" to make your "own" git commands ignore a pre-existing ".git". This way it should be possible to have SparkleShare sync git repositories. I did not test it; maybe there are problems I don't know of.

In addition, I think it may be possible for your software to use "git -c <name>=<value>" to override config options that would otherwise come from the global config, like "git -c core.excludesfile='~/.ssignores'", or whatever else may interfere. This could be optional.
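This suggestion can be sketched in Python as a helper that builds SparkleShare's git command lines. `--git-dir`, `--work-tree` and `-c <name>=<value>` are real git options and must come before the subcommand; the helper name `ss_git`, the `.ssgit` folder and the excludes file path are hypothetical. Per sferris's experiment above, git still refuses to track a folder named .git, so this would let a user's own .git coexist with SparkleShare's repository, but would not by itself sync the contents of nested .git folders.

```python
import os
from typing import Optional, Sequence


def ss_git(args: Sequence[str], work_tree: str, git_dir_name: str = ".ssgit",
           excludes_file: Optional[str] = None) -> list[str]:
    """Build a git command line that uses a private repository instead of <work_tree>/.git."""
    cmd = [
        "git",
        f"--git-dir={os.path.join(work_tree, git_dir_name)}",  # SparkleShare's own repository
        f"--work-tree={work_tree}",                              # the synced folder itself
    ]
    if excludes_file is not None:
        # Override the user's global core.excludesFile for this invocation only.
        cmd += ["-c", f"core.excludesFile={excludes_file}"]
    return cmd + list(args)
```

git also honours the `GIT_DIR` and `GIT_WORK_TREE` environment variables, which would be an alternative to the flags. Since sferris saw git happily add the renamed git dir itself, the private folder would also need excluding, for example by listing it in `.ssgit/info/exclude`.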

If the above fails, you could also use a special or patched git version... dunno if it is cool to patch the binaries, but you could have a special version in the path for your software.

There may be other ways... I just wondered why this ticket was closed without a statement about "why" it was closed. Maybe you'd say: I don't care about this problem. That is what I guessed, and in turn it made me angry about you using "git" but not making it work with "git" :)

You say SS ignores ".git" and just cares about the checked out files. But when I clone a project there is a ".git" folder in the root of the project folder. So how can I use "git" there?

ChristianMertes commented 11 years ago

I don't understand exactly why this was closed, but I can confirm that SparkleShare goes nuts when you copy some .git into an SS directory. One core is maxed out while memory fills until none is left; then the memory is released, only to be reallocated again, all while it prints endless streams of "fatal: 'git status --porcelain' failed in submodule xyz", where xyz is the name of the "intruding" git repository. I vote for "reopen".

hbons commented 11 years ago

i'll look at this again, as it seems it still doesn't work for everyone.

hbons commented 11 years ago

@mudd1 which OS and version are you running?

ChristianMertes commented 11 years ago

@hbons I'm running 1.0.0 on Ubuntu Linux 12.04 (that's kernel 3.2.0-34 and Mono 2.10.8.1 if that's important).

Minthos commented 11 years ago

Confirming this is still broken.

I copied all my working copies into ss, and whenever ss syncs it renames all my .git/HEAD files to .git/HEAD.backup, breaking them for my other vcs tools. Renaming them back works until next time ss syncs.

Mac OS X 10.7 SparkleShare version 1.0

hbons commented 11 years ago

@Minthos this is the intended behaviour.

Minthos commented 11 years ago

So it's intended to be broken. I hope you'll excuse me for being disappointed.

hbons commented 11 years ago

@Minthos sure, no problem.

apassi commented 11 years ago

Yuuup... I have the same problem. One solution is to create an encfs directory under SS, which scrambles the filenames, and mount it outside SS. I think there are also implementations for Windows and Android.

You can also use it to push to public repos.

gavin-s-smith commented 11 years ago

I'd just like to add my +1 to this being an issue for me, even if it is currently the intended behaviour.

ZeerDonker commented 11 years ago

gavin-s-smith (and possibly others) you might want to try: http://labs.bittorrent.com/experiments/sync.html (currently there is no versioning.) And use this tool for other non git folders.

lee-elenbaas commented 11 years ago

i am not sure i like this suggestion at all

i see two options, and i dislike both

let git (or other back-end) perform the sync to a hidden folder - and then have SS sync that local folder to another folder visible to the user. This means that the storage cost for SS will increase.
let git (or other back-end) perform sync directly to the work folder - and then change that folder from under their feet. This will mean losing the ability to work with the backend directly inside the folder (and for me this is one of the strongest points of SS - sometimes SS simply can't do what git can - and sometimes it should not)

option 2 can be enhanced using smudge/clean-like mechanisms, but i see no end to the troubles down that road.

hbons commented 11 years ago

@mudd1 i can't reproduce problems adding git repos. your issue seems more like #1170 though, so i'm closing this one.

alazare619 commented 9 years ago

Is this still an issue?

Properly handle .git folders #335
