Closed sameer closed 5 years ago
Are you avoiding using the cgo compiler?
This, mostly. We do a lot of cross compiling.
Ok, that makes sense. Are there any issues that I could help out with? I've worked with go but not libp2p or ipld in particular before.
Caching the CID (#6/#21) is probably a good place to start. Or just other code cleanups.
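For anyone picking this up, here is a rough sketch of what "caching the CID" could look like. It is not the code in this repo: cachedNode and its fields are hypothetical names, and a hex-encoded SHA-1 of the raw object bytes stands in for a real CID so the example only needs the standard library. The idea is simply to compute the identifier once and reuse it on later calls.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"sync"
)

// cachedNode is a hypothetical stand-in for a git object node.
// It is not a type from go-ipld-git.
type cachedNode struct {
	raw   []byte    // serialized git object, header included
	once  sync.Once // guards the one-time hash computation
	id    string    // cached identifier (hex SHA-1 here, a CID in practice)
	calls int       // how many times the hash was actually computed
}

// ID hashes the raw bytes on the first call and returns the cached
// value on every later call.
func (n *cachedNode) ID() string {
	n.once.Do(func() {
		sum := sha1.Sum(n.raw)
		n.id = hex.EncodeToString(sum[:])
		n.calls++
	})
	return n.id
}

func main() {
	// "blob 0\x00" is the serialized form of an empty git blob.
	n := &cachedNode{raw: []byte("blob 0\x00")}
	fmt.Println(n.ID())
	fmt.Println(n.ID())
	fmt.Println("hash computations:", n.calls)
}
```

sync.Once keeps this safe if several goroutines ask for the identifier at the same time; a plain nil check would be enough for single-threaded use.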
@magik6k this is really your domain. Need help with something git related?
putting git repos on IPFS
There is https://github.com/ipfs-shipyard/git-remote-ipld, which uses https://github.com/src-d/go-git and operates directly on IPLD objects (using this repo).
There is also https://github.com/larsks/git-remote-ipfs which is likely similar to what you are doing currently
Is there a reason go-ipld-git doesn't use the go bindings for libgit2 or some other library for parsing git-related information?
go-git should be able to provide 'proper' parsing. The main reason it's not used here is that this repo started as a 'quick hack' and was never properly rewritten or cleaned up.
A rewrite may eventually be needed, even if it means breaking a few things. If it happens, it will need to be coordinated with https://github.com/ipld/js-ipld-git, which isn't much better.
Also note that the current ipld-git code doesn't touch anything related to pack-files, which creates huge overheads in some places (for the Linux kernel repo there is about a 40x size difference, IIRC). There is no nice way of integrating pack-files into the ipfs/ipld ecosystem; it may be possible with some extensions to IPLD selectors, which are themselves in the planning stage now.
As for good git/ipld related issues to pick - there is what Stebalien mentioned, for something more challenging - https://github.com/ipfs-shipyard/git-remote-ipld/issues/12 (a generalization of this idea to smaller objects/parts may help reduce the overhead problem, but can introduce new problems too)
putting git repos on IPFS
There is https://github.com/ipfs-shipyard/git-remote-ipld, which uses https://github.com/src-d/go-git and operates directly on IPLD objects (using this repo).
I remember seeing this one -- so it is like adding a new type of remote to git, right?
There is also https://github.com/larsks/git-remote-ipfs which is likely similar to what you are doing currently
This looks pretty useful, thanks for sharing!
Is there a reason go-ipld-git doesn't use the go bindings for libgit2 or some other library for parsing git-related information?
go-git should be able to provide 'proper' parsing. The main reason it's not used here is that this repo started as a 'quick hack' and was never properly rewritten or cleaned up.
A rewrite may eventually be needed, even if it means breaking a few things. If it happens, it will need to be coordinated with https://github.com/ipld/js-ipld-git, which isn't much better.
Also note that the current ipld-git code doesn't touch anything related to pack-files, which creates huge overheads in some places (for the Linux kernel repo there is about a 40x size difference, IIRC). There is no nice way of integrating pack-files into the ipfs/ipld ecosystem; it may be possible with some extensions to IPLD selectors, which are themselves in the planning stage now.
By 40x size difference do you mean that keeping the pack file in ipld has that overhead? Could they just be unpacked into the individual objects?
As for good git/ipld related issues to pick - there is what Stebalien mentioned, for something more challenging - ipfs-shipyard/git-remote-ipld#12 (a generalization of this idea to smaller objects/parts may help reduce the overhead problem, but can introduce new problems too)
I can look into the ones Stebalien mentioned first to get started. Thanks for the guidance.
By 40x size difference do you mean that keeping the pack file in ipld has that overhead? Could they just be unpacked into the individual objects?
Nope, I mean the overhead of individual objects vs pack files. git-remote-ipld deals with individual objects, as this is the only way to make this work without complex ipld selectors and potentially other complex extensions to ipld which we don't currently have.
So not being able to store the pack files themselves leads to the overhead?
Yep, because pack-files can store diffs between objects. There is quite a good doc on how they do that in https://github.com/git/git/blob/master/Documentation/technical/pack-heuristics.txt
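To make the overhead concrete, here is a toy comparison in Go. It is only an illustration under stated assumptions: naiveDelta is a made-up helper that keeps the bytes outside a shared prefix and suffix, which is not git's actual delta encoding, and the 16-byte allowance for delta bookkeeping is an arbitrary guess. It stores two nearly identical 4 KiB objects either as two separately zlib-compressed objects (roughly what happens with individual objects) or as one compressed base plus a small delta (roughly what a pack-file can do).

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"math/rand"
)

// zlibSize returns the size of data after zlib compression.
func zlibSize(data []byte) int {
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	w.Write(data) // writes to a bytes.Buffer do not fail
	w.Close()
	return buf.Len()
}

// naiveDelta returns the part of next that lies outside the prefix and
// suffix it shares with base. Hypothetical helper, not git's delta format.
func naiveDelta(base, next []byte) []byte {
	p := 0
	for p < len(base) && p < len(next) && base[p] == next[p] {
		p++
	}
	s := 0
	for s < len(base)-p && s < len(next)-p &&
		base[len(base)-1-s] == next[len(next)-1-s] {
		s++
	}
	return next[p : len(next)-s]
}

// sizes compares storing two versions separately with storing one
// base object plus a delta.
func sizes() (separate, withDelta int) {
	rng := rand.New(rand.NewSource(1))
	v1 := make([]byte, 4096)
	for i := range v1 {
		v1[i] = byte(rng.Intn(256)) // incompressible filler content
	}
	v2 := append([]byte(nil), v1...)
	v2[2048] ^= 0xff // second version differs in a single byte

	separate = zlibSize(v1) + zlibSize(v2)
	// 16 bytes: assumed allowance for recording prefix/suffix lengths.
	withDelta = zlibSize(v1) + zlibSize(naiveDelta(v1, v2)) + 16
	return separate, withDelta
}

func main() {
	separate, withDelta := sizes()
	fmt.Println("two separate objects:", separate, "bytes")
	fmt.Println("base plus delta:     ", withDelta, "bytes")
}
```

With many revisions of the same file the gap keeps growing, since every extra version costs a full object in the first layout and only a small delta in the second.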
Hello, I came across go-ipld-git while working on a university project for putting git repos on IPFS. For now, I simply add the repo as a folder via ipfs. I would like to use go-ipld-git instead and was wondering where I could get started with helping.
Is there a reason go-ipld-git doesn't use the go bindings for libgit2 or some other library for parsing git-related information? Some of the issues like #16 could be solved by parsing the date in the commit. Are you avoiding using the cgo compiler?
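As a rough illustration of what I mean by parsing the date without cgo, here is a standard-library-only sketch. It assumes the usual author/committer value layout of "Name <email> unix-seconds +hhmm"; Signature and parseSignature are names I made up for this example and are not part of go-ipld-git.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// Signature is a hypothetical type for this sketch.
type Signature struct {
	Name  string
	Email string
	When  time.Time
}

// parseSignature parses the value of a git "author" or "committer"
// header, e.g. "Jane Doe <jane@example.com> 1500000000 +0200".
func parseSignature(s string) (Signature, error) {
	open := strings.LastIndex(s, "<")
	end := strings.LastIndex(s, ">")
	if open < 0 || end < open {
		return Signature{}, fmt.Errorf("malformed signature: %q", s)
	}
	sig := Signature{
		Name:  strings.TrimSpace(s[:open]),
		Email: s[open+1 : end],
	}

	// After the email come the Unix timestamp and the timezone offset.
	fields := strings.Fields(s[end+1:])
	if len(fields) != 2 {
		return Signature{}, fmt.Errorf("expected timestamp and zone in %q", s)
	}
	secs, err := strconv.ParseInt(fields[0], 10, 64)
	if err != nil {
		return Signature{}, fmt.Errorf("bad timestamp: %w", err)
	}

	zone := fields[1]
	if len(zone) != 5 || (zone[0] != '+' && zone[0] != '-') {
		return Signature{}, fmt.Errorf("bad timezone: %q", zone)
	}
	hh, errH := strconv.Atoi(zone[1:3])
	mm, errM := strconv.Atoi(zone[3:5])
	if errH != nil || errM != nil {
		return Signature{}, fmt.Errorf("bad timezone: %q", zone)
	}
	offset := (hh*60 + mm) * 60
	if zone[0] == '-' {
		offset = -offset
	}

	// Keep the author's original offset instead of converting to local time.
	sig.When = time.Unix(secs, 0).In(time.FixedZone(zone, offset))
	return sig, nil
}

func main() {
	sig, err := parseSignature("Jane Doe <jane@example.com> 1500000000 +0200")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(sig.Name, sig.Email, sig.When.Format(time.RFC3339))
}
```

Real commits can contain odd names and malformed zones, so a proper parser would need to be more forgiving than this.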