ipfs / go-ipld-git

ipld handlers for git objects
MIT License

Good first issues / Help wanted & libgit2 #26

sameer closed this issue 5 years ago

sameer commented 5 years ago

Hello, I came across go-ipld-git while working on a university project for putting git repos on IPFS. For now, I simply add the repo as a folder via ipfs. I would like to use go-ipld-git instead and was wondering where I could get started with helping.

Is there a reason go-ipld-git doesn't use the go bindings for libgit2 or some other library for parsing git-related information? Some of the issues like #16 could be solved by parsing the date in the commit. Are you avoiding using the cgo compiler?

Stebalien commented 5 years ago

> Are you avoiding using the cgo compiler?

This, mostly. We do a lot of cross compiling.

sameer commented 5 years ago

> > Are you avoiding using the cgo compiler?
>
> This, mostly. We do a lot of cross compiling.

Ok, that makes sense. Are there any issues that I could help out with? I've worked with Go before, but not with libp2p or IPLD in particular.

Stebalien commented 5 years ago

Caching the CID (#6/#21) is probably a good one. Or just other code cleanups.
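
One way to read "caching the CID" is computing the identifier once per object and reusing it afterwards. The sketch below shows that general pattern in Go with `sync.Once`. It makes several assumptions: the `node` type is hypothetical, a plain SHA-1 hex digest stands in for a real CID, and the `hashes` counter exists only to show that the hash runs once. It is not how go-ipld-git is structured.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"sync"
)

// node is a hypothetical stand-in for a parsed git object.
type node struct {
	raw []byte // serialized object bytes

	once   sync.Once
	digest [sha1.Size]byte
	hashes int // how many times the hash was actually computed
}

// ID returns the hex digest of the raw bytes, hashing at most once.
// A real implementation would return a CID instead of a hex string.
func (n *node) ID() string {
	n.once.Do(func() {
		n.digest = sha1.Sum(n.raw)
		n.hashes++
	})
	return hex.EncodeToString(n.digest[:])
}

func main() {
	n := &node{raw: []byte("blob 5\x00hello")}
	first := n.ID()
	second := n.ID() // served from the cached digest
	fmt.Println(first == second, n.hashes)
}
```

The trade-off is the usual one for caches: the cached value is only valid as long as `raw` is never mutated after the first call.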

@magik6k this is really your domain. Need help with something git related?

magik6k commented 5 years ago

> putting git repos on IPFS

There is https://github.com/ipfs-shipyard/git-remote-ipld, which uses https://github.com/src-d/go-git and operates directly on IPLD objects (using this repo).

There is also https://github.com/larsks/git-remote-ipfs, which is likely similar to what you are doing currently.

> Is there a reason go-ipld-git doesn't use the go bindings for libgit2 or some other library for parsing git-related information?

go-git should be able to provide 'proper' parsing. The main reason it's not used here is that this repo started as a 'quick hack' and was never properly rewritten or cleaned up.

A rewrite may eventually be needed, even if it means breaking a few things. If it happens, it will need to be coordinated with https://github.com/ipld/js-ipld-git, which isn't much better.

Also note that the current ipld-git code doesn't touch anything related to pack-files, which creates huge overhead in some places (for the Linux kernel repo there is about a 40x size difference, IIRC). There is no nice way of integrating pack-files into the ipfs/ipld ecosystem yet; it may be possible with some extensions to IPLD selectors, which are themselves in the planning stage now.

As for good git/ipld related issues to pick: there is what Stebalien mentioned, and for something more challenging there is https://github.com/ipfs-shipyard/git-remote-ipld/issues/12 (a generalization of this idea to smaller objects/parts may help reduce the overhead problem, but can introduce new problems too).

sameer commented 5 years ago

> > putting git repos on IPFS
>
> There is https://github.com/ipfs-shipyard/git-remote-ipld, which uses https://github.com/src-d/go-git and operates directly on IPLD objects (using this repo).

I remember seeing this one -- so it is like adding a new type of remote to git, right?
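
For context on what "a new type of remote" involves: git looks for an executable named `git-remote-<scheme>` on the PATH and talks to it over stdin/stdout using line-based commands. Below is a minimal sketch of such a helper's command loop in Go. It assumes only the `capabilities` and `list` commands from git's remote-helper protocol, reports no refs, and does not implement `fetch` or `push` even though it advertises them. The `respond` function is a made-up name, and this is not how git-remote-ipld is written.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
)

// respond returns the reply for one command line from git and whether the
// command was understood. Replies end with a blank line, as the protocol
// expects for these commands.
func respond(cmd string) (string, bool) {
	switch cmd {
	case "capabilities":
		// Advertised for illustration only; fetch/push are not handled below.
		return "fetch\npush\n\n", true
	case "list", "list for-push":
		// A real helper would print "<hash> <refname>" lines here.
		return "\n", true
	}
	return "", false
}

func main() {
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		cmd := sc.Text()
		if cmd == "" {
			return // a blank line ends the conversation
		}
		reply, ok := respond(cmd)
		if !ok {
			fmt.Fprintf(os.Stderr, "unsupported command %q\n", cmd)
			os.Exit(1)
		}
		fmt.Print(reply)
	}
}
```

Built as `git-remote-example` (a placeholder name) and placed on the PATH, a helper like this would be invoked by git for URLs of the form `example::<address>`.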

> There is also https://github.com/larsks/git-remote-ipfs, which is likely similar to what you are doing currently.

This looks pretty useful, thanks for sharing!

> > Is there a reason go-ipld-git doesn't use the go bindings for libgit2 or some other library for parsing git-related information?
>
> go-git should be able to provide 'proper' parsing. The main reason it's not used here is that this repo started as a 'quick hack' and was never properly rewritten or cleaned up.
>
> A rewrite may eventually be needed, even if it means breaking a few things. If it happens, it will need to be coordinated with https://github.com/ipld/js-ipld-git, which isn't much better.
>
> Also note that the current ipld-git code doesn't touch anything related to pack-files, which creates huge overhead in some places (for the Linux kernel repo there is about a 40x size difference, IIRC). There is no nice way of integrating pack-files into the ipfs/ipld ecosystem yet; it may be possible with some extensions to IPLD selectors, which are themselves in the planning stage now.

By 40x size difference do you mean that keeping the pack file in ipld has that overhead? Could they just be unpacked into the individual objects?

> As for good git/ipld related issues to pick - there is what Stebalien mentioned, for something more challenging - ipfs-shipyard/git-remote-ipld#12 (a generalization of this idea to smaller objects/parts may help reduce the overhead problem, but can introduce new problems too)

I can look into the ones Stebalien mentioned first to get started. Thanks for the guidance.

magik6k commented 5 years ago

> By 40x size difference do you mean that keeping the pack file in ipld has that overhead? Could they just be unpacked into the individual objects?

Nope, I mean the overhead of individual objects vs. pack files. git-remote-ipld deals with individual objects, as this is the only way to make this work without complex IPLD selectors and potentially other complex extensions to IPLD which we don't currently have.

sameer commented 5 years ago

> > By 40x size difference do you mean that keeping the pack file in ipld has that overhead? Could they just be unpacked into the individual objects?
>
> Nope, I mean the overhead of individual objects vs. pack files. git-remote-ipld deals with individual objects, as this is the only way to make this work without complex IPLD selectors and potentially other complex extensions to IPLD which we don't currently have.

So not being able to store the pack files themselves leads to the overhead?

magik6k commented 5 years ago

> So not being able to store the pack files themselves leads to the overhead?

Yep, because pack-files can store diffs between objects. There is quite a good doc on how they do that in https://github.com/git/git/blob/master/Documentation/technical/pack-heuristics.txt
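
To make the overhead concrete, here is a toy sketch in Go comparing the cost of storing two versions of a file as whole objects with storing one base plus a diff. It is not git's actual delta encoding (pack files use copy/insert instructions against a base object, plus compression); the `delta` type and `makeDelta` function are made up for illustration and only handle a single changed region.

```go
package main

import (
	"bytes"
	"fmt"
)

// delta is a toy encoding: keep prefixLen bytes from the start of the base,
// insert middle, then keep suffixLen bytes from the end of the base.
type delta struct {
	prefixLen, suffixLen int
	middle               []byte
}

// makeDelta trims the common prefix and suffix and stores only the rest.
func makeDelta(base, target []byte) delta {
	p := 0
	for p < len(base) && p < len(target) && base[p] == target[p] {
		p++
	}
	s := 0
	for s < len(base)-p && s < len(target)-p &&
		base[len(base)-1-s] == target[len(target)-1-s] {
		s++
	}
	return delta{p, s, append([]byte(nil), target[p:len(target)-s]...)}
}

// apply rebuilds the target from the base and the delta.
func (d delta) apply(base []byte) []byte {
	out := append([]byte(nil), base[:d.prefixLen]...)
	out = append(out, d.middle...)
	return append(out, base[len(base)-d.suffixLen:]...)
}

func main() {
	base := bytes.Repeat([]byte("unchanged line\n"), 1000)
	target := append(append([]byte(nil), base...), []byte("one new line\n")...)

	d := makeDelta(base, target)
	fmt.Println("two whole objects:", len(base)+len(target), "bytes")
	fmt.Println("base plus delta:  ", len(base)+len(d.middle), "bytes")
	fmt.Println("round trip ok:    ", bytes.Equal(d.apply(base), target))
}
```

With many revisions of the same file, the whole-object cost grows by roughly the full file size per revision, while the delta cost grows by roughly the size of each change.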