ipld / js-ipld-git

Question about IPLD Git URLs #23

Open billiegoose opened 6 years ago

billiegoose commented 6 years ago

Hi! Lead author of isomorphic-git here. In preparation for Chrome whitelisting the ipfs and git schemes in registerProtocolHandler (Firefox already has, I believe), I'm investigating how git URLs handle paths to trees and blobs. AFAICT git uses URLs to identify repos but doesn't specify a standard URL format for identifying branches and blobs inside repos. npm uses a # sign after the repo URL to identify a branch or SHA-1. How are you handling this in IPFS? What does a fully qualified git IPLD reference to a blob look like? If git doesn't have a native way to do it, I'd rather be compatible with IPLD than make up a convention.
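For illustration, the npm-style convention mentioned above (repo URL, then `#`, then a branch or SHA-1) can be split with plain string handling. This is only a sketch: `split_repo_ref` is a hypothetical helper name and the example URL is made up. Note that the part after `#` can hold a branch name containing `/` without interfering with the repo URL.

```python
from typing import Optional, Tuple


def split_repo_ref(url: str) -> Tuple[str, Optional[str]]:
    """Split 'repo-url#committish' into (repo_url, committish or None).

    Hypothetical helper: only the first '#' is treated as the separator.
    """
    repo, sep, ref = url.partition("#")
    return repo, (ref if sep and ref else None)


# Made-up URL, used purely as an example input.
print(split_repo_ref("https://example.com/repo.git#doc/experimental"))
# ('https://example.com/repo.git', 'doc/experimental')
```

This only covers the repo-plus-ref part; it says nothing about how to address a file inside the ref, which is the open question here.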

Secondly, the parsers and serializers look similar to the ones I wrote for isomorphic-git; we should compare notes sometime! Maybe we can consolidate the parsers and serializers into standalone modules that others can use?

magik6k commented 6 years ago

This repo implements git at the object level (everything that lives in .git/objects), so it doesn't take care of refs (branches, tags, etc.) or any other stuff like that.
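To make "object level" concrete: in a standard SHA-1 git repository, every object in .git/objects is named by the SHA-1 of a short header (`<type> <size>` followed by a NUL byte) plus the object's content. The sketch below shows that general git rule; it is not this library's API, and `git_object_id` is a name invented for the example.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object ID git assigns to an object of this type and content."""
    # Header is "<type> <byte length>" followed by a single NUL byte.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known ID in every SHA-1 git repository.
print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Refs such as branches and tags are just names pointing at these IDs, which is why they sit outside the object store.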

https://github.com/ipfs-shipyard/git-remote-ipld takes care of branches by wrapping repositories in a unixfs tree with a structure like .git/refs, but pointing directly at IPLD objects (there is also an optional objects directory which maps large git objects in the repo to unixfs file hashes; we need this because IPFS has an object size limit of about 2 MB).
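As a rough mental model of that layout, resolving a ref means following named links from the root directory node down to a leaf that points at a git commit. The sketch below uses a toy in-memory store with invented placeholder CIDs; the `links` / `Name` / `Cid` shape mirrors the `ipfs dag get` output in this comment, and `resolve` is a hypothetical helper, not part of git-remote-ipld.

```python
from typing import Dict, List

Node = Dict[str, List[dict]]

# Toy block store keyed by placeholder CIDs (all names here are invented).
STORE: Dict[str, Node] = {
    "cid-root": {"links": [{"Name": "refs", "Cid": {"/": "cid-refs"}}]},
    "cid-refs": {"links": [{"Name": "heads", "Cid": {"/": "cid-heads"}}]},
    "cid-heads": {"links": [
        {"Name": "master", "Cid": {"/": "cid-commit-master"}},
        {"Name": "doc", "Cid": {"/": "cid-doc"}},
    ]},
    "cid-doc": {"links": [{"Name": "experimental", "Cid": {"/": "cid-commit-doc"}}]},
}


def resolve(store: Dict[str, Node], root: str, path: str) -> str:
    """Follow named links segment by segment and return the CID that is reached."""
    cid = root
    for segment in path.strip("/").split("/"):
        links = store[cid]["links"]
        targets = [link["Cid"]["/"] for link in links if link["Name"] == segment]
        if not targets:
            raise KeyError(f"no link named {segment!r} under {cid}")
        cid = targets[0]
    return cid


print(resolve(STORE, "cid-root", "refs/heads/master"))            # cid-commit-master
print(resolve(STORE, "cid-root", "refs/heads/doc/experimental"))  # cid-commit-doc
```

A branch whose name contains a slash, like doc/experimental, simply becomes a nested directory, the same way it does under .git/refs.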

If you want, there is a snapshot of go-ipfs pushed at QmU1HJJDFSM8JJq4r31wSLfj51oysQCswz7aL78UWZHuMC. You can play with it using ipfs dag get (from go-ipfs; not sure if js-ipfs has implemented that yet).

Some examples:

$ ipfs dag get QmU1HJJDFSM8JJq4r31wSLfj51oysQCswz7aL78UWZHuMC | jq .
{
  "data": "CAE=",
  "links": [
    {
      "Cid": {
        "/": "QmWwpyB5fXUBs8CucNK3THJs3NfNE1PfBwnT3m5jJuTRNb"
      },
      "Name": "HEAD",
      "Size": 25
    },
    {
      "Cid": {
        "/": "QmSfnuodJzaTrKN66ciaL8yYphhK53c6MLHvTjxrDbJDrG"
      },
      "Name": "objects",
      "Size": 37261724
    },
    {
      "Cid": {
        "/": "QmfBUWe2T6CVaWHM876FBfGg82kn15Uwu856zWA4Y3LWW5"
      },
      "Name": "refs",
      "Size": 14789
    }
  ]
}

$ ipfs dag get QmU1HJJDFSM8JJq4r31wSLfj51oysQCswz7aL78UWZHuMC/refs/heads/doc | jq .
{
  "data": "CAE=",
  "links": [
    {
      "Cid": {
        "/": "z8mWaFUntoi4vHmGDZ8qaMj3WRfSv4Zs5"
      },
      "Name": "experimental",
      "Size": 42
    }
  ]
}

$ ipfs dag get QmU1HJJDFSM8JJq4r31wSLfj51oysQCswz7aL78UWZHuMC/refs/heads/master | jq .
{
  "author": {
    "date": "1524801121 +0900",
    "email": "why@ipfs.io",
    "name": "Whyrusleeping"
  },
  "committer": {
    "date": "1524801121 +0900",
    "email": "noreply@github.com",
    "name": "GitHub"
  },
  "message": "Merge pull request #4983 from ipfs/gx/release-0.4.15-rc1\n\ngx: release 0.4.15-rc1",
  "other": [
    " "
  ],
  "parents": [
    {
      "/": "z8mWaFffUH9vGYgdaHt7MFockg3DhhGrR"
    },
    {
      "/": "z8mWaHtcPgdgRfXkNEeRKkg6NUYPn9tFq"
    }
  ],
  "signature": {
    "Text": " \n wsBcBAABCAAQBQJa4p5hCRBK7hj4Ov3rIwAAdHIIAJjt7J8xY+V1gKzSz+m5eBrk\n YpE++MDjCnpkD+zsEB8CZwGqopyweunyeHVPeqjQMu3Dbeo9HxZgkACizRBP9ibw\n hlgjNVu7i+mxDsH34Jk3nnO3g3bGbx+5he7iio3P/sRtl7cVVY/DIfSF/Z4S9JbE\n 1hCqFj8cm8BQn11H2AuAKDM43i4qJchYbbuLdGh7Jyn40CJnuepeFJ9XUJd/03f7\n 6DAyxAP2G+0ocA25D2/UWx8nFl52qc1JQnZRICeYaq6n1kDEoxZLPcC7+riGsJnv\n PmDyqg3n0ygC1Ac8K71NBw25DgoH78PvWORwnynZQjeCUPenhYIhWFP14s5bmt0=\n =OBeH\n"
  },
  "tree": {
    "/": "z8mWaFVyw6BavD34CqurjRyM7Y4mBs6cs"
  }
}
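
The z8mWa... links in the output above are CIDs pointing at git objects. As far as I understand the format, each one is a base58btc string (the leading `z`) wrapping a CIDv1 with the git-raw codec (0x78) and a SHA-1 multihash (0x11, 20 bytes), so the original git object ID can be recovered from it. The sketch below assumes exactly that layout and rejects anything else; the function names are invented for the example.

```python
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc_decode(text: str) -> bytes:
    """Decode a base58btc string (Bitcoin alphabet) into bytes."""
    number = 0
    for char in text:
        number = number * 58 + B58_ALPHABET.index(char)
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    # Each leading '1' encodes one leading zero byte.
    zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * zeros + body


def git_sha1_from_cid(cid: str) -> str:
    """Extract the hex git object ID from a base58btc CIDv1 (git-raw, sha1).

    Assumes every header field fits in one byte, which holds for these values.
    """
    if not cid.startswith("z"):
        raise ValueError("expected a base58btc ('z') multibase prefix")
    raw = base58btc_decode(cid[1:])
    version, codec, hash_fn, digest_len = raw[0], raw[1], raw[2], raw[3]
    if (version, codec, hash_fn, digest_len) != (1, 0x78, 0x11, 20):
        raise ValueError("not a CIDv1 / git-raw / sha1 identifier")
    digest = raw[4:]
    if len(digest) != digest_len:
        raise ValueError("digest length mismatch")
    return digest.hex()


# CID of refs/heads/doc/experimental, taken from the output above.
print(git_sha1_from_cid("z8mWaFUntoi4vHmGDZ8qaMj3WRfSv4Zs5"))
```

The printed value should be the same 40-character hash that plain git would show for that commit.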

billiegoose commented 6 years ago

Ah okay. Let me see if I understand? IPLD isn't a URL format, it's a JSON format. That makes sense, because it's describing linked data. To put it in graph terminology, IPLD describes "nodes" and the "edges" are links. And similar to git itself, IPFS treats locating objects locally as a separate concern from locating repositories globally. The former is done with this library and the latter is handled by git-remote-ipld.

Kind of like how URLs are split into the "domain name" portion, which identifies the server, and the "path" portion, which identifies a file on that server? The thing that annoys me is that branch names can contain '/', so joining a branch name and a file path with / would be ambiguous.
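
To illustrate the ambiguity: a combined string such as doc/experimental/README.md can be cut at any slash, so a parser that sees only the string cannot tell the branch from the path. One possible way out, sketched below purely as an assumption and not as an existing convention, is to check the candidates against the repository's known branch names. Both function names are invented for the example.

```python
from typing import Iterable, List, Optional, Tuple


def candidate_splits(combined: str) -> List[Tuple[str, str]]:
    """List every (branch, path) reading of a slash-joined string."""
    parts = combined.split("/")
    return [("/".join(parts[:i]), "/".join(parts[i:])) for i in range(1, len(parts))]


def split_by_known_refs(combined: str, branches: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Pick the reading whose branch exists, preferring the longest branch name."""
    known = set(branches)
    matches = [(b, p) for b, p in candidate_splits(combined) if b in known]
    return max(matches, key=lambda m: len(m[0])) if matches else None


print(candidate_splits("doc/experimental/README.md"))
# [('doc', 'experimental/README.md'), ('doc/experimental', 'README.md')]
print(split_by_known_refs("doc/experimental/README.md", {"master", "doc/experimental"}))
# ('doc/experimental', 'README.md')
```

Git normally refuses to keep both doc and doc/experimental as branches in the same repository, so with the ref list in hand at most one reading should match. The cost is that the string can no longer be parsed without first fetching the refs.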

My goal is to come up with a way to specify an individual file for use in (say) an <img> tag. So you could create webpages where the resources are all fetched from a distributed git repository rather than using HTTP. But maybe that's not a logical goal? I think I'll go back to the drawing board and think some more.