ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

implement access control lists for gateway #1551

Closed. whyrusleeping closed this issue 7 years ago.

whyrusleeping commented 8 years ago

License: MIT Signed-off-by: Jeromy jeromyj@gmail.com

jbenet commented 8 years ago

this PR changes the behavior.

previously, allowing "$foo" on the webui would allow "$foo" and all its children, BUT the children could not be viewed individually, only through the parent:

/$foo_hash/child  ok
$child_hash  forbidden

we may want the old behavior. or we may want the new one. I'm not sure. anyone else have opinions on this?

jbenet commented 8 years ago

@mappum @diasdavid o/ thoughts re

we may want the old behavior. or we may want the new one. I'm not sure. anyone else have opinions on this?

whyrusleeping commented 8 years ago

the problem with the old behaviour is that if you block X, all I have to do is make Y that has X as a child, and I can access it again. It didn't really provide any sort of real 'blocking'.

jbenet commented 8 years ago

the problem with the old behaviour is that if you block X, all I have to do is make Y that has X as a child, and I can access it again. It didn't really provide any sort of real 'blocking'.

no-- on the webui it was indeed blocked, because nothing except those things rooted at the allowed roots would work.

the reason I bring it up, too, is that there are implications for how HTTP works. for example, I may want to allow root X, which has Y at /X/Y, and Y in the context of X is fine. But I may not want to allow /Z/Y, or to allow Y to be accessed directly as /Y.

I'm leaning towards "allowing/denying roots" as the right idea. you have to look through and make sure everything is ok to be used that way, but there are no questions otherwise.
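
A minimal sketch of that root-checking idea for the allowlist case, assuming gateway requests arrive as /ipfs/<root>/<path> and the allowlist is just a set of allowed root hashes (the names below are illustrative, not the actual gateway code):

package main

import "strings"

// allowedRoots is an illustrative allowlist of root hashes.
var allowedRoots = map[string]bool{
  "QmExampleWebuiRoot": true, // placeholder, not a real hash
}

// allowedByRoot returns true only if the first path segment after /ipfs/
// is an allowed root. Children stay reachable through an allowed root
// (/ipfs/<root>/child) but are not reachable directly by their own hash.
func allowedByRoot(urlPath string) bool {
  p := strings.TrimPrefix(urlPath, "/ipfs/")
  root := strings.SplitN(p, "/", 2)[0]
  return allowedRoots[root]
}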

whyrusleeping commented 8 years ago

no-- on the webui it was indeed blocked, because nothing except those things rooted at the allowed roots would work.

are you talking about whitelists or blacklists? there are different implications depending on that context. In the case of a whitelist, what you say is true. In the case of a blocklist, what I'm saying is true.

jbenet commented 8 years ago

I mean the webui "allowlist".

whyrusleeping commented 8 years ago

okay, the more I think about this, the more I think we're going to have to treat each list differently. the "allowlist" is going to have to use the root checking like you say, but the "do-not-allowlist" is going to need to use the technique I'm using.

jbenet commented 8 years ago

not suggesting we change it to this; this is only for discussion:

right now, the allowlist and denylist only stack once. other filter systems allow users to compose a stack of filters, allowing expressions like:

# deny everything except things under good/ but not good/nowait,bad
DENY *
ALLOW good/*
DENY good/nowait,bad/
ALLOW *
DENY bad/*
ALLOW bad/butonlyalittlebad/*

and so on.

(this makes sense for IP CIDR filters too)
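
For discussion, here is a rough Go sketch of how such a composable stack could be evaluated, assuming last-matching-rule-wins semantics and treating patterns as simple prefix globs (nothing here exists in the codebase; all names are made up):

package main

import (
  "fmt"
  "strings"
)

type action int

const (
  deny action = iota
  allow
)

type rule struct {
  act     action
  pattern string // "*" matches everything; anything else is a prefix glob
}

func (r rule) match(path string) bool {
  if r.pattern == "*" {
    return true
  }
  return strings.HasPrefix(path, strings.TrimSuffix(r.pattern, "*"))
}

// allowed walks the whole stack; the last rule that matches wins.
func allowed(rules []rule, path string) bool {
  ok := true // default if nothing matches
  for _, r := range rules {
    if r.match(path) {
      ok = r.act == allow
    }
  }
  return ok
}

func main() {
  // the first three rules from the example above
  rules := []rule{
    {deny, "*"},
    {allow, "good/*"},
    {deny, "good/nowait,bad/"},
  }
  fmt.Println(allowed(rules, "good/x"))            // true
  fmt.Println(allowed(rules, "good/nowait,bad/y")) // false
  fmt.Println(allowed(rules, "other"))             // false
}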

whyrusleeping commented 8 years ago

@jbenet question about caching the lists. how should they be stored? as one list for allow and one list for deny on disk? and how often/when should they be updated?

jbenet commented 8 years ago

@whyrusleeping re "how they should be stored":

They should be IPFS objects. We can import them from JSON.

whyrusleeping commented 8 years ago

@jbenet we already talked about this... that's the reason we aren't fetching them from ipfs in the first place. the gateway is set up during node construction, fetching objects before the node is completed is a hassle.

jbenet commented 8 years ago

the gateway is set up during node construction, fetching objects before the node is completed is a hassle

We can fetch IPFS objects over HTTP just fine. (To clarify, I'm suggesting hosting the lists as IPFS objects, either in JSON or CBOR (or both: "list.json" and "list.cbor"), pulling them only over HTTP, and then adding them to the local repo.)

ghost commented 8 years ago

Can we make sure that we don't have to comply with a DMCA that shuts down the denylist?

ghost commented 8 years ago

Can we make sure that we don't have to comply with a DMCA that shuts down the denylist?

What I'm getting at is, it'd make the denylist subject to the same absurd theater that it's supposed to make transparent.

whyrusleeping commented 8 years ago

@jbenet but if we fetch ipfs objects over HTTP, what exactly are we caching?

jbenet commented 8 years ago

@jbenet but if we fetch ipfs objects over HTTP, what exactly are we caching?

the object. cache it locally because you're not guaranteed to be able to reach the host every time you launch IPFS. Basically:

func start() {
  tryDownloadingNewDenylist()
  loadDenylist()
}

func tryDownloadingNewDenylist() {
  // do http request.
  // if successful, store as an ipfs object locally.
}

func loadDenylist() {
  // load local denylist ipfs object
}
jbenet commented 8 years ago

@lgierth I think we'll be fine with distributing a list of hashes. we generate the denylist, so it's technically our copyright?

whyrusleeping commented 8 years ago

@lgierth what's in the Uri field of the blocklists?

ghost commented 8 years ago

@lgierth what's in the Uri field of the blocklists?

@whyrusleeping The response is an array of notice objects, each consisting of a URI for more information on that notice, and a list of keys to block:

[ { "uri":"http://dmca.ipfs.io/2015-08-03-foobar",
    "keys":["Qmfoobar","Qmsomething","Qmbarbaz"] } ]

As per our chat yesterday, we can just flatten that, because the concept of a notice just isn't needed on the gateway's end.

["Qmfoobar","Qmsomething","Qmbarbaz"]

Would it be a good idea to respond with a URI template for the for-more-information-click-here link?
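
Purely for illustration, the flattening described above could look like this in Go, assuming the response shape shown earlier (the struct names are made up):

package main

import (
  "encoding/json"
  "fmt"
)

// notice mirrors the response shape shown above: a URI plus the keys it covers.
type notice struct {
  URI  string   `json:"uri"`
  Keys []string `json:"keys"`
}

// flatten collapses the notices into the plain list of keys the gateway needs.
func flatten(data []byte) ([]string, error) {
  var notices []notice
  if err := json.Unmarshal(data, &notices); err != nil {
    return nil, err
  }
  var keys []string
  for _, n := range notices {
    keys = append(keys, n.Keys...)
  }
  return keys, nil
}

func main() {
  raw := []byte(`[{"uri":"http://dmca.ipfs.io/2015-08-03-foobar",
                   "keys":["Qmfoobar","Qmsomething","Qmbarbaz"]}]`)
  keys, _ := flatten(raw)
  fmt.Println(keys) // [Qmfoobar Qmsomething Qmbarbaz]
}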

ghost commented 8 years ago

the for-more-information-click-here link?

This link would be something like http://denylist.ipfs.io/key/Qmfoobar, and the denylist daemon would figure out which notice to show.

whyrusleeping commented 8 years ago

@lgierth that SGTM

jbenet commented 8 years ago

two things from yesterday's discussion:

ipfs.io/refs/lists/denylists 

# and put deny lists there like 
ipfs.io/refs/lists/denylists/dmca

# that way we can have other lists like 

ipfs.io/refs/lists/archives/archive.org/HEAD
ipfs.io/refs/lists/archives/arxiv.org/HEAD
ghost commented 8 years ago

I'm a bit confused. Are you saying we should just leave the denylist daemon be and serve everything from IPFS? I can see that work out, but the denylist is a bit more than just a list of keys. It's a list of lists of keys, each list with a reason and notice text. Or do you mean it should only be presented as a flat list of keys to the gateway? That would be different from trying to fit both denylist and archives into the same list abstraction for storage.

i think we'll be fine with distributing a list of hashes. we generate the denylist, so it's technically our copyright?

Just saying there's all kinds of crazy notices, we should anticipate that some will be nonsense ("take down localhost!!1"), and make sure those don't do any harm.

jbenet commented 8 years ago

I'm a bit confused. Are you saying we should just leave the denylist daemon be and serve everything from IPFS? I can see that work out, but the denylist is a bit more than just a list of keys. It's a list of lists of keys, each list with a reason and notice text. Or do you mean it should only be presented as a flat list of keys to the gateway? That would be different from trying to fit both denylist and archives into the same list abstraction for storage.

Sorry for being unclear. I mean that we should host the lists at those HTTP endpoints:

https://ipfs.io/refs/lists/denylists 

# and put deny lists there like 
https://ipfs.io/refs/lists/denylists/dmca.cbor  # or ?codec=json instead of extension
https://ipfs.io/refs/lists/denylists/dmca.json

# that way we can have other lists like 

https://ipfs.io/refs/lists/archives/archive.org/HEAD.json
https://ipfs.io/refs/lists/archives/archive.org/HEAD.cbor
https://ipfs.io/refs/lists/archives/arxiv.org/HEAD.json
https://ipfs.io/refs/lists/archives/arxiv.org/HEAD.cbor

And these should return IPFS objects, meaning the data is a valid IPFS object in the given encoding, so we can basically do (though inside the daemon, with Go):

newDenyList=$(curl https://ipfs.io/refs/lists/denylists/dmca.json | ipfs object put --enc=json)

them being IPFS objects is useful because then we can also distribute them via ipfs (not just HTTP), and once we make object signing, we can trivially add signatures to the whole thing and validate them using the tools we'll already make.

(and that way, we can serve the lists even over http:// and people won't be able to fake-add DMCA entries by messing with people's traffic/DNS.)
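
As a sketch of how the daemon could do the equivalent of that curl pipeline, here is an illustrative version that fetches the list over HTTP and shells out to `ipfs object put --enc=json` to store it locally (a real implementation would presumably use the object API in-process; this just mirrors the command shown above):

package main

import (
  "fmt"
  "net/http"
  "os/exec"
)

// fetchAndImport downloads a list served as a JSON-encoded IPFS object and
// pipes it into `ipfs object put --enc=json`, returning whatever the command
// prints (the stored object's hash).
func fetchAndImport(url string) (string, error) {
  resp, err := http.Get(url)
  if err != nil {
    return "", err
  }
  defer resp.Body.Close()

  cmd := exec.Command("ipfs", "object", "put", "--enc=json")
  cmd.Stdin = resp.Body
  out, err := cmd.Output()
  if err != nil {
    return "", err
  }
  return string(out), nil
}

func main() {
  hash, err := fetchAndImport("https://ipfs.io/refs/lists/denylists/dmca.json")
  if err != nil {
    fmt.Println("fetch failed:", err)
    return
  }
  fmt.Println("imported:", hash)
}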

Just saying there's all kinds of crazy notices, we should anticipate that some will be nonsense ("take down localhost!!1"), and make sure those don't do any harm.

indeed! (and hahahhaha wow.)

whyrusleeping commented 8 years ago

@jbenet so the flow i think i'm getting is:

checkLocalDagstoreFor(listKey)
if thatFailed {
  queryHTTP(listUrl)
  if thatFailedToo {
    die
  }
  writeListToLocalBlockstore(list)
}

question: where does the listKey come from? do we store it in the config? that would likely require a map or something in the config to store list URLs and cached keys.

whyrusleeping commented 8 years ago

also, when should we check for updates to the lists?

jbenet commented 8 years ago
checkLocalDagstoreFor(listKey)
if thatFailed {
  queryHTTP(listUrl)
  if thatFailedToo {
    die
  }
  writeListToLocalBlockstore(list)
}

this o/ won't work because after getting a list you would never update again. I think we need:

if updateCondition() {
  if list, err := getListViaHTTP(url); err == nil {
    dag.Add(list)
  }
}
list, err := dag.Get(listKey)
if err != nil {
  list = emptyList
}
whyrusleeping commented 8 years ago

@jbenet

also, when should we check for updates to the lists?

jbenet commented 8 years ago

@whyrusleeping

also, when should we check for updates to the lists?

I think we shouldn't go over an hour with a stale one. Another consideration is that long-running daemons should poll over time to remain up to date. This is way easier if we make the lists just return IPFS refs; see below.
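
A tiny sketch of that policy: refresh once at startup and then on an hourly ticker, so a long-running daemon is never more than an hour stale (the refresh callback stands in for the HTTP-fetch-plus-local-dagstore-fallback logic sketched above):

package main

import (
  "fmt"
  "time"
)

// keepFresh runs refresh once at startup and then once an hour, so a
// long-running daemon never serves a list that is more than an hour stale.
func keepFresh(refresh func()) {
  refresh()
  ticker := time.NewTicker(time.Hour)
  for range ticker.C {
    refresh()
  }
}

func main() {
  keepFresh(func() {
    // in the daemon this would be the HTTP-fetch-then-local-dagstore-fallback
    // logic sketched above; here it is just a placeholder.
    fmt.Println("refreshing denylist")
  })
}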

jbenet commented 8 years ago

(We could use the main API endpoint to return a head (i.e. a hash of the head) and then grab the head from the global HTTP gateway itself.) So:

https://ipfs.io/refs/lists/denylists/dmca
https://ipfs.io/refs/lists/archives/archive.org
https://ipfs.io/refs/lists/archives/arxiv.org

could be just hashes, which we can then get with:

https://ipfs.io/ipfs/api/v0/object/get?arg=/ipfs/$head

that way, the json / cbor / whatever encoding question is handled by the API. (yay!)

we want these changes to propagate fast through the network.
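
Illustrative only: a Go sketch of that two-step flow, fetching the head hash from the list endpoint and then the object itself through the API URL proposed above (both endpoints are assumptions from this discussion, not existing services):

package main

import (
  "fmt"
  "io"
  "net/http"
  "strings"
)

// fetchList resolves the current head hash for a list, then fetches the
// object itself through the API URL proposed above.
func fetchList(name string) (string, error) {
  // step 1: the list endpoint returns just the head hash
  resp, err := http.Get("https://ipfs.io/refs/lists/" + name)
  if err != nil {
    return "", err
  }
  defer resp.Body.Close()
  head, err := io.ReadAll(resp.Body)
  if err != nil {
    return "", err
  }

  // step 2: fetch the object for that head, letting the API handle encoding
  obj, err := http.Get("https://ipfs.io/ipfs/api/v0/object/get?arg=/ipfs/" +
    strings.TrimSpace(string(head)))
  if err != nil {
    return "", err
  }
  defer obj.Body.Close()
  body, err := io.ReadAll(obj.Body)
  return string(body), err
}

func main() {
  list, err := fetchList("denylists/dmca")
  if err != nil {
    fmt.Println("error:", err)
    return
  }
  fmt.Println(list)
}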

ghost commented 8 years ago

Should we go with something like this?

> GET https://gateway.ipfs.io/refs/lists/denylists/dmca HTTP/1.1

< HTTP/1.1 302 Found
< Location: /ipfs/$hash
<
< $hash

$hash resolves to a DenyList.

type DenyList struct {
  Objects []Link // all DenyObjects
}

type DenyObject struct {
  Object Link // the object that's supposed to be blocked
  Notice Link // the dmca notice object
}

type DenyNotice struct {
  Value Data // opaque value
}

Get all DenyList.Objects, then populate denylist with all DenyObject.Object link hashes.

> GET https://gateway.ipfs.io/ipfs/$deniedHash

< HTTP/1.1 451 Unavailable For Legal Reasons
<
< <a href="/dmca/$deniedHash">Show DenyNotice object,
<                             and the DenyObjects linking to it</a>
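
For illustration, a hedged sketch of the gateway side of that exchange: a handler that checks the requested root against an in-memory denylist and answers 451 Unavailable For Legal Reasons. This is not the actual gateway code; the denied map is assumed to be populated from DenyList.Objects as described above.

package main

import (
  "fmt"
  "log"
  "net/http"
  "strings"
)

// denied would be populated from DenyList.Objects as described above.
var denied = map[string]bool{
  "QmExampleDeniedHash": true, // placeholder entry
}

func gatewayHandler(w http.ResponseWriter, r *http.Request) {
  root := strings.SplitN(strings.TrimPrefix(r.URL.Path, "/ipfs/"), "/", 2)[0]
  if denied[root] {
    w.WriteHeader(http.StatusUnavailableForLegalReasons) // 451
    fmt.Fprintf(w, `<a href="/dmca/%s">Show DenyNotice object</a>`, root)
    return
  }
  // ...normal gateway resolution would happen here...
  fmt.Fprintln(w, "would serve", root)
}

func main() {
  http.HandleFunc("/ipfs/", gatewayHandler)
  log.Fatal(http.ListenAndServe(":8080", nil))
}
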
jbenet commented 8 years ago

@lgierth that LGreatTM.

One minor thing: for the href in the returned page, what about generating the page and linking to its ipfs object instead (so href="/ipfs/$pageShowingDenyNoticeAndLinks/" instead of href="/dmca/$deniedHash")? That way we would not have to implement special route handling for /dmca/<hash> (another route to worry about).

ghost commented 8 years ago

@jbenet the gateway would have to keep the reverse-links from Object to DenyObject in order to get the notice. The idea above was that the little gateway-dmca-denylist daemon would take care of that, so that we don't have to introduce any logic to the gateway apart from basic denying.

Or is there an API function for getting objects linking to a given hash?

jbenet commented 8 years ago

@lgierth we could probably prebuild all the pages statically every time we add to the list. and maybe we could link to a server we run then. don't want to put tons of DMCA-specific stuff in the ipfs daemon.

ghost commented 8 years ago

gateway-dmca-denylist now builds this kind of list: https://ipfs.io/ipfs/QmRER7erZxU63huYgSBryGhKrfHdkDkVjwQTd8RD4RdSW5

... which is a unixfs structure, which I figured is the simplest to implement and look at (thanks to the dir-listing).

It fans out based on the keys associated with each notice, so that e.g. a notice with 3 keys will result in 3 objects linked in the denylist. Each of these objects links to the rendered notice, and the object to be blocked as identified by the key. The rendered notice is the same for all these.

$ go run denylist.go 
QmRER7erZxU63huYgSBryGhKrfHdkDkVjwQTd8RD4RdSW5
$
$ # this is the denylist object
$ ipfs object get QmRER7erZxU63huYgSBryGhKrfHdkDkVjwQTd8RD4RdSW5
{
  "Links": [
    {
      "Name": "2015-08-03-foobar-QmcuJpsniX8GWiz59RMgTg9UUH23zPBATZEe6qKEAzacCz",
      "Hash": "QmSD1bn3LVLwZpdvYiekv7ei6ipx4k5vxLhRRTmzoQFMND",
      "Size": 621
    },
    {
      "Name": "2015-08-03-foobar-Qmem4kaF6Jw8yPgCTaLAsV3deGjahbeN1oymqg4tzBMZZR",
      "Hash": "QmQawxoaW5j3z6VUKAKd8zVqYKgiJ3TouisxB5jNGFxQqb",
      "Size": 640
    }
  ],
  "Data": "\u0008\u0001"
}
$
$ # the first item of the denylist, linking to the rendered notice and the object to be blocked
$ ipfs object get QmSD1bn3LVLwZpdvYiekv7ei6ipx4k5vxLhRRTmzoQFMND
{
  "Links": [
    {
      "Name": "notice",
      "Hash": "QmanbefGzMvgsyhAo47TW7AQ5YkKtVzoVLnYyZgf6Geok3",
      "Size": 476
    },
    {
      "Name": "object",
      "Hash": "QmcuJpsniX8GWiz59RMgTg9UUH23zPBATZEe6qKEAzacCz",
      "Size": 44
    }
  ],
  "Data": "\u0008\u0001"
}
$
$ # this is the rendered notice
$ ipfs cat QmanbefGzMvgsyhAo47TW7AQ5YkKtVzoVLnYyZgf6Geok3
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8" />
    <title>Unavailable for Legal Reasons</title>
</head>
<body>
    remove my stuff plox

    <h2>Affected Objects</h2>
    <ul>
        <li><a href="/ipfs/QmcuJpsniX8GWiz59RMgTg9UUH23zPBATZEe6qKEAzacCz">/ipfs/QmcuJpsniX8GWiz59RMgTg9UUH23zPBATZEe6qKEAzacCz</a></li>
        <li><a href="/ipfs/Qmem4kaF6Jw8yPgCTaLAsV3deGjahbeN1oymqg4tzBMZZR">/ipfs/Qmem4kaF6Jw8yPgCTaLAsV3deGjahbeN1oymqg4tzBMZZR</a></li>

    </ul>
</body>
</html>
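
As a sketch of how a consumer could walk that structure, the following shells out to `ipfs object get` exactly as in the transcript above and collects the hashes behind each entry's "object" link (the layout is taken from the listing above; this is not part of the actual daemon):

package main

import (
  "encoding/json"
  "fmt"
  "os/exec"
)

// link and object mirror the JSON printed by `ipfs object get` above.
type link struct {
  Name string
  Hash string
  Size int
}

type object struct {
  Links []link
  Data  string
}

// objectGet shells out to `ipfs object get <hash>` and decodes its output.
func objectGet(hash string) (object, error) {
  var obj object
  out, err := exec.Command("ipfs", "object", "get", hash).Output()
  if err != nil {
    return obj, err
  }
  err = json.Unmarshal(out, &obj)
  return obj, err
}

// blockedHashes walks the denylist root: each entry links to a "notice" and
// an "object"; the hash behind the "object" link is what the gateway blocks.
func blockedHashes(root string) ([]string, error) {
  rootObj, err := objectGet(root)
  if err != nil {
    return nil, err
  }
  var blocked []string
  for _, entry := range rootObj.Links {
    entryObj, err := objectGet(entry.Hash)
    if err != nil {
      return nil, err
    }
    for _, l := range entryObj.Links {
      if l.Name == "object" {
        blocked = append(blocked, l.Hash)
      }
    }
  }
  return blocked, nil
}

func main() {
  blocked, err := blockedHashes("QmRER7erZxU63huYgSBryGhKrfHdkDkVjwQTd8RD4RdSW5")
  if err != nil {
    fmt.Println("error:", err)
    return
  }
  for _, h := range blocked {
    fmt.Println(h)
  }
}
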
jbenet commented 8 years ago

@lgierth some feedback below. Looking good!

what if instead of:

denylist/dmca/
denylist/dmca/2015-08-03-foobar-QmcuJpsniX8GWiz59RMgTg9UUH23zPBATZEe6qKEAzacCz
denylist/dmca/2015-08-03-foobar-Qmem4kaF6Jw8yPgCTaLAsV3deGjahbeN1oymqg4tzBMZZR
...
<root>/<date>-<org>-<hash>

instead do:

denylist/dmca/
denylist/dmca/2015-08-03-foobar/QmcuJpsniX8GWiz59RMgTg9UUH23zPBATZEe6qKEAzacCz
denylist/dmca/2015-08-03-foobar/Qmem4kaF6Jw8yPgCTaLAsV3deGjahbeN1oymqg4tzBMZZR
...
<root>/<date>-<org>/<hash>

this could:

Or is there a reason to have the hashes accessible from the root? is it about finding them easily? (could still do find over levels, just a bit harder. but if find is a problem, then we probably have to land unixfs dir sharding)


there's also something to be said for:

denylist/dmca/
denylist/dmca/2015/08/03/foobar/QmcuJpsniX8GWiz59RMgTg9UUH23zPBATZEe6qKEAzacCz
denylist/dmca/2015/08/03/foobar/Qmem4kaF6Jw8yPgCTaLAsV3deGjahbeN1oymqg4tzBMZZR
...
<root>/<year>/<month>/<day>/<org>/<hash>

but this index can be built later anyway


ghost commented 8 years ago

Or is there a reason to have the hashes accessible from the root

Regarding the hashes being present at all, I'm only appending them to the link name so that I don't end up with name clashes. These suffixes could just as well be a counter (yyyy-mm-dd-org-$i). That's probably better, and easy to do.

Regarding this being a flat list, I thought that'd be easier to parse on the client side. I'm happy to make it <root>/yyyy/mm/dd/org/i if that doesn't make it a pain for clients.

may want to include a link to the previous version of the list

Sure

ghost commented 8 years ago

Regarding this being a flat list, I thought that'd be easier to parse on the client side. I'm happy to make it <root>/yyyy/mm/dd/org/i if that doesn't make it a pain for clients.

@whyrusleeping wdyt?

whyrusleeping commented 8 years ago

@lgierth I can probably handle it.

whyrusleeping commented 7 years ago

closing; rebasing and making this work with today's codebase would require more work than rewriting it from scratch