Closed: whyrusleeping closed this 7 years ago
this PR changes the behavior.
previously, allowing "$foo" on the webui would allow "$foo" and all its children. BUT the children would not be able to be viewed individually, only through the parent:
/$foo_hash/child ok
$child_hash forbidden
we may want the old behavior. or we may want the new one. im not sure. anyone else have opinions on this?
@mappum @diasdavid o/ thoughts re
> we may want the old behavior. or we may want the new one. im not sure. anyone else have opinions on this?
the problem with the old behaviour is that if you block X, all i have to do is make Y that has X as a child, and I can access it again. It didnt really provide any sort of real 'blocking'
> the problem with the old behaviour is that if you block X, all i have to do is make Y that has X as a child, and I can access it again. It didnt really provide any sort of real 'blocking'
no-- on the webui it was indeed blocked, because nothing except those things rooted at the allowed roots would work.
the reason i bring it up, too, is that there are implications to how HTTP works. for example, i may want to allow root X, which has Y at /X/Y, and Y in the context of X is fine. But I may not want to allow /Z/Y, or even Z to use Y as /Y.
I'm leaning towards: "allowing/denying roots" as the right idea. you have to look through and make sure everything is ok to be used that way, but there's no questions otherwise.
> no-- on the webui it was indeed blocked, because nothing except those things rooted at the allowed roots would work.
are you talking about whitelists or blacklists? there are different implications depending on that context. In the case of a whitelist, what you say is true. In the case of a blocklist, what I'm saying is true.
i mean the webui "allowlist".
okay, the more i think of this, the more i think we're going to have to treat each list differently. the "allowlist" is going to have to use the root checking like you say, but the "do-not-allowlist" is going to need to use the technique i'm using.
some objects may be 451. some may be just forbidden. etc. may want to brainstorm how to represent the reason. right now, the allowlist and denylist only stack once. other filter systems allow users to compose a stack of filters, allowing expressions like:
# deny everything except things under good/ but not good/nowait,bad
DENY *
ALLOW good/*
DENY good/nowait,bad/
# allow everything except things under bad/, but allow bad/butonlyalittlebad/
ALLOW *
DENY bad/*
ALLOW bad/butonlyalittlebad/*
and so on.
(this makes sense for IP cidr filters too)
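For illustration, a last-match-wins evaluation of such a stack could look like this in Go (the Rule type and glob-ish prefix matching are assumptions for the sketch, not a proposed syntax):

```go
package main

import "strings"

// Rule is one ALLOW/DENY line in a stacked filter list.
// Patterns are simple prefixes here ("good/*" matches anything under good/);
// the real matching syntax is still undecided in this thread.
type Rule struct {
	Allow   bool
	Pattern string
}

// Allowed evaluates the stack top to bottom; the last matching rule wins,
// which is what lets a later ALLOW punch a hole in an earlier DENY.
func Allowed(rules []Rule, path string) bool {
	allowed := true // default-allow when nothing matches
	for _, r := range rules {
		if matches(r.Pattern, path) {
			allowed = r.Allow
		}
	}
	return allowed
}

func matches(pattern, path string) bool {
	if pattern == "*" {
		return true
	}
	return strings.HasPrefix(path, strings.TrimSuffix(pattern, "*"))
}
```

The same shape works for IP CIDR filters: swap the prefix match for a `net.IPNet.Contains` check.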
@jbenet question about caching the lists. how should they be stored? as one list for allow and one list for deny on disk? and how often/when should they be updated?
@whyrusleeping re "how they should be stored":
They should be ipfs objects. we can import them from JSON.
@jbenet we already talked about this... thats the reason we arent fetching them from ipfs in the first place. the gateway is set up during node construction, fetching objects before the node is completed is a hassle.
> the gateway is set up during node construction, fetching objects before the node is completed is a hassle
We can fetch ipfs objects over HTTP just fine. (to clarify, i'm suggesting hosting the lists as ipfs objects (either in json or cbor (or both! "list.json" and "list.cbor")) and only pulling them with http. add them to local repo then)
Can we make sure that we don't have to comply with a DMCA that shuts down the denylist?
> Can we make sure that we don't have to comply with a DMCA that shuts down the denylist?
What I'm getting at is, it'd make the denylist subject to the same absurd theater that it's supposed to make transparent
@jbenet but if we fetch ipfs object over http, what exactly are we caching?
> @jbenet but if we fetch ipfs object over http, what exactly are we caching?
the object. cache it locally because you're not guaranteed to be able to reach the host every time you launch IPFS. Basically:
func start() {
    tryDownloadingNewDenylist()
    loadDenylist()
}

func tryDownloadingNewDenylist() {
    // do http request.
    // if successful, store as an ipfs object locally.
}

func loadDenylist() {
    // load local denylist ipfs object
}
@lgierth i think we'll be fine with distributing a list of hashes. we generate the denylist, so it's technically our copyright?
@lgierth whats in the Uri field of the blocklists?
> whats in the Uri field of the blocklists?
@whyrusleeping The response is an array of notice objects, each consisting of a URI for more information on that notice, and a list of keys to block:
[ { "uri":"http://dmca.ipfs.io/2015-08-03-foobar",
"keys":["Qmfoobar","Qmsomething","Qmbarbaz"] } ]
As per our chat yesterday, we can just flatten that, because the concept of a notice just isn't needed on the gateway's end.
["Qmfoobar","Qmsomething","Qmbarbaz"]
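Flattening is only a few lines; a sketch in Go of parsing the notice array into a flat key list (the Notice struct just mirrors the JSON above):

```go
package main

import "encoding/json"

// Notice mirrors the JSON the denylist service returns: a URI with more
// information on the notice, plus the keys that notice blocks.
type Notice struct {
	URI  string   `json:"uri"`
	Keys []string `json:"keys"`
}

// FlattenKeys drops the notice grouping and returns just the keys,
// since the gateway only needs to know what to block.
func FlattenKeys(data []byte) ([]string, error) {
	var notices []Notice
	if err := json.Unmarshal(data, &notices); err != nil {
		return nil, err
	}
	var keys []string
	for _, n := range notices {
		keys = append(keys, n.Keys...)
	}
	return keys, nil
}
```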
Would it be a good idea to respond with a URI template for the for-more-information-click-here link?
This link would be something like http://denylist.ipfs.io/key/Qmfoobar, and the denylist daemon would figure out which notice to show.
@lgierth that SGTM
two things from yesterday's discussion:
ipfs.io/refs/lists/denylists
# and put deny lists there like
ipfs.io/refs/lists/denylists/dmca
# that way we can have other lists like
ipfs.io/refs/lists/archives/archive.org/HEAD
ipfs.io/refs/lists/archives/arxiv.org/HEAD
I'm a bit confused. Are you saying we should just leave the denylist daemon be and serve everything from IPFS? I can see that work out, but the denylist is a bit more than just a list of keys. It's a list of lists of keys, each list with a reason and notice text. Or do you mean it should only be presented as a flat list of keys to the gateway? That would be different from trying to fit both denylist and archives into the same list abstraction for storage.
> i think we'll be fine with distributing a list of hashes. we generate the denylist, so it's technically our copyright?
Just saying there's all kinds of crazy notices, we should anticipate that some will be nonsense ("take down localhost!!1"), and make sure those don't do any harm.
> I'm a bit confused. Are you saying we should just leave the denylist daemon be and serve everything from IPFS? I can see that work out, but the denylist is a bit more than just a list of keys. It's a list of lists of keys, each list with a reason and notice text. Or do you mean it should only be presented as a flat list of keys to the gateway? That would be different from trying to fit both denylist and archives into the same list abstraction for storage.
Sorry for being unclear. I mean that we should host the lists at those HTTP endpoints:
https://ipfs.io/refs/lists/denylists
# and put deny lists there like
https://ipfs.io/refs/lists/denylists/dmca.cbor # or ?codec=json instead of extension
https://ipfs.io/refs/lists/denylists/dmca.json
# that way we can have other lists like
https://ipfs.io/refs/lists/archives/archive.org/HEAD.json
https://ipfs.io/refs/lists/archives/archive.org/HEAD.cbor
https://ipfs.io/refs/lists/archives/arxiv.org/HEAD.json
https://ipfs.io/refs/lists/archives/arxiv.org/HEAD.cbor
And these should return IPFS objects-- meaning, the data is a valid IPFS object, in the given encoding. so we can basically do (though inside in the daemon, with Go):
newDenyList=$(curl https://ipfs.io/refs/lists/denylists/dmca.json | ipfs object put --enc=json)
them being IPFS objects is useful because then we can also distribute them via ipfs (not just HTTP), and once we make object signing, we can trivially add signatures to the whole thing and validate them using the tools we'll already make.
(and that way, we can serve the lists even over http://, and people wont be able to fake-add DMCA by screwing with people's traffic/dns)
> Just saying there's all kinds of crazy notices, we should anticipate that some will be nonsense ("take down localhost!!1"), and make sure those don't do any harm.
indeed! (and hahahhaha wow.)
@jbenet so the flow i think i'm getting is:
checkLocalDagstoreFor(listKey)
if thatFailed {
    queryHTTP(listUrl)
    if thatFailedToo {
        die
    }
    writeListToLocalBlockstore(list)
}
question: where does the listKey come from? do we store it in the config? that would likely require a map or something in the config to store list urls and cached keys.
also, when should we check for updates to the lists?
> checkLocalDagstoreFor(listKey)
> if thatFailed {
>     queryHTTP(listUrl)
>     if thatFailedToo {
>         die
>     }
>     writeListToLocalBlockstore(list)
> }
this ^ wont work because after getting a list you would never update again. i think we need:
if updateCondition() {
    if list, err := getListViaHTTP(url); err == nil {
        dag.Add(list)
    }
}

list, err := dag.Get(listKey)
if err != nil {
    list = emptyList
}
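Fleshed out slightly, the agreed flow might look like this in Go. The FetchHTTP/SaveLocal/LoadLocal hooks are hypothetical stand-ins for the HTTP client and local dagstore (not real go-ipfs APIs); the point is that a failed update falls back to the cached copy, and a missing cache falls back to an empty list rather than dying:

```go
package main

import "errors"

// ErrUnavailable is returned by a fetch hook when the host can't be reached.
var ErrUnavailable = errors.New("list unavailable")

// ListStore wires together the three steps: refresh over HTTP when due,
// cache locally, and always load from the local copy.
type ListStore struct {
	FetchHTTP func(url string) ([]string, error) // pull list from gateway
	SaveLocal func(list []string) error          // cache as a local ipfs object
	LoadLocal func() ([]string, error)           // read the cached copy
}

func (s *ListStore) Load(url string, updateDue bool) []string {
	if updateDue {
		if list, err := s.FetchHTTP(url); err == nil {
			s.SaveLocal(list) // best-effort: a failed save shouldn't kill startup
		}
	}
	list, err := s.LoadLocal()
	if err != nil {
		return nil // empty denylist rather than dying
	}
	return list
}
```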
@jbenet
> also, when should we check for updates to the lists?
@whyrusleeping
> also, when should we check for updates to the lists?
I think we shouldn't go over an hour with a stale one. another consideration is that long-running daemons should poll over time to remain up to date. this is way easier if we make the lists just return ipfs refs. see below.
could use the main api endpoint to return a head (i.e. a hash of the head) and then grab the head from the global http gateway itself. so
https://ipfs.io/refs/lists/denylists/dmca
https://ipfs.io/refs/lists/archives/archive.org
https://ipfs.io/refs/lists/archives/arxiv.org
could be just hashes, which we can then get with:
https://ipfs.io/api/v0/object/get?arg=/ipfs/$head
that way, the json / cbor / whatever encoding question is handled by the API. (yay!)
we want to make these changes to propagate fast through the network
Should we go with something like this?
> GET https://gateway.ipfs.io/refs/lists/denylists/dmca HTTP/1.1
< HTTP/1.1 302 Found
< Location: /ipfs/$hash
<
< $hash
$hash resolves to a DenyList.
type DenyList struct {
    Objects []Link // all DenyObjects
}

type DenyObject struct {
    Object Link // the object that's supposed to be blocked
    Notice Link // the dmca notice object
}

type DenyNotice struct {
    Value Data // opaque value
}
Get all DenyList.Objects, then populate the denylist with all DenyObject.Object link hashes.
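A sketch of that population step in Go, with the proposed types reduced to plain structs for illustration (not real merkledag nodes):

```go
package main

// Link and the Deny* types follow the shapes proposed above; they're plain
// structs here purely to sketch how the denied set gets built.
type Link struct{ Hash string }

type DenyObject struct {
	Object Link // the object that's supposed to be blocked
	Notice Link // the dmca notice object
}

type DenyList struct {
	Objects []DenyObject
}

// BuildDeniedSet walks every DenyObject in the list and records the hash of
// each blocked object; the gateway then checks requests against this set.
func BuildDeniedSet(dl DenyList) map[string]bool {
	denied := make(map[string]bool, len(dl.Objects))
	for _, o := range dl.Objects {
		denied[o.Object.Hash] = true
	}
	return denied
}
```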
> GET https://gateway.ipfs.io/ipfs/$deniedHash
< HTTP/1.1 451 Unavailable For Legal Reasons
<
< <a href="/dmca/$deniedHash">Show DenyNotice object,
< and the DenyObjects linking to it</a>
@lgierth that LGreatTM.
One minor thing: for the href in the returned page, what about generating the page and linking its ipfs object instead (so href="/ipfs/$pageShowingDenyNoticeAndLinks/" instead of href="/dmca/$deniedHash")? That way we would not have to implement special route handling for /dmca/<hash> (another route to worry about).
@jbenet the gateway would have to keep the reverse-links from Object to DenyObject in order to get the notice. The idea above was that the little gateway-dmca-denylist daemon would take care of that, so that we don't have to introduce any logic to the gateway apart from basic denying.
Or is there an API function for getting objects linking to a given hash?
@lgierth we could probably prebuild all the pages statically every time we add to the list. and maybe we could link to a server we run then. dont want to put tons of dmca-specific stuff in the ipfs daemon.
gateway-dmca-denylist now builds this kind of list: https://ipfs.io/ipfs/QmRER7erZxU63huYgSBryGhKrfHdkDkVjwQTd8RD4RdSW5
... which is a unixfs structure, which I figured is the simplest to implement and look at (thanks to the dir-listing).
It fans out based on the keys associated with each notice, so that e.g. a notice with 3 keys will result in 3 objects linked in the denylist. Each of these objects links to the rendered notice, and the object to be blocked as identified by the key. The rendered notice is the same for all these.
$ go run denylist.go
QmRER7erZxU63huYgSBryGhKrfHdkDkVjwQTd8RD4RdSW5
$
$ # this is the denylist object
$ ipfs object get QmRER7erZxU63huYgSBryGhKrfHdkDkVjwQTd8RD4RdSW5
{
  "Links": [
    {
      "Name": "2015-08-03-foobar-QmcuJpsniX8GWiz59RMgTg9UUH23zPBATZEe6qKEAzacCz",
      "Hash": "QmSD1bn3LVLwZpdvYiekv7ei6ipx4k5vxLhRRTmzoQFMND",
      "Size": 621
    },
    {
      "Name": "2015-08-03-foobar-Qmem4kaF6Jw8yPgCTaLAsV3deGjahbeN1oymqg4tzBMZZR",
      "Hash": "QmQawxoaW5j3z6VUKAKd8zVqYKgiJ3TouisxB5jNGFxQqb",
      "Size": 640
    }
  ],
  "Data": "\u0008\u0001"
}
$
$ # the first item of the denylist, linking to the rendered notice and the object to be blocked
$ ipfs object get QmSD1bn3LVLwZpdvYiekv7ei6ipx4k5vxLhRRTmzoQFMND
{
  "Links": [
    {
      "Name": "notice",
      "Hash": "QmanbefGzMvgsyhAo47TW7AQ5YkKtVzoVLnYyZgf6Geok3",
      "Size": 476
    },
    {
      "Name": "object",
      "Hash": "QmcuJpsniX8GWiz59RMgTg9UUH23zPBATZEe6qKEAzacCz",
      "Size": 44
    }
  ],
  "Data": "\u0008\u0001"
}
$
$ # this is the rendered notice
$ ipfs cat QmanbefGzMvgsyhAo47TW7AQ5YkKtVzoVLnYyZgf6Geok3
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8" />
  <title>Unavailable for Legal Reasons</title>
</head>
<body>
  remove my stuff plox
  <h2>Affected Objects</h2>
  <ul>
    <li><a href="/ipfs/QmcuJpsniX8GWiz59RMgTg9UUH23zPBATZEe6qKEAzacCz">/ipfs/QmcuJpsniX8GWiz59RMgTg9UUH23zPBATZEe6qKEAzacCz</a></li>
    <li><a href="/ipfs/Qmem4kaF6Jw8yPgCTaLAsV3deGjahbeN1oymqg4tzBMZZR">/ipfs/Qmem4kaF6Jw8yPgCTaLAsV3deGjahbeN1oymqg4tzBMZZR</a></li>
  </ul>
</body>
</html>
@lgierth some feedback. Looking good!
what if instead of:
denylist/dmca/
denylist/dmca/2015-08-03-foobar-QmcuJpsniX8GWiz59RMgTg9UUH23zPBATZEe6qKEAzacCz
denylist/dmca/2015-08-03-foobar-Qmem4kaF6Jw8yPgCTaLAsV3deGjahbeN1oymqg4tzBMZZR
...
<root>/<date>-<org>-<hash>
instead do:
denylist/dmca/
denylist/dmca/2015-08-03-foobar/QmcuJpsniX8GWiz59RMgTg9UUH23zPBATZEe6qKEAzacCz
denylist/dmca/2015-08-03-foobar/Qmem4kaF6Jw8yPgCTaLAsV3deGjahbeN1oymqg4tzBMZZR
...
<root>/<date>-<org>/<hash>
this could:
Or is there a reason to have the hashes accessible from the root? is it about finding them easily? (could still do find over levels, just a bit harder. but if find is a problem, then we probably have to land unixfs dir sharding)
there's also something to be said for:
denylist/dmca/
denylist/dmca/2015/08/03/foobar/QmcuJpsniX8GWiz59RMgTg9UUH23zPBATZEe6qKEAzacCz
denylist/dmca/2015/08/03/foobar/Qmem4kaF6Jw8yPgCTaLAsV3deGjahbeN1oymqg4tzBMZZR
...
<root>/<year>/<month>/<day>/<org>/<hash>
but this index can be built later anyway
may want to include a link to the previous version of the list: <root>/previous or something that's just the hash of the previous thing. (version history pre-commit chains)
> Or is there a reason to have the hashes accessible from the root?
Regarding the hashes being present at all, I'm only appending them to the link name so that I don't end up with name clashes. These suffixes could as well be a counter (yyyy-mm-dd-org-$i). That's probably better, and easy to do.
Regarding this being a flat list, I thought that'd be easier to parse on the client side. I'm happy to make it <root>/yyyy/mm/dd/org/i if that doesn't make it a pain for clients.
> may want to include a link to the previous version of the list
Sure
> Regarding this being a flat list, I thought that'd be easier to parse on the client side. I'm happy to make it <root>/yyyy/mm/dd/org/i if that doesn't make it a pain for clients.
@whyrusleeping wdyt?
@lgierth i can probably handle it.
closing, rebasing and making this work with todays codebase would require more work than rewriting it from scratch
License: MIT Signed-off-by: Jeromy jeromyj@gmail.com