ipfs / kubo

An IPFS implementation in Go

Pinning new cbor object doesn't appear to work #3570

Closed ianopolous closed 7 years ago

ianopolous commented 7 years ago

Version information:

go-ipfs version: 0.4.5-dev-4cb236c
Repo version: 4
System version: amd64/linux
Golang version: go1.7.1

Type:

Bug

Priority:

P0

Description:

Pinning a new cbor object created using block.put doesn't appear to work. To reproduce:

>> echo -e "\x4b\x67\x27\x64\x61\x79\x20\x49\x50\x46\x53\x21" | ipfs block put --format=cbor
zdpuAue4NBRG6ZH5M7aJvvdjdNbFkwZZCooKWM1m2faRAodRe
>> echo -e "\xd9\x01\x02\x58\x25\xa5\x03\x22\x12\x20\x65\x96\x50\xfc\x34\x43\xc9\x16\x42\x80\x48\xef\xc5\xba\x45\x58\xdc\x86\x35\x94\x98\x0a\x59\xf5\xcb\x3c\x4d\x84\x86\x7e\x6d\x31" | ipfs block put --format=cbor
zdpuApNFmG7PZ53BWxwix4HztiVDHomrvdJLTegycZb8YU5Qr
>> ipfs pin add -r zdpuApNFmG7PZ53BWxwix4HztiVDHomrvdJLTegycZb8YU5Qr
>> ipfs repo gc
>> ipfs block get zdpuApNFmG7PZ53BWxwix4HztiVDHomrvdJLTegycZb8YU5Qr
>> ipfs block get zdpuAue4NBRG6ZH5M7aJvvdjdNbFkwZZCooKWM1m2faRAodRe

The gc should NOT remove the two blocks added (it currently removes both), and the subsequent gets should succeed. The first block is just a cbor byte array of "g'day IPFS!". The second is just a cbor merkle link to /ipfs/zdpuAue4N...

N.B. I may not have the correct serialization for the merkle link, but as far as I can tell it is correct (a cbor tag of 258 for the multiaddr)
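
For reference, the two payloads above can be unpacked by hand. The following is a minimal Python sketch, assuming only the standard CBOR head layout (three bits of major type, five bits of additional information). `read_cbor_head` is a hypothetical helper written for this illustration, not part of any IPFS tooling, and the hex strings are the bytes from the `echo -e` commands without the trailing newline that `echo` appends.

```python
def read_cbor_head(data: bytes, pos: int = 0):
    """Return (major_type, argument, next_pos) for the CBOR item starting at pos."""
    initial = data[pos]
    major = initial >> 5       # top three bits: major type
    info = initial & 0x1F      # low five bits: additional information
    pos += 1
    if info < 24:              # small argument stored in the initial byte itself
        return major, info, pos
    if info <= 27:             # argument follows in 1, 2, 4 or 8 big-endian bytes
        size = 1 << (info - 24)
        return major, int.from_bytes(data[pos:pos + size], "big"), pos + size
    raise ValueError("reserved or indefinite-length head: not handled in this sketch")


# First block: a byte string (major type 2) holding the greeting.
leaf = bytes.fromhex("4b6727646179204950465321")
# Second block: tag head, byte-string head, then the 37-byte payload.
link = bytes.fromhex(
    "d90102" "5825" "a503221220"
    "659650fc3443c916428048efc5ba4558dc863594980a59f5cb3c4d84867e6d31"
)

leaf_major, leaf_len, leaf_start = read_cbor_head(leaf)
leaf_payload = leaf[leaf_start:leaf_start + leaf_len]

tag_major, tag_number, pos = read_cbor_head(link)        # the tag itself
inner_major, inner_len, pos = read_cbor_head(link, pos)  # the tagged item
link_payload = link[pos:pos + inner_len]
```

Under these assumptions neither block is a map at the top level: the first is a bare byte string and the second is a tagged byte string.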

ghost commented 7 years ago

Possibly related to #3453

ghost commented 7 years ago

Also #3553

ianopolous commented 7 years ago

@lgierth N.B. there are no errors returned here except for the final gets failing to find anything.

kevina commented 7 years ago

This could be a simple case of special cases for Protobuf nodes, in which case the fix should be simple. I'll look into it.

kevina commented 7 years ago

The pinning is failing for me. That is likely the problem. I am surprised it works for you:

panic: reflect.Set: value of type *cbor.CBORTag is not assignable to type map[interface {}]interface {}

However I am doing this with the daemon offline.

kevina commented 7 years ago

Okay, the cbor package is panicking when trying to decode the block in order to pin it. Here is the full backtrace:

panic: reflect.Set: value of type *cbor.CBORTag is not assignable to type map[interface {}]interface {}

goroutine 1 [running]:
panic(0xbc6500, 0xc4202e7860)
        /usr/local/go/src/runtime/panic.go:500 +0x1a1
reflect.Value.assignTo(0xbf88e0, 0xc420239fa0, 0x16, 0xcfef62, 0xb, 0xc04e60, 0x0, 0xc04e60, 0xbf88e0, 0xc420239fa0)
        /usr/local/go/src/reflect/value.go:2163 +0x35c
reflect.Value.Set(0xc04e60, 0xc42077a1d8, 0x195, 0xbf88e0, 0xc420239fa0, 0x16)
        /usr/local/go/src/reflect/value.go:1333 +0xa4
gx/ipfs/QmPL3RCWaM6s7b82LSLS1MGX2jpxPxA1v2vmgLm15b1NcW/cbor/go.(*reflectValue).SetTag(0xc420239f80, 0x102, 0x1152d40, 0xc420239fc0, 0x0, 0x0, 0xbf88e0, 0xc420239fa0, 0x0, 0x0)
        /home/kevina/gocode2/src/gx/ipfs/QmPL3RCWaM6s7b82LSLS1MGX2jpxPxA1v2vmgLm15b1NcW/cbor/go/cbor.go:1084 +0x100
gx/ipfs/QmPL3RCWaM6s7b82LSLS1MGX2jpxPxA1v2vmgLm15b1NcW/cbor/go.(*Decoder).innerDecodeC(0xc4201396b8, 0x1152d40, 0xc420239f80, 0xd9, 0x1, 0x1)
        /home/kevina/gocode2/src/gx/ipfs/QmPL3RCWaM6s7b82LSLS1MGX2jpxPxA1v2vmgLm15b1NcW/cbor/go/cbor.go:408 +0xf20
gx/ipfs/QmPL3RCWaM6s7b82LSLS1MGX2jpxPxA1v2vmgLm15b1NcW/cbor/go.(*Decoder).DecodeAny(0xc4201396b8, 0x1152d40, 0xc420239f80, 0xc42077a1d8, 0x16)
        /home/kevina/gocode2/src/gx/ipfs/QmPL3RCWaM6s7b82LSLS1MGX2jpxPxA1v2vmgLm15b1NcW/cbor/go/cbor.go:235 +0xc2
gx/ipfs/QmPL3RCWaM6s7b82LSLS1MGX2jpxPxA1v2vmgLm15b1NcW/cbor/go.(*Decoder).Decode(0xc4201396b8, 0xba7120, 0xc42077a1d8, 0x0, 0xc42004b140)
        /home/kevina/gocode2/src/gx/ipfs/QmPL3RCWaM6s7b82LSLS1MGX2jpxPxA1v2vmgLm15b1NcW/cbor/go/cbor.go:125 +0xb4
gx/ipfs/QmPL3RCWaM6s7b82LSLS1MGX2jpxPxA1v2vmgLm15b1NcW/cbor/go.Loads(0xc4200a26c0, 0x2b, 0x22b, 0xba7120, 0xc42077a1d8, 0x114b760, 0xc420239f40)
        /home/kevina/gocode2/src/gx/ipfs/QmPL3RCWaM6s7b82LSLS1MGX2jpxPxA1v2vmgLm15b1NcW/cbor/go/cbor.go:80 +0x1ef
gx/ipfs/QmbuuwTd9x4NReZ7sxtiKk7wFcfDUo54MfWBdtF5MRCPGR/go-ipld-cbor.Decode(0xc4200a26c0, 0x2b, 0x22b, 0x22b, 0x114b760, 0xc420239f40)
        /home/kevina/gocode2/src/gx/ipfs/QmbuuwTd9x4NReZ7sxtiKk7wFcfDUo54MfWBdtF5MRCPGR/go-ipld-cbor/node.go:19 +0x75
github.com/ipfs/go-ipfs/merkledag.decodeBlock(0x114b760, 0xc420239f40, 0xc42007d980, 0xc42004b020, 0x114b760, 0xc420239f40)
        /home/kevina/gocode2/src/github.com/ipfs/go-ipfs/merkledag/merkledag.go:111 +0xb1
github.com/ipfs/go-ipfs/merkledag.(*dagService).Get(0xc420175000, 0x114b520, 0xc42007d980, 0xc42004b020, 0x0, 0x0, 0x0, 0x0)
        /home/kevina/gocode2/src/github.com/ipfs/go-ipfs/merkledag/merkledag.go:89 +0x297
github.com/ipfs/go-ipfs/path.(*Resolver).ResolvePathComponents(0xc4202eece0, 0x114b520, 0xc4201402c0, 0xc42007cd00, 0x37, 0x37, 0xc420139a00, 0x709461, 0x0, 0xcf7106)
        /home/kevina/gocode2/src/github.com/ipfs/go-ipfs/path/resolver.go:106 +0x17f
github.com/ipfs/go-ipfs/path.(*Resolver).ResolvePath(0xc4202eece0, 0x114b520, 0xc4201402c0, 0xc42007cd00, 0x37, 0xc42030c4e0, 0xc42007cd00, 0x37, 0x0)
        /home/kevina/gocode2/src/github.com/ipfs/go-ipfs/path/resolver.go:84 +0x7b
github.com/ipfs/go-ipfs/core.Resolve(0x114b520, 0xc4201402c0, 0x0, 0x0, 0xc4202eece0, 0xc42007cd00, 0x37, 0x41f0d8, 0x30, 0xcb99e0, ...)
        /home/kevina/gocode2/src/github.com/ipfs/go-ipfs/core/pathresolver.go:57 +0x360
github.com/ipfs/go-ipfs/core/corerepo.Pin(0xc420354180, 0x114b520, 0xc4201402c0, 0xc4202ee3a0, 0x1, 0x2, 0x1, 0x0, 0x0, 0x0, ...)
        /home/kevina/gocode2/src/github.com/ipfs/go-ipfs/core/corerepo/pinning.go:35 +0x139
github.com/ipfs/go-ipfs/core/commands.glob..func72(0x1153780, 0xc420316300, 0x1152b60, 0xc420322000)
        /home/kevina/gocode2/src/github.com/ipfs/go-ipfs/core/commands/pin.go:65 +0x208
github.com/ipfs/go-ipfs/commands.(*Command).Call(0x1230780, 0x1153780, 0xc420316300, 0x0, 0x0)
        /home/kevina/gocode2/src/github.com/ipfs/go-ipfs/commands/command.go:116 +0x286
main.callCommand(0x114b520, 0xc4202f0840, 0x1153780, 0xc420316300, 0x1230780, 0x120f0e0, 0x0, 0x0, 0xc420139e60, 0x407ad3)
        /home/kevina/gocode2/src/github.com/ipfs/go-ipfs/cmd/ipfs/main.go:349 +0x49a
main.(*cmdInvocation).Run(0xc4202f0780, 0x114b520, 0xc4202f0840, 0x1141ea0, 0xc4202ee400, 0x114b520, 0xc4202f0840)
        /home/kevina/gocode2/src/github.com/ipfs/go-ipfs/cmd/ipfs/main.go:191 +0x116
main.main()
        /home/kevina/gocode2/src/github.com/ipfs/go-ipfs/cmd/ipfs/main.go:156 +0x366

@ianopolous could you try this with the daemon offline and make sure the block is a valid cbor object?

ianopolous commented 7 years ago

Offline I get the same error. Also if I try and pin the first object I get:

Error: pin: cannot assign []byte into Kind=map Type= map[interface {}]interface {}(nil)

The first one is just a cbor byte[]. If this isn't allowed then that needs to be made clear. My reading of the docs was that any cbor is valid (and it would be a shame if not, as that will bloat the serialization with unnecessary stuff)

The second object has the following breakdown: 0xD90102 = Tag (258), then a cbor byte[] of 37 bytes corresponding to the multiaddr of the hash of the first blob (/ipfs/QmVBCpx91Yb5hGCqkQbWeqQ83B8r9mbN9EMr3c6y22ePKE). I think it should be a cid, not a multihash, but that is what the first command is returning over the http api (which is where I generate the cbor from). This means that, again, it isn't a map, but a tagged byte[].
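
The 37-byte payload can be taken apart the same way. This is a rough Python sketch, assuming the usual binary multiaddr layout (varint protocol code, varint length, address bytes) and the multihash layout (hash function code, digest length, digest). `read_uvarint` and `base58btc` are hypothetical helpers written for this example.

```python
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def read_uvarint(data: bytes, pos: int = 0):
    """Decode an unsigned LEB128 varint; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:       # high bit clear: last byte of the varint
            return value, pos
        shift += 7


def base58btc(data: bytes) -> str:
    """Encode bytes with the Bitcoin base58 alphabet (leading zero bytes become '1')."""
    number = int.from_bytes(data, "big")
    out = ""
    while number:
        number, rem = divmod(number, 58)
        out = ALPHABET[rem] + out
    zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * zeros + out


payload = bytes.fromhex(
    "a503221220"
    "659650fc3443c916428048efc5ba4558dc863594980a59f5cb3c4d84867e6d31"
)

proto_code, pos = read_uvarint(payload)      # multiaddr protocol code
addr_len, pos = read_uvarint(payload, pos)   # length of the address that follows
multihash = payload[pos:pos + addr_len]
hash_code, digest_len = multihash[0], multihash[1]
digest = multihash[2:]
text_form = "/ipfs/" + base58btc(multihash)
```

If the bytes are what the comment above describes, `proto_code` is 421 (the `/ipfs` protocol), `hash_code` is 0x12 (sha2-256), and `text_form` is the `/ipfs/Qm...` path of the first blob. Note that this is a bare multihash rather than a cid, which matches the concern raised above.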

kevina commented 7 years ago

@ianopolous thanks, I think this is a bug in the cbor library. I am afraid I am not that familiar with cbor objects, so I'm not sure how qualified I am to fix this. I will give it another shot before giving up tomorrow (Tuesday).

ianopolous commented 7 years ago

@kevina I would guess the cbor library is fine, but that ipfs is assuming the root cbor object is a map, not a general cbor object.

whyrusleeping commented 7 years ago

Yeah, I wasn't really thinking about base 'non-map' cbor objects being handled when writing the cbor ipld stuff. We can either disallow that for now, or I can figure out how to rewrite the handling in the package to deal with those types of cbor things.

It really comes down to what's in the ipld spec: is "Foobar" a valid ipld object?

whyrusleeping commented 7 years ago

@ianopolous I'm thinking that the objects you created are not valid ipld-cbor objects. We're thinking that only 'map type' top level objects should be allowed. (which means we need to guard against this)

ianopolous commented 7 years ago

Can I ask why? As it adds a lot of unnecessary bytes to serialization, which matters in things with lots of small objects, like merkle-btrees. It is also conceptually simpler to allow any root object. I would assume in this case that you couldn't follow any ipld path through it, as it is effectively a leaf node as far as ipld is concerned? If you do decide to continue with that path I assume you will also restrict the keys in cbor maps to Strings, and if so, you should also make that clear.

ianopolous commented 7 years ago

I think the same case will also be encountered if you have an ipld selector path internal to an object which ends up at a non-map cbor object.

whyrusleeping commented 7 years ago

@diasdavid @nicola @jbenet @lgierth What do you guys think here?

whyrusleeping commented 7 years ago

@ianopolous could you give a clear description of your use case here? It seems like you're trying to put your own format over the top of the dag-cbor format.

ianopolous commented 7 years ago

Our usecase is a merkle-btree with lots of small nodes forming non leaf nodes, and leaf nodes which are actual file fragments (encrypted) and thus close to the IPFS object size limit (they are currently just a cbor byte[]).

The non leaf nodes are a list of up to 16 (label, value, target) tuples where label is the key in the btree, and value and target are (optional) multihashes which end up as merkle links in the cbor. $value points to an encrypted metadata blob which in turn points to encrypted file fragments. $target points to another merkle-btree node.

We handle the navigation of the btree client side, but need the ability to just pin the root and have the whole tree pinned. The non leaf nodes have cbor merkle links to other nodes.
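
The node layout described above might be modelled roughly as follows. This is a hypothetical Python sketch, not the Peergos implementation: the field names `label`, `value` and `target` and the limit of 16 come from the description, while the `Entry` and `TreeNode` classes and their methods are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_ENTRIES = 16  # "up to 16" tuples per non leaf node, per the description above


@dataclass
class Entry:
    label: bytes                    # key in the btree
    value: Optional[bytes] = None   # multihash of an encrypted metadata blob, if any
    target: Optional[bytes] = None  # multihash of another btree node, if any


@dataclass
class TreeNode:
    entries: List[Entry] = field(default_factory=list)

    def add(self, entry: Entry) -> None:
        if len(self.entries) >= MAX_ENTRIES:
            raise ValueError("node is full")
        self.entries.append(entry)

    def links(self) -> List[bytes]:
        """Hashes a recursive pin of this node would need to follow."""
        found = []
        for entry in self.entries:
            for link in (entry.value, entry.target):
                if link is not None:
                    found.append(link)
        return found


node = TreeNode()
node.add(Entry(label=b"a", value=b"hash-of-metadata", target=b"hash-of-child"))
node.add(Entry(label=b"b", target=b"hash-of-other-child"))
```

Pinning the root recursively then amounts to following `links()` at every level, which needs the links to be readable but does not need the file contents to be.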

whyrusleeping commented 7 years ago

If the links are encrypted, there's no way that pin can handle traversing the graph.

whyrusleeping commented 7 years ago

Also, you can use the 'raw' type for data which doesn't need to be a dag-cbor formatted object.

ianopolous commented 7 years ago

The files are encrypted, the links are in the clear obviously.

whyrusleeping commented 7 years ago

Ah, okay. What does the json representation of your cbor structure look like?

ianopolous commented 7 years ago

My understanding from reading all the docs and code I could get my hands on was that I could write an arbitrary cbor object (using block put) which may contain special cbor tagged merkle links, and that recursive pinning will work based on this.

ianopolous commented 7 years ago

We never use JSON because it is terrible for binary data. Given a JSON that supports byte[] then any cbor structure that restricts map keys to strings is trivially mappable to JSON + byte[].

whyrusleeping commented 7 years ago

@ianopolous No, it's not arbitrary CBOR. It's ipld objects that are CBOR encoded, see: https://github.com/ipld/specs/tree/master/ipld

Any cbor object can be represented as json though, I'm just trying to get an idea of the structure you have visually.

whyrusleeping commented 7 years ago

Note there's still some weirdness in that spec document, but ipld objects are always maps (as far as my understanding goes).

ianopolous commented 7 years ago

This is the cbor for non leaf nodes (a cbor list): https://github.com/Peergos/Peergos/blob/simpler_http/src/peergos/shared/merklebtree/TreeNode.java#L399. The leaf nodes are just cbor byte[].

whyrusleeping commented 7 years ago

For your leaf nodes you can use format type raw instead of cbor. (ipfs block put --format=raw)

ianopolous commented 7 years ago

ok. The raw thing is a work around for leaf nodes. It does seem like a needless distinction and complication though, and doesn't solve the case of other nodes that have links.

whyrusleeping commented 7 years ago

@ianopolous The second, longer object in the initial issue doesn't seem to parse as cbor (I'm trying to take a look at it with a cbor tool I wrote as well as https://www.npmjs.com/package/cbor-pretty-print). Are you sure it's valid?

ianopolous commented 7 years ago

The second object is a tagged byte[]: the tag is 258, and the byte[] is a multiaddress of a hash.

whyrusleeping commented 7 years ago

Ah, okay. Got it working, investigating more now...

whyrusleeping commented 7 years ago

@ianopolous what is the 'wrapped object' thing supposed to be? Just raw data?

ianopolous commented 7 years ago

N.B. This was just a minimal test case I could come up with which had a byte[] leaf and a merkle link to it. As linked above, our non leaf nodes are cbor lists of cbor maps (some of the map values are cbor merkle links)

ianopolous commented 7 years ago

the byte[] after the tag is a multiaddr to bytes (of the form /ipfs/$multihash)

ianopolous commented 7 years ago

I believe that is following the spec for merkle links

whyrusleeping commented 7 years ago

I'm now trying to figure out why the cbor lib does not accept a cbor tag where it expects a map...

ianopolous commented 7 years ago

I believe in cbor any object can be tagged.

whyrusleeping commented 7 years ago

@ianopolous what paths are you expecting to be able to traverse over the object above? (the one with the tag and 'wrapped data')

ianopolous commented 7 years ago

I don't expect to traverse any paths (I agree that only makes sense for a map with string keys), just to be able to pin the root recursively.

ianopolous commented 7 years ago

An interesting side point is that with our threat model we can't use ipld paths that are handled on the server as they are trivially MITMable.

whyrusleeping commented 7 years ago

@ianopolous we can only pin things that have paths traversable through them in some way. What are the link names?

ianopolous commented 7 years ago

I assumed you just extracted the merkle links from the cbor, which is trivial, and that then gives you a list to recurse over for pinning?
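
The link extraction described here could look something like the sketch below. It assumes the block has already been decoded into plain Python values, with tagged items represented by a hypothetical `Tagged` class, and it assumes tag 258 marks a merkle link as elsewhere in this thread. It illustrates the idea only and is not how go-ipfs implements pinning.

```python
from dataclasses import dataclass
from typing import Any, Iterator

LINK_TAG = 258  # tag number used for merkle links in this thread


@dataclass(frozen=True)
class Tagged:
    """Hypothetical stand-in for a decoded CBOR tagged item."""
    tag: int
    value: Any


def iter_links(node: Any) -> Iterator[Any]:
    """Yield the payload of every link-tagged item, in document order."""
    if isinstance(node, Tagged):
        if node.tag == LINK_TAG:
            yield node.value                  # a link: report it, do not descend
        else:
            yield from iter_links(node.value)
    elif isinstance(node, dict):
        for value in node.values():           # keys are not searched in this sketch
            yield from iter_links(value)
    elif isinstance(node, (list, tuple)):
        for child in node:
            yield from iter_links(child)
    # byte strings, text, numbers and None are leaves with no links


# A list of maps, similar in shape to the non leaf nodes described earlier.
non_leaf = [
    {"label": b"a", "value": Tagged(LINK_TAG, b"h1"), "target": Tagged(LINK_TAG, b"h2")},
    {"label": b"b", "value": None, "target": Tagged(LINK_TAG, b"h3")},
]
to_pin = list(iter_links(non_leaf))
```

A walk like this works regardless of whether the root is a map, a list, or a tagged byte string, and it needs no path names. That is the crux of the disagreement in the following comments.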

whyrusleeping commented 7 years ago

Eh... kinda. The thing is that the whole point of IPLD is paths.

I did fix the parsing issues, does this look about right?

whyrusleeping@aredhel ~/c/cborfun> cat foo.cbor | ipfs block put --format=cbor
zdpuApNFmG7PZ53BWxwix4HztiVDHomrvdJLTegycZb8YU5Qr
whyrusleeping@aredhel ~/c/cborfun> cat bar.cbor | ipfs block put --format=cbor
zdpuAue4NBRG6ZH5M7aJvvdjdNbFkwZZCooKWM1m2faRAodRe
whyrusleeping@aredhel ~/c/cborfun> ipfs dag get zdpuAue4NBRG6ZH5M7aJvvdjdNbFkwZZCooKWM1m2faRAodRe
"ZydkYXkgSVBGUyE="
whyrusleeping@aredhel ~/c/cborfun> ipfs dag get zdpuApNFmG7PZ53BWxwix4HztiVDHomrvdJLTegycZb8YU5Qr
{"Tag":258,"WrappedObject":"pQMiEiBlllD8NEPJFkKASO/FukVY3IY1lJgKWfXLPE2Ehn5tMQ=="}

ianopolous commented 7 years ago

The first object is just the byte[] of "g'day IPFS!" but that is plausibly the same as the encoded thing you've got.

whyrusleeping commented 7 years ago

Yeah, since it's a byte array, the JSON marshaller for output here encodes it in base64. It should be correct.
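
The base64 claim is easy to check independently. A small Python sketch using only the standard library follows; the two base64 strings are copied from the `ipfs dag get` output above.

```python
import base64
import json

# The leaf payload, rendered the way a JSON marshaller would render a byte array.
leaf_payload = b"g'day IPFS!"
leaf_json = json.dumps(base64.b64encode(leaf_payload).decode("ascii"))

# Going the other way: decode what `ipfs dag get` printed for each object.
leaf_decoded = base64.b64decode("ZydkYXkgSVBGUyE=")
wrapped_decoded = base64.b64decode(
    "pQMiEiBlllD8NEPJFkKASO/FukVY3IY1lJgKWfXLPE2Ehn5tMQ=="
)
```

`leaf_json` comes out as the quoted string shown in the session, and `wrapped_decoded` is the 37-byte payload that follows the tag in the second block.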

ianopolous commented 7 years ago

For me, pinning is much simpler than ipld and doesn't need to depend on ipld paths at all given your cbor format for merkle links.

whyrusleeping commented 7 years ago

Given my fixes here it should work now. The other thing that I see as being wrong is that your merkle link format isn't correct. It needs to look like:

{"/":"QmFooBar"}

ianopolous commented 7 years ago

My reading of the ipld doc (https://github.com/ipld/specs/tree/master/ipld#serialised-cbor-with-tags) is that that map is only the JSON version of a link. It states that it is encoded in cbor as a tag to either the byte[] of the multiaddr, or the string of the multiaddress.
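
To make the two forms under discussion concrete, here is a hedged Python sketch that builds both: the JSON map form and the tagged CBOR form as read from the spec section linked above. `cbor_head` and `encode_tagged_link` are hypothetical helpers, and tag 258 is the value used in this thread rather than something this sketch can confirm independently.

```python
import json


def cbor_head(major: int, arg: int) -> bytes:
    """Encode a CBOR head with the shortest argument encoding."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([(major << 5) | info]) + arg.to_bytes(size, "big")
    raise ValueError("argument too large for CBOR")


def encode_tagged_link(address: bytes, tag: int = 258) -> bytes:
    """Tag head (major type 6) followed by a byte string (major type 2)."""
    return cbor_head(6, tag) + cbor_head(2, len(address)) + address


# JSON form of a link, as in the comment above (placeholder hash).
json_form = json.dumps({"/": "QmFooBar"}, separators=(",", ":"))

# CBOR form: the binary multiaddr from the original report, wrapped in the tag.
address = bytes.fromhex(
    "a503221220"
    "659650fc3443c916428048efc5ba4558dc863594980a59f5cb3c4d84867e6d31"
)
cbor_form = encode_tagged_link(address)
```

With these inputs `cbor_form` reproduces the bytes of the second `ipfs block put` in the original report, so the test object follows the tagged encoding as described, whichever form the implementation ends up accepting.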

whyrusleeping commented 7 years ago

@ianopolous That document does say that, and this is news to me... I'm gonna have to have a chat with @jbenet. This might be starting to make a little more sense now, I initially misread that portion of the spec and assumed it was outdated.

ianopolous commented 7 years ago

@whyrusleeping Excellent. :-)

whyrusleeping commented 7 years ago

@ianopolous could you create an object for me that uses the cid byte format? I think I've got the rest of this working, but the link appears to be incorrectly formatted (cid.Parse fails on it).