ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Object/put truncates objects without error #7018

Closed. kevincox closed this issue 4 years ago.

kevincox commented 4 years ago

Right now the object/put API call and the object put CLI are limited to 1MiB, making them unsuitable for a variety of operations. It is incredibly common to have larger directories and other objects, so these limits should be raised.

The current limit allows for creating directories with ~10k entries. There are many uses for larger directories.

I think one concern is that the full object is buffered in memory during the API call. Ideally this buffering would be avoided, but even for now the limit could be much higher.

Note that the ipfs add command happily creates larger directories so this shouldn't be an issue for the network.
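
For concreteness, here is a minimal Go sketch of the kind of call that runs into the limit. It assumes a local daemon on the default API port 5001 and uses the same multipart "file" field that curl -F produces; the tiny JSON payload is just a stand-in for a real directory node well over 1MiB.

package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
)

func main() {
	// Stand-in payload: a real directory node with thousands of links would be well over 1MiB.
	nodeJSON := []byte(`{"Data":"","Links":[]}`)

	// object/put expects a multipart form with a "file" part, the same shape
	// that `curl -F file=@node.json .../api/v0/object/put` produces.
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	part, err := w.CreateFormFile("file", "node.json")
	if err != nil {
		panic(err)
	}
	part.Write(nodeJSON)
	w.Close()

	resp, err := http.Post(
		"http://localhost:5001/api/v0/object/put",
		w.FormDataContentType(),
		&body,
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	// With the truncation bug described above, an over-limit payload can still come
	// back with a hash, computed over only the part of the input that was read.
	fmt.Println(resp.Status, string(out))
}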

hsanjuan commented 4 years ago

hey! Objects are the base unit of transmission in IPFS, so their size is rightly limited! Otherwise you may have to receive full blocks before you can verify that the data is legit, which is a DDoS attack vector.

Note that the ipfs add command happily creates larger directories so this shouldn't be an issue for the network.

It is an issue. Directories that are too big are automatically sharded using a HAMT into small-enough blocks. Large objects are chunked into 256KB chunks.
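
As a rough illustration of the chunking arithmetic (assuming the 256KB chunk size mentioned above; the 10MiB input is just an example):

package main

import "fmt"

func main() {
	const chunkSize = 256 * 1024      // chunk size mentioned above
	const fileSize = 10 * 1024 * 1024 // a hypothetical 10MiB file added with `ipfs add`

	// The file's bytes end up in ceil(fileSize/chunkSize) leaf blocks,
	// stitched together by one or more small parent nodes.
	leaves := (fileSize + chunkSize - 1) / chunkSize
	fmt.Printf("%d bytes -> %d leaf blocks of at most %d bytes each\n", fileSize, leaves, chunkSize)
}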

kevincox commented 4 years ago

Note that this isn't a protocol limit, it's just a client limit.

For example, using the MFS example or the ipfs add example, I easily created much larger directories.

ribasushi commented 4 years ago

@kevincox note that @hsanjuan said

Directories that are too big are automatically sharded using a HAMT into small-enough blocks

Anything ipfs ingests is internally converted to blocks way smaller than 2MiB (there is a secondary hard limit at 1MiB in parts of the stack). 2MiB is indeed the fundamental hard limit across anything the network will move, by design.

https://github.com/ipfs/go-ipfs/issues/5157#issuecomment-400772240 is a more in-depth discussion of how large directories are represented.

kevincox commented 4 years ago

2MiB is indeed the fundamental hard limit across anything the network will move, by design.

Unless I am doing something wrong this doesn't appear to be true because the links I provided are fully functional.

kevincox commented 4 years ago

Also, even if the hard limit stays as it is, we still need a better error message than "Unexpected EOF" for JSON, and it's even worse for protocol buffers, where a truncation could produce a valid (but incorrect) object.

hsanjuan commented 4 years ago

a truncation could cause a valid (but incorrect) object.

Those objects will not match verification when their hash is calculated and matched against what was requested.

I can't find providers for the hashes you posted as examples.
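
For reference, that verification amounts to re-hashing the received block bytes with the hash function the CID names and comparing digests. A minimal sketch in Go, assuming the go-cid and go-multihash packages (the CID string is just one of the examples from this thread, and the data here is deliberately bogus):

package main

import (
	"bytes"
	"fmt"

	cid "github.com/ipfs/go-cid"
	mh "github.com/multiformats/go-multihash"
)

// verify re-hashes the received bytes with the same hash function the CID
// declares and compares the result against the multihash embedded in the CID.
func verify(c cid.Cid, data []byte) (bool, error) {
	dec, err := mh.Decode(c.Hash())
	if err != nil {
		return false, err
	}
	sum, err := mh.Sum(data, dec.Code, dec.Length)
	if err != nil {
		return false, err
	}
	return bytes.Equal(sum, c.Hash()), nil
}

func main() {
	c, err := cid.Decode("QmTEzofvZ52JaQWcJkoUSRcHzkerGGUqrqD3tr5YyHLhXg")
	if err != nil {
		panic(err)
	}
	// Truncated or otherwise altered bytes fail this check, so a peer cannot
	// pass off a partial block as the requested one.
	ok, err := verify(c, []byte("not the real block bytes"))
	fmt.Println(ok, err)
}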

ribasushi commented 4 years ago

@kevincox both your examples are under 2MiB:

admin:~/devel/go-ipfs$ curl -s https://gateway.ipfs.io/api/v0/block/get/QmTEzofvZ52JaQWcJkoUSRcHzkerGGUqrqD3tr5YyHLhXg | wc -c
 1546679
admin:~/devel/go-ipfs$ curl -s https://gateway.ipfs.io/api/v0/block/get/QmRaRnocv153effAKhbt2oQU8zoiQt7nBCMuDzJ4r5BuU6 | wc -c
 1311236
kevincox commented 4 years ago

Hmm, I was using object/get and it is notably larger.

% curl -s https://gateway.ipfs.io/api/v0/object/get/QmTEzofvZ52JaQWcJkoUSRcHzkerGGUqrqD3tr5YyHLhXg | wc -c
2211427

Also, my original comment had an error: it is actually 1MiB. Either way, this file is larger than both limits.

kevincox commented 4 years ago

@hsanjuan Those objects will not match verification when their hash is calculated and matched against what was requested.

No, because the API call returns the hash of only part of the object, and the user will assume that hash covers the whole request body.

ribasushi commented 4 years ago

@kevincox I think we are getting somewhat off topic. In order for integrity verification to work there must be a global limit, and it needs to be relatively small (2MiB hard, 1MiB soft). If this is getting in your way, your specific use case needs to be evaluated to suggest the proper chunking/sharding strategy.

In any case this type of discussion is better done over at the https://discuss.ipfs.io/ forums.

kevincox commented 4 years ago

I understand there must be a limit but I think there are two issues here:

  1. The object/put appears to count the limit differently than the IPFS protocol.
  2. There is a truncation bug in the object/put endpoint.

At least point 2 needs to be fixed, and while doing that it would be nice to make point 1 match the protocol.

Stebalien commented 4 years ago

Note: If you want large directories, enable directory sharding: https://github.com/ipfs/go-ipfs/blob/master/docs/experimental-features.md (which should probably just be stabilized at this point).

But you're right. We definitely shouldn't truncate, that's a bug.

In general, the ipfs block/dag/object commands shouldn't care about the size of the object (under a reasonable limit). The real restriction is in bitswap.

Stebalien commented 4 years ago

@ribasushi could you fix the object truncation issue? We're truncating in core/coreapi/object.go using a LimitReader. Instead, we should return an error when we hit the max.
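
Not the actual coreapi code, just a sketch of the general shape of the fix: read at most one byte past the limit so the over-limit case surfaces as an error instead of a silent cut.

package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

var ErrObjectTooLarge = errors.New("input object exceeds the maximum allowed size")

// limitOrError reads at most max bytes and fails loudly if the input is larger,
// instead of silently dropping the remainder the way a bare io.LimitReader does.
func limitOrError(r io.Reader, max int64) ([]byte, error) {
	// Read one extra byte so "exactly max" can be told apart from "more than max".
	data, err := io.ReadAll(io.LimitReader(r, max+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > max {
		return nil, ErrObjectTooLarge
	}
	return data, nil
}

func main() {
	input := bytes.Repeat([]byte("x"), 2048)

	// Silent truncation: LimitReader just stops returning data after max bytes.
	truncated, _ := io.ReadAll(io.LimitReader(bytes.NewReader(input), 1024))
	fmt.Println("LimitReader kept", len(truncated), "of", len(input), "bytes, no error")

	// Erroring variant: the caller learns the input was over the limit.
	_, err := limitOrError(bytes.NewReader(input), 1024)
	fmt.Println("limitOrError:", err)
}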

ribasushi commented 4 years ago

@kevincox regarding object/block not matching: object returns JSON-formatted protocol-buffer data by default, while block returns the actual wire bytes. Hence the difference in size.

Regarding the truncation issue, just to be sure I am looking at the right thing: can you c/p the exact command/HTTP call you are using when you get the silent truncation?

Stebalien commented 4 years ago

I've filed four issues: #7019, #7020, #7021, #7022.

kevincox commented 4 years ago

I don't have an example, but protobufs don't detect truncation. So if you ran the following command and it managed to cut the proto between fields (most likely between two links, or after the links but before the data field), then no error would be detected.

curl -vF file=@/tmp/myproto.pb "http://localhost:5001/api/v0/object/put?pin=true&inputenc=protobuf"
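
To make that failure mode concrete, here is a small self-contained Go sketch (using the protowire package; the field numbers mimic the dag-pb layout purely for illustration). A buffer cut exactly between two fields still scans as a well-formed message, just with fewer links, which is the "valid but incorrect object" case.

package main

import (
	"fmt"

	"google.golang.org/protobuf/encoding/protowire"
)

// countFields walks a protobuf wire-format buffer and reports how many
// top-level fields it contains and whether the buffer was well-formed.
func countFields(buf []byte) (int, bool) {
	count := 0
	for len(buf) > 0 {
		_, _, n := protowire.ConsumeField(buf)
		if n < 0 {
			return count, false // malformed, e.g. cut in the middle of a field
		}
		buf = buf[n:]
		count++
	}
	return count, true
}

func main() {
	// Hand-encode a dag-pb-like node: two length-delimited "link" fields (field 2)
	// followed by a "data" field (field 1).
	var msg []byte
	msg = protowire.AppendTag(msg, 2, protowire.BytesType)
	msg = protowire.AppendBytes(msg, []byte("link-entry-one"))
	cut := len(msg) // a truncation point that falls exactly between two fields
	msg = protowire.AppendTag(msg, 2, protowire.BytesType)
	msg = protowire.AppendBytes(msg, []byte("link-entry-two"))
	msg = protowire.AppendTag(msg, 1, protowire.BytesType)
	msg = protowire.AppendBytes(msg, []byte("some data"))

	full, okFull := countFields(msg)
	part, okPart := countFields(msg[:cut])
	fmt.Printf("full message:      %d fields, well-formed=%v\n", full, okFull)
	fmt.Printf("truncated message: %d fields, well-formed=%v\n", part, okPart)
}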