Closed kevincox closed 4 years ago
hey! Objects are the base unit of transmission in IPFS, so they are rightly limited! Otherwise you may have to receive full blocks before you can verify that the data is legit, which is a DDoS attack vector.
Note that the ipfs add command happily creates larger directories so this shouldn't be an issue for the network.
It is an issue. Directories that are too big are automatically sharded using a HAMT into small-enough blocks. Large objects are chunked into 256KB chunks.
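For illustration, the fixed-size chunking described above can be sketched as follows. This is a simplified sketch of splitting data into 256 KiB pieces, not go-ipfs's actual chunker (which also supports other strategies like rabin chunking); the function names are made up for this example.

```go
package main

import "fmt"

// DefaultChunkSize mirrors the 256 KiB default chunk size mentioned above.
const DefaultChunkSize = 256 * 1024

// chunk splits data into fixed-size pieces, each of which would become
// its own block (and get its own hash) when added to IPFS.
func chunk(data []byte, size int) [][]byte {
	var chunks [][]byte
	for len(data) > 0 {
		n := size
		if len(data) < n {
			n = len(data)
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

func main() {
	data := make([]byte, 600*1024) // 600 KiB of input
	chunks := chunk(data, DefaultChunkSize)
	fmt.Println(len(chunks))                     // 3 chunks
	fmt.Println(len(chunks[0]), len(chunks[2]))  // 262144 90112
}
```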
@kevincox note that @hsanjuan said
Directories that are too big are automatically sharded using a HAMT into small-enough blocks
Anything ipfs ingests is internally converted to blocks way smaller than 2MiB ( there is a secondary hard limit at 1MiB in parts of the stack ). 2MiB is indeed the fundamental hard limit across anything the network will move, by design.
https://github.com/ipfs/go-ipfs/issues/5157#issuecomment-400772240 is a more-in-depth discussion of how large directories are represented.
2MiB is indeed the fundamental hard limit across anything the network will move, by design.
Unless I am doing something wrong this doesn't appear to be true because the links I provided are fully functional.
Also if the hard limit is fixed we still need a better error message than "Unexpected EOF" for JSON and even worse for protocol buffers where a truncation could cause a valid (but incorrect) object.
a truncation could cause a valid (but incorrect) object.
Those objects will not match verification when their hash is calculated and matched against what was requested.
I can't find providers for the hashes you posted as examples.
@kevincox both your examples are under 2MiB:
admin:~/devel/go-ipfs$ curl -s https://gateway.ipfs.io/api/v0/block/get/QmTEzofvZ52JaQWcJkoUSRcHzkerGGUqrqD3tr5YyHLhXg | wc -c
1546679
admin:~/devel/go-ipfs$ curl -s https://gateway.ipfs.io/api/v0/block/get/QmRaRnocv153effAKhbt2oQU8zoiQt7nBCMuDzJ4r5BuU6 | wc -c
1311236
Hmm, I was using object/get and it is notably larger:
% curl -s https://gateway.ipfs.io/api/v0/object/get/QmTEzofvZ52JaQWcJkoUSRcHzkerGGUqrqD3tr5YyHLhXg | wc -c
2211427
Also my original comment had an error. It is actually 1MiB. Either way this file is larger than both.
@hsanjuan Those objects will not match verification when their hash is calculated and matched against what was requested.
No, because the API call will return the hash of part of the object and the user will assume that the hash is for the whole request body.
@kevincox I think we are getting off topic somewhat. In order for integrity verifications to work there must be a global limit, and it needs to be relatively small ( 2MiB hard, 1MiB soft ). If this is getting in your way: your specific use-case needs to be evaluated to suggest the proper type of chunking/sharding strategy.
In any case this type of discussion is better done over at the https://discuss.ipfs.io/ forums.
I understand there must be a limit but I think there are two issues here:
1. The API limit (1MiB) doesn't match the protocol limit (2MiB).
2. Objects over the limit are silently truncated instead of producing an error.
At least 2 needs to be fixed. And while doing that it would be nice to get 1 to match the protocol.
Note: If you want large directories, enable directory sharding: https://github.com/ipfs/go-ipfs/blob/master/docs/experimental-features.md (which should probably just be stabilized at this point).
But you're right. We definitely shouldn't truncate, that's a bug.
In general, the ipfs block/dag/object commands shouldn't care about the size of the object (under a reasonable limit). The real restriction is in bitswap.
@ribasushi could you fix the object truncation issue? We're truncating in core/coreapi/object.go using a LimitReader. Instead, we should return an error when we hit the max.
@kevincox regarding object/block not matching: object returns json-formatted protocol data by default while block returns the actual wire-bytes. Hence the difference.
Regarding the truncation issue, just to be sure I am looking at the right thing: can you copy/paste the exact command/HTTP call you are using when you get the silent truncation?
I've filed four issues: #7019, #7020, #7021, #7022.
I don't have an example. But protobufs don't detect truncation. So if you ran the following command, and it managed to cut the proto between fields (most likely between two links, or after the links but before the data field) then no error would be detected.
curl -vF file=@/tmp/myproto.pb "http://localhost:5001/api/v0/object/put?pin=true&inputenc=protobuf"
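The failure mode described above can be demonstrated with a toy wire-format walker. Protobuf messages are just a concatenation of tag/value fields with no overall length or terminator, so any prefix that ends exactly on a field boundary still parses cleanly. This is a deliberately minimal checker for illustration (varint and length-delimited wire types only), not a real protobuf decoder:

```go
package main

import "fmt"

// wellFormed walks protobuf wire format and reports whether the bytes
// parse cleanly. Because there is no message-level framing, a prefix
// cut between two fields is indistinguishable from a complete message.
func wellFormed(b []byte) bool {
	i := 0
	readVarint := func() (uint64, bool) {
		var v uint64
		for shift := 0; i < len(b); shift += 7 {
			c := b[i]
			i++
			v |= uint64(c&0x7f) << shift
			if c&0x80 == 0 {
				return v, true
			}
		}
		return 0, false // ran out of bytes mid-varint
	}
	for i < len(b) {
		tag, ok := readVarint()
		if !ok {
			return false
		}
		switch tag & 7 { // low 3 bits are the wire type
		case 0: // varint
			if _, ok := readVarint(); !ok {
				return false
			}
		case 2: // length-delimited (strings, bytes, sub-messages)
			n, ok := readVarint()
			if !ok || i+int(n) > len(b) {
				return false
			}
			i += int(n)
		default:
			return false // other wire types omitted in this sketch
		}
	}
	return true
}

func main() {
	// Two varint fields: field 1 = 5, field 2 = 7.
	msg := []byte{0x08, 0x05, 0x10, 0x07}
	fmt.Println(wellFormed(msg))     // full message: true
	fmt.Println(wellFormed(msg[:2])) // cut between fields: still true!
	fmt.Println(wellFormed(msg[:3])) // cut mid-field: false
}
```

The dangerous case is the second one: the truncated bytes are a "valid" message that simply lacks the trailing fields.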
Right now the object/put API call and object put CLI are limited to 1MiB, making them unsuitable for a variety of operations. It is incredibly common to have larger directories and other objects, so these limits should be raised. The current limit allows for creating directories with ~10k entries. There are many uses for larger directories.
I think one concern is that the full object is buffered in memory during the API call. Ideally this buffer would be avoided, but even for now the limit could be much higher.
Note that the ipfs add command happily creates larger directories, so this shouldn't be an issue for the network.