BrendanBenshoof opened this issue 9 years ago
I tend to agree.
we moved this to speed things up, making the point that if we can't trust our own local repo we're in trouble already. But I think I'm somewhere in the middle -- I definitely know cases where the tradeoff of having a node operate as fast as possible, making assumptions, is fine...
Then again, I do think that `ipfs cat <hash>` should never return data that does not hash to `<hash>`, which an adversary could cause by manipulating the local block cache. (hard decision/tradeoff. prob app dependent.)
Now that I have really dug into what is happening, I agree it is less of an issue than I made it out to be.
(please correct me if I have the following wrong)
Bitswap advertises a desire for a hash; other nodes send blocks without identifiers, and bitswap saves them without any sanity checking, using their hash (computed locally) as their name. The ipfs daemon (not sure which part) notices the block in the filesystem by name, stages it, and outputs it to the user. The sanity check I referenced happens when the block is loaded from the filesystem and staged. So unless the content is changed on the hard disk after the block is stored by bitswap, the block is guaranteed to be correct.
There is no point in trying to detect/defend against changes at the hard-disk level, because that is not part of ipfs's adversary model (which I have not seen spelled out, and which I might be able to help with).
If the performance hit is tolerable, make it permanent. (or really, like most performance/security tradeoffs, let the user decide via config file)
> bitswap saves them without any sanity checking, using their hash as their name

that's incorrect (or should be). all blocks MUST be checked before saving.

> So unless the content is changed on the hard disk after the block is stored by bitswap, the block is guaranteed to be correct.

right, this is the case (though it contradicts the statement above)

> There is no point in trying to detect/defend against changes at the hard-disk level, because that is not part of ipfs's adversary model

no, actually: some nodes will have to do this, but not all. ipfs is not a domain-specific protocol; it's a very general thing, where different use cases demand sitting at different points of the security-vs-performance tradeoff.

> If the performance hit is tolerable, make it permanent. (or really, like most performance/security tradeoffs, let the user decide via config file)

right.
So here is the code that does the "sanity check" for bitswap. Essentially it takes the given blocks, hashes them, checks whether those hashes are on the wantlist, and passes them on. It seems to discard the unwanted blocks (is this desired behavior? do we want people to be able to broadcast arbitrary blocks to proliferate them?): https://github.com/ipfs/go-ipfs/blob/79360bbd32d8a0b9c5ab633b5a0461d9acd0f477/exchange/bitswap/decision/engine.go#L202
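For readers skimming the thread, here is a minimal sketch of the hash-then-match-against-wantlist behavior described above. It is not the actual engine code; `Block`, `Key`, and `FilterWanted` are illustrative names, and real bitswap keys are multihashes rather than raw SHA-256 digests:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Block is an illustrative stand-in, not the bitswap type.
type Block struct {
	Data []byte
}

// Key derives the block's identifier by hashing its contents locally,
// rather than trusting any name supplied by the sender.
func (b Block) Key() [32]byte {
	return sha256.Sum256(b.Data)
}

// FilterWanted keeps only blocks whose locally computed hash is on the
// wantlist; everything else is dropped, so a peer cannot push arbitrary
// unsolicited blocks into the local store.
func FilterWanted(wantlist map[[32]byte]bool, received []Block) []Block {
	var wanted []Block
	for _, b := range received {
		if wantlist[b.Key()] {
			wanted = append(wanted, b)
		}
	}
	return wanted
}

func main() {
	want := Block{Data: []byte("wanted data")}
	junk := Block{Data: []byte("unsolicited data")}

	wl := map[[32]byte]bool{want.Key(): true}
	kept := FilterWanted(wl, []Block{want, junk})
	fmt.Printf("kept %d of 2 received blocks\n", len(kept)) // prints: kept 1 of 2
}
```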
I'm interested in helping to build a real adversary model, because it actively overlaps with my research. (We can handle 'variable' levels of required security by breaking it down by use case.) It would help in communicating "what benefit to security does IPFS offer me?" in explicit terms.
Wrapping up: a fix to this issue would include:
- adding a configuration variable to track if blocks pulled from disk should be checked.
@BrendanBenshoof throwing away blocks that we don't want is desired behaviour. You should only have blocks on your system if they were explicitly requested.
Also, I don't think that we should validate blocks upon reading from the disk. If an adversary has access to your disk, you have much bigger problems than whether or not that particular block is correct.
> I'm interested in helping to build a real adversary model, because it actively overlaps with my research. (We can handle 'variable' levels of required security by breaking it down by use case.) It would help in communicating "what benefit to security does IPFS offer me?" in explicit terms.

yeah, though the model should be made in relation to where we're headed, not where we are today. (of course we need to fix the impl/model divergence)
> - adding a configuration variable to track if blocks pulled from disk should be checked.
yep.
@whyrusleeping
> Also, I don't think that we should validate blocks upon reading from the disk. If an adversary has access to your disk, you have much bigger problems than whether or not that particular block is correct.
not in all threat models. we want some nodes to be able to validate blocks they pull in from their repos. not all repos are pulled from local disk, and not all local disks are the same. process + disk are very different security arenas, and one may be at much higher risk than the other, particularly with networked filesystems.
Blocks received from the network are always checked. Blocks read from disk are not checked by default, but if you set `ipfs config Datastore.HashOnRead true`, they will be.
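For reference, after running the command above, the flag lives under the `Datastore` section of the IPFS config file (typically `~/.ipfs/config`; the surrounding Datastore fields are elided here):

```json
{
  "Datastore": {
    "HashOnRead": true
  }
}
```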
@whyrusleeping is that captured in documentation anywhere? it should also be captured in some security notes. Poisoning or corrupting repos could be an attack vector.
cc @RichardLitt as he handled various documentation things for go-ipfs
Let's reopen this and assign it to me as a thing to add to the security docs.
I disagree with rehashing on read being on by default; there is a big con that, in my opinion, outweighs the possible benefits: increased read cost in CPU time, latency, and power usage.
The possible pros are disk error detection and attack prevention, but in both of those cases IPFS isn't the primary risk factor.
When a disk is failing badly enough to return bad blocks, the user will notice other instabilities; they certainly won't be reading IPFS's console logs, because such users notice the performance loss and replace the disk long before that. And in the case of IPFS being used as an attack vector: the user is already compromised in many other ways once malware or a third party has access to their file system.
For contrast: by default, git doesn't even hash objects while fetching from the network.
I agree with @Kubuxu on this one.
Reasoning A: Like a lot of security measures, this is not actually about protecting the user; it is about informing them. Jakub is right on all counts, but if IPFS can detect an issue, it should still inform the user that it thinks something is wrong.
Reasoning B: Part of the core idea of IPFS is the abstraction of data sources. Why should what appears to be the file system be more trusted than the network? It could be a network-shared drive, or some other abstraction (like FUSE) that is just as vulnerable as the network but masquerades as the file system.
Well put, @BrendanBenshoof.
I'm on Reasoning B all the way through. If the user wants to decrease security to increase performance, that's their prerogative and we can give them a knob to turn. But we should not put the user in a weaker spot from the beginning.
The file system is definitely an attack vector. And it cannot be trusted as much as main memory or the CPU registers. As @BrendanBenshoof points out, the filesystem may itself be coming from the network, or may not be under our full control. This will only become more common as we add support for independent auth agents (so your keys can be stored more securely), or for "storing" blocks in other parts of the filesystem (the source files themselves, as @kevina is writing).
We have guides and readmes, and config docs. between:
I much prefer (b).
I am unlikely to get to this soon. I would suggest unassigning this and labeling it as 'help wanted'. Note: posting this may automatically remove my assignment.
Haha, thanks @RichardLitt
I need to investigate this. This bug is two years old and I think some things might have changed.
It is still the same; `HashOnRead` was introduced a while ago and was mentioned in this discussion.
`HashOnRead` is available, but defaults to false. Added it to the docs: https://github.com/ipfs/go-ipfs/pull/6401
We should evaluate the performance hit, and if it's not crushing, enable `HashOnRead` by default.
Hashing tends to be pretty expensive, but we'd have to do some profiling to be sure. Ideally, we'd be at a level of performance where re-hashing would kill performance, but I'm pretty sure we aren't there yet.
Note: corrupting a local datastore isn't really a reasonable attack vector. The local disk should be more trusted than the network, because we already trust that the go-ipfs binary isn't corrupted, that the user's bashrc doesn't start any malware, etc.
On the other hand, corrupting S3, etc., could be an issue. I wonder if we should flag different datastores as trusted/untrusted? I'm worried this adds too much complexity.
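To make the trusted/untrusted idea concrete, here is a hypothetical sketch. None of these names come from go-ipfs, and real keys are multihashes rather than the raw SHA-256 digests assumed here; it only illustrates paying the verification cost for untrusted backends (e.g. S3) while reading trusted local disk directly:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"fmt"
)

// Datastore is an illustrative interface, not the real go-ipfs one.
type Datastore interface {
	Get(key string) ([]byte, error)
}

// memStore stands in for any backend (local disk, S3, ...).
type memStore map[string][]byte

func (m memStore) Get(key string) ([]byte, error) {
	v, ok := m[key]
	if !ok {
		return nil, errors.New("not found")
	}
	return v, nil
}

var ErrHashMismatch = errors.New("block does not match its hash")

// verifying wraps an *untrusted* backend and re-hashes every block on
// read; a trusted local datastore would be used directly, unwrapped.
type verifying struct {
	backend Datastore
}

func (v verifying) Get(key string) ([]byte, error) {
	data, err := v.backend.Get(key)
	if err != nil {
		return nil, err
	}
	sum := sha256.Sum256(data)
	// Keys here are assumed to be raw SHA-256 digests of the data.
	if !bytes.Equal(sum[:], []byte(key)) {
		return nil, ErrHashMismatch
	}
	return data, nil
}

func main() {
	data := []byte("hello")
	sum := sha256.Sum256(data)
	key := string(sum[:])

	backend := memStore{key: data}
	trusted := backend              // trusted backend: read directly
	untrusted := verifying{backend} // untrusted backend: verify on read

	if b, err := trusted.Get(key); err == nil {
		fmt.Printf("trusted read: %q\n", b)
	}

	// Simulate corruption: the trusted path would return bad data
	// silently, while the verifying wrapper catches it.
	backend[key] = []byte("tampered")
	if _, err := untrusted.Get(key); err != nil {
		fmt.Println("verifying store:", err)
	}
}
```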
I evaluated the performance hit a long time ago. It was high enough that enabling it by default wasn't really an option, but it is a good idea to reevaluate.
IMO, we should re-evaluate and, if it doesn't have a performance impact, figure out why and fix the performance issue.
The performance impact was just the hashing itself. Native SHA-256 on my laptop maxes out at 300 MiB/s on a single core, so reads will have overhead proportional to that. If someone reads at 100 MiB/s, then 33% of a core will be spent on hashing.
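If you want to reproduce that number on your own hardware, a quick single-core throughput check might look like this (buffer size and round count are arbitrary choices):

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"time"
)

func main() {
	buf := make([]byte, 64<<20) // 64 MiB buffer; contents don't matter for throughput
	const rounds = 16

	start := time.Now()
	for i := 0; i < rounds; i++ {
		sha256.Sum256(buf)
	}
	elapsed := time.Since(start)

	mib := float64(len(buf)) * rounds / (1 << 20)
	fmt.Printf("sha256 throughput: %.0f MiB/s (single core)\n", mib/elapsed.Seconds())
}
```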
This is ancient; modern releases of Go can hash at ~2 GiB/s per core on all modern amd64 cores (thanks to the new SHA-NI accelerated crypto), and we used a third-party sha256 implementation for a while that already achieved this. With filesystems that can sanity-check drives, such as ZFS and BTRFS, I don't believe there is a lot of value in something like `HashOnRead` being on by default; doing it at the FS level allows for better caching and avoids rehashing things already in the OS block cache. I'll close this in the future unless someone steps up.
https://github.com/ipfs/go-ipfs/blob/79360bbd32d8a0b9c5ab633b5a0461d9acd0f477/blocks/blocks.go#L27
This check should run even if debug is not enabled; otherwise malicious blocks and merkledag structures could be loaded.
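A sketch of the unconditional check being requested here, assuming SHA-256 and a hypothetical `NewVerifiedBlock` constructor (the real code at the link above uses multihashes and gates the check behind a debug flag):

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"fmt"
)

var ErrWrongHash = errors.New("data did not hash to the expected value")

type Block struct {
	Hash []byte
	Data []byte
}

// NewVerifiedBlock refuses to construct a block whose data doesn't
// match the expected hash, with no debug flag involved.
func NewVerifiedBlock(expected, data []byte) (*Block, error) {
	sum := sha256.Sum256(data)
	if !bytes.Equal(sum[:], expected) {
		return nil, ErrWrongHash
	}
	return &Block{Hash: expected, Data: data}, nil
}

func main() {
	data := []byte("some block data")
	good := sha256.Sum256(data)

	if _, err := NewVerifiedBlock(good[:], data); err == nil {
		fmt.Println("valid block accepted")
	}
	if _, err := NewVerifiedBlock(good[:], []byte("tampered")); err != nil {
		fmt.Println("tampered block rejected:", err)
	}
}
```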