BrendanBenshoof opened this issue 9 years ago
I tend to agree.
we moved this to speed things up, making the point that if we can't trust our own local repo we're in trouble already. But I think I'm somewhere in the middle -- I definitely know cases where the tradeoff of having a node operate as fast as possible, making assumptions, is fine...
Then again, I do think that `ipfs cat <hash>` should never return data that does not hash to `<hash>`, which an adversary could cause by manipulating the local block cache. (hard decision/tradeoff. prob app dependent.)
Now that I have really dug into what is happening, I agree it is less of an issue than I made it out to be.
(please correct me if I have the following wrong)
Bitswap advertises a desire for a hash; other nodes send blocks without identifiers, and bitswap saves them without any sanity checking, using their hash (computed locally) as their name. The ipfs daemon (not sure which part) notices the block in the filesystem by name, stages it, and outputs it to the user. The sanity check I referenced happens when the block is loaded from the filesystem and staged. So unless the content is changed on the hard disk after the block is stored by bitswap, the block is guaranteed to be correct.
There is no point in trying to detect/defend against changes at the hard-disk level, because that is not part of ipfs's adversary model (which I have not seen spelled out, and which I might be able to help with).
If the performance hit is tolerable, make it permanent. (or really, like most performance/security tradeoffs, let the user decide via config file)
> bitswap saves them without any sanity checking, using their hash as their name

that's incorrect (or should be). all blocks MUST be checked before saving.

> So unless the content is changed on the hard disk after the block is stored by bitswap, the block is guaranteed to be correct.

right, this is the case (though it contradicts the statement above)

> There is no point in trying to detect/defend against changes at the hard-disk level, because that is not part of ipfs's adversary model

no, actually: some nodes will have to do this, but not all. ipfs is not a domain-specific protocol; it's a very general thing, where different use cases demand sitting at different points of the security-vs-performance tradeoff.

> If the performance hit is tolerable, make it permanent. (or really, like most performance/security tradeoffs, let the user decide via config file)

right.
So here is the code that does the "sanity check" for bitswap. Essentially it takes the given blocks, hashes them, checks whether those hashes are on the wantlist, and passes them on. It seems to discard the unwanted blocks (is this desired behavior? do we want people to be able to broadcast arbitrary blocks to proliferate them?): https://github.com/ipfs/go-ipfs/blob/79360bbd32d8a0b9c5ab633b5a0461d9acd0f477/exchange/bitswap/decision/engine.go#L202
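For readers skimming the thread, here is a minimal sketch of the hash-then-match-against-wantlist behavior described above. It is not the actual engine code; `Block`, `Key`, and `FilterWanted` are illustrative names, and real bitswap keys are multihashes rather than raw SHA-256 digests:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Block is an illustrative stand-in, not the bitswap type.
type Block struct {
	Data []byte
}

// Key derives the block's identifier by hashing its contents locally,
// rather than trusting any name supplied by the sender.
func (b Block) Key() [32]byte {
	return sha256.Sum256(b.Data)
}

// FilterWanted keeps only blocks whose locally computed hash is on the
// wantlist; everything else is dropped, so a peer cannot push arbitrary
// unsolicited blocks into the local store.
func FilterWanted(wantlist map[[32]byte]bool, received []Block) []Block {
	var wanted []Block
	for _, b := range received {
		if wantlist[b.Key()] {
			wanted = append(wanted, b)
		}
	}
	return wanted
}

func main() {
	want := Block{Data: []byte("wanted data")}
	junk := Block{Data: []byte("unsolicited data")}

	wl := map[[32]byte]bool{want.Key(): true}
	kept := FilterWanted(wl, []Block{want, junk})
	fmt.Printf("kept %d of 2 received blocks\n", len(kept)) // prints: kept 1 of 2
}
```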
I'm interested in helping to build a real adversary model, because it actively overlaps with my research. (We can handle 'variable' levels of required security by breaking it down by use case.) It would help in communicating "what benefit to security does IPFS offer me?" in explicit terms.
Wrapping up: a fix to this issue would include:
- adding a configuration variable to track if blocks pulled from disk should be checked.
@BrendanBenshoof throwing away blocks that we don't want is desired behaviour. You should only have blocks on your system if they were explicitly requested.
Also, I don't think that we should validate blocks upon reading from the disk. If an adversary has access to your disk, you have much bigger problems than whether or not that particular block is correct.
> I'm interested in helping to build a real adversary model, because it actively overlaps with my research. (We can handle 'variable' levels of required security by breaking it down by use case.) It would help in communicating "what benefit to security does IPFS offer me?" in explicit terms.

yeah, though the model should be made in relation to where we're headed, not where we are today. (of course we need to fix the impl/model divergence)
> - adding a configuration variable to track if blocks pulled from disk should be checked.
yep.
@whyrusleeping
> Also, I don't think that we should validate blocks upon reading from the disk. If an adversary has access to your disk, you have much bigger problems than whether or not that particular block is correct.
not in all threat models. we want some nodes to be able to validate blocks they pull in from their repos. not all repos are pulled from local disk, and not all local disks are the same. process + disk are very different security arenas, and one may be at much higher risk than the other, particularly with networked filesystems.
Blocks received from the network are always checked. Blocks read from disk are not checked by default, but if you set `ipfs config Datastore.HashOnRead true`, they will be.
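For reference, after running the command above, the flag lives under the `Datastore` section of the IPFS config file (typically `~/.ipfs/config`; the surrounding Datastore fields are elided here):

```json
{
  "Datastore": {
    "HashOnRead": true
  }
}
```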
@whyrusleeping is that captured in documentation anywhere? it should also be captured in some security notes. Poisoning or corrupting repos could be an attack vector.
cc @RichardLitt as he handled various documentation things for go-ipfs
Let's reopen this and assign it to me as a thing to add to the security docs.
I disagree with rehashing on read being on by default; there is a big con that, in my opinion, outweighs the possible benefits: increased read cost in CPU time, latency, and power usage.
The possible pros are disk error detection and attack prevention, but in both of those cases IPFS isn't the primary risk factor.
When a disk is failing badly enough to return bad blocks, the user will notice other instabilities; they certainly won't be reading IPFS's console logs, because such users notice the performance loss and replace the disk long before that. And in the case of IPFS being used as an attack vector: the user is already compromised in many other ways once malware or a third party has access to their file system.
For contrast: by default, git doesn't even hash objects while fetching from the network.
I agree with @Kubuxu on this one.
Reasoning A: Like a lot of security measures, this is not actually about protecting the user; it is about informing them. Jakub is right on all counts, but if IPFS can detect an issue, it should still inform the user that it thinks something is wrong.
Reasoning B: Part of the core idea of IPFS is the abstraction of data sources. Why should what appears to be the file system be more trusted than the network? It could be a network-shared drive, or some other abstraction (like FUSE) that is just as vulnerable as the network but masquerades as the file system.
Well put, @BrendanBenshoof.
I'm on Reasoning B all the way through. If the user wants to decrease security to increase performance, that's their prerogative and we can give them a knob to turn. But we should not put the user in a weaker spot from the beginning.
The file system is definitely an attack vector. And it cannot be trusted as much as main memory or the CPU registers. As @BrendanBenshoof points out, the filesystem may itself be coming from the network, or may not be under our full control. This will only become more common as we add support for independent auth agents (so your keys can be stored more securely), or for "storing" blocks in other parts of the filesystem (the source files themselves, as @kevina is writing).
We have guides and readmes, and config docs. between:
I much prefer (b).
I am unlikely to get to this soon. I would suggest unassigning this and labeling it as 'help wanted'. Note: posting this may automatically remove my assignment.
Haha, thanks @RichardLitt
I need to investigate this. This bug is two years old and I think some things might have changed.
It is still the same; `HashOnRead` was introduced a while ago and was mentioned in this discussion.
`HashOnRead` is available, but defaults to false. Added it to the docs: https://github.com/ipfs/go-ipfs/pull/6401
We should evaluate the performance hit, and if it's not crushing, enable `HashOnRead` by default.
Hashing tends to be pretty expensive, but we'd have to do some profiling to be sure. Ideally, we'd be at a level of performance where re-hashing would kill performance, but I'm pretty sure we aren't there yet.
Note: corrupting a local datastore isn't really a reasonable attack vector. The local disk should be more trusted than the network, because we already trust that the go-ipfs binary isn't corrupted, that the user's bashrc doesn't start any malware, etc.
On the other hand, corrupting S3, etc., could be an issue. I wonder if we should flag different datastores as trusted/untrusted? I'm worried this adds too much complexity.
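To make the trusted/untrusted idea concrete, here is a hypothetical sketch. None of these names come from go-ipfs, and real keys are multihashes rather than the raw SHA-256 digests assumed here; it only illustrates paying the verification cost for untrusted backends (e.g. S3) while reading trusted local disk directly:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"fmt"
)

// Datastore is an illustrative interface, not the real go-ipfs one.
type Datastore interface {
	Get(key string) ([]byte, error)
}

// memStore stands in for any backend (local disk, S3, ...).
type memStore map[string][]byte

func (m memStore) Get(key string) ([]byte, error) {
	v, ok := m[key]
	if !ok {
		return nil, errors.New("not found")
	}
	return v, nil
}

var ErrHashMismatch = errors.New("block does not match its hash")

// verifying wraps an *untrusted* backend and re-hashes every block on
// read; a trusted local datastore would be used directly, unwrapped.
type verifying struct {
	backend Datastore
}

func (v verifying) Get(key string) ([]byte, error) {
	data, err := v.backend.Get(key)
	if err != nil {
		return nil, err
	}
	sum := sha256.Sum256(data)
	// Keys here are assumed to be raw SHA-256 digests of the data.
	if !bytes.Equal(sum[:], []byte(key)) {
		return nil, ErrHashMismatch
	}
	return data, nil
}

func main() {
	data := []byte("hello")
	sum := sha256.Sum256(data)
	key := string(sum[:])

	backend := memStore{key: data}
	trusted := backend              // trusted backend: read directly
	untrusted := verifying{backend} // untrusted backend: verify on read

	if b, err := trusted.Get(key); err == nil {
		fmt.Printf("trusted read: %q\n", b)
	}

	// Simulate corruption: the trusted path would return bad data
	// silently, while the verifying wrapper catches it.
	backend[key] = []byte("tampered")
	if _, err := untrusted.Get(key); err != nil {
		fmt.Println("verifying store:", err)
	}
}
```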
I evaluated the performance hit a long time ago. It was high enough that enabling it by default wasn't really an option, but it is a good idea to reevaluate.
IMO, we should re-evaluate and, if it doesn't have a performance impact, figure out why and fix the performance issue.
The performance impact was just the hashing itself. Native SHA-256 on my laptop maxes out at 300 MiB/s on a single core, so reads will have overhead proportional to that. If someone reads at 100 MiB/s, then 33% of a core will be spent on hashing.
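If you want to reproduce that number on your own hardware, a quick single-core throughput check might look like this (buffer size and round count are arbitrary choices):

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"time"
)

func main() {
	buf := make([]byte, 64<<20) // 64 MiB buffer; contents don't matter for throughput
	const rounds = 16

	start := time.Now()
	for i := 0; i < rounds; i++ {
		sha256.Sum256(buf)
	}
	elapsed := time.Since(start)

	mib := float64(len(buf)) * rounds / (1 << 20)
	fmt.Printf("sha256 throughput: %.0f MiB/s (single core)\n", mib/elapsed.Seconds())
}
```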
This is ancient; modern releases of Go can hash at ~2 GiB/s per core on all modern amd64 cores (thanks to the new SHA-NI accelerated crypto), and we used a third-party sha256 implementation for a while that already achieved this. With filesystems that can sanity-check drives, such as ZFS and BTRFS, I don't believe there is a lot of value in something like `HashOnRead` being on by default; doing it at the FS level allows for better caching and avoids rehashing things already in the OS block cache. I'll close this in the future unless someone steps up.
https://github.com/ipfs/go-ipfs/blob/79360bbd32d8a0b9c5ab633b5a0461d9acd0f477/blocks/blocks.go#L27
This check should run even if debug is not enabled; otherwise malicious blocks and merkledag structures could be loaded.
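A sketch of the unconditional check being requested here, assuming SHA-256 and a hypothetical `NewVerifiedBlock` constructor (the real code at the link above uses multihashes and gates the check behind a debug flag):

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"fmt"
)

var ErrWrongHash = errors.New("data did not hash to the expected value")

type Block struct {
	Hash []byte
	Data []byte
}

// NewVerifiedBlock refuses to construct a block whose data doesn't
// match the expected hash, with no debug flag involved.
func NewVerifiedBlock(expected, data []byte) (*Block, error) {
	sum := sha256.Sum256(data)
	if !bytes.Equal(sum[:], expected) {
		return nil, ErrWrongHash
	}
	return &Block{Hash: expected, Data: data}, nil
}

func main() {
	data := []byte("some block data")
	good := sha256.Sum256(data)

	if _, err := NewVerifiedBlock(good[:], data); err == nil {
		fmt.Println("valid block accepted")
	}
	if _, err := NewVerifiedBlock(good[:], []byte("tampered")); err != nil {
		fmt.Println("tampered block rejected:", err)
	}
}
```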