kfogel / OneTime

An open source encryption program that uses the "one-time pad" method.
http://red-bean.com/onetime
32 stars 14 forks source link

Security concerns #17

Open maqp opened 7 years ago

maqp commented 7 years ago
# In other words, the real message (and its digest, described
# below) sits somewhere along a slider surrounded by fuzz on
# each side, and the precise amount of fuzz on each side is
# known only to those with the pad.  This prevents a
# known-plaintext message substitution attack: because
# attackers cannot know where the message is, they cannot
# reliably replace a known plaintext with some other
# plaintext of the same length, even with channel control.

If the fuzzing length is the only thing providing KPA-security, then attacker's advantage for existential forgery is 1/512. This is unacceptable. A proper way to do this is

ciphertext = (plaintext message + padding) XOR pad1  # padding to nearest 255 bytes.
MAC = HMAC-SHA256(pad2, ciphertext)  # pad2 is at least 256 bits.
packet = ciphertext + MAC

To auth and decrypt packet:

MAC' = HMAC-SHA256(pad2, ciphertext)
if MAC == MAC':
    decrypt(ciphertext, pad1)

Note that if sender of message can learn when MAC fails, you need a constant time function for MAC comparison.

The HMAC construction is needed due to length extenstion attacks. You can also use something like simplesha3 where it's enough you concatenate MAC-key with ciphertext. Preferably move to unconditionally secure authentication (one-time MAC or Carter-Wegman MAC). I have implemented one-time-MAC on Python, feel free to use it as reference. Regardless of which standard MAC you choose, you're able to remove the fuzzing and greatly simplify the code.

How to Generate a One-Time Pad
$ dd if=/dev/random of=~/.onetime/to_foo.pad bs=1000 count=10000

While /dev/random reduces entropy_avail counter, it doesn't guarantee 8 bits of entropy is mixed into entropy pool for every output byte fed into pad. Internal compression with computationally secure hash function doesn't make input bytes information theoretically secure either. /dev/random is not a hardware random number generator, TRNG or Bernoulli process. The general recommendation is not to use Kernel CSPRNG output as keystream on itself, but to feed hash of it as key to a CSPRF such as AES256, SHA256, XSalsa20 that then produces the keystream.

https://github.com/kfogel/OneTime/blob/master/onetime#L600 has

ret_data[i] = chr(random.randint(0, 255))

The comments say

# We could use a random.SystemRandom object
# here if one is available, but the Python documentation says
# that class is not supported on all systems, while not saying
# what error is raised if it's not supported. 

If secure alternative is not available, fallback to RNG that is not intended for cryptographic purposes is a bad idea. Consider something like

except NotImplementedError:
    print("Error: No Kernel CSPRNG available. Exiting for your safety.")
    exit()
kfogel commented 7 years ago

Thanks for the detailed comments, @maqp . I'm in the middle of some big deadlines right now (OneTime is not my day job, as you can probably guess :-) ), so I won't be able to give this the full study it deserves until a few weeks from now, but I will do so as soon as I can.

I did take a quick look now, though. I think the comment in the code may be a little out of date. It's true that the attacker's advantage for existential forgery is 1/512 in terms of message position, but now that there is pad-based message authentication as well (which is somewhat new in the code), the message position is not the principal authentication anyway. The fuzz length is still useful because it disguises the precise message length, of course.

I'd been thinking about XOR-adding a random stream (whose seed is based on pad data) to the pad+message too, but am reluctant to add any complexity unless I'm sure it really adds security. I guess, though, that the question of whether that really adds security depends on the strength of the pad, which is beyond OneTime's control, so perhaps it should do as much as it can do just in case the pad is weak.

Re falling back to non-cryptographic RNG: the question of whether it is a bad idea depends on what it's being used for. In this circumstance, I didn't think it makes much of a difference, but I'll look again more carefully when I have more brain cycles available.

Re generating the pad data: Hunh. I sought all sorts of advice, since I'm not really competent to judge the kernel's /dev/random myself, and now I learn that even that's not enough :-). If it's an easy patch, do you want to have a go at implementing a pad-generation option?

Sorry for the short and fairly imprecise reply. I can only really work on OneTime when I get long stretches of continuous time available (at least 3 or 4 hours at a chunk), and that hasn't really happened since our company started taking off. Nice problems to have, but still... sigh.

Thanks for the report & suggestions.

-Karl

maqp commented 7 years ago

The Pad-based message authentication doesn't sound like a standard method so you should definitely write up-to-date documentation so it can be reviewed further. Until you feel confident about the security design, I would definitely add a disclaimer about experimental state. This isn't meant to deter users, but to provide accurate information that helps them assess their threat model.

As for the fuzzing, you really do not need to prepend padding: For OTP each byte in pad provides information theoretical security to each plaintext byte. If each pad byte is truly independent, then attacker can't distinguish the offset of plaintext. However, remember Shannon's maximum: the enemy knows the system. They can deduce the tool used from headers and configure attack accordingly. If pad can be broken with computational effort (e.g. if CSPRNG like /dev/random is used), offset and padding do not help.

I am not a cryptographer, but everything I've studied points to trailing padding being enough to obfuscate plaintext length. Pad material is relatively expensive to produce so optimizing is a good idea.

the question of whether [mixing pseudo random stream] really adds security depends on the strength of the pad, which is beyond OneTime's control

To generate one-time pads the process must be 1. truly random, that is, output of bits are not produced by an algorithm based on small, truly randomish seed (as in the case of /dev/random) and 2. bits are independent of one another (usually small amount of short-term auto-correlation exists, AFAIK it means "if output is 10110101 then there is 51% possibility of next bit being 1). Even if you have truly random independent bits, the result might be biased, e.g. remove all spades from deck, shuffle them and the probability of getting a red card is 2/3.

The best way we currently know how to produce pads is using a hardware random number generator or HWRNG for short (you can get USB-devices for this purpose) that measures a chaotic (non-algorithmic) process to produce bits. The auto-correlation can be reduced significantly by keeping sampling time as long as possible. Last, to eliminate simple bias you can use Von Neumann whitening.

OTP assumes pads are perfect. Since complex forms of bias and auto-correlation are insanely hard to remove, entropy from HWRNG is practically never exactly 8 bits per byte (think 8 perfect coin tosses to determine byte value). The best way to process the pad before using it is to use a CSPRF like SHA256 to compress the bits. That way the output bits will have effectively no bias. This must however be done without reusing entropy (as is the case with /dev/(u)random). So say you use Von Neumann whitening on output bits of HWRNG and Ent evaluates it to contain 7.5 bits per byte of entropy. What you'll want to do is for each output 256-bit hash, input 256 * (8 / 7.5) = 274 bits = 35 bytes (round upwards) into SHA256. So what about XORing bits from CSRPNG?

Re falling back to non-cryptographic RNG: the question of whether it is a bad idea depends on what it's being used for.

There's no harm in doing that, but personally I would rather feed 256 bits from /dev/urandom together with the entropy from pad into the SHA256 function itself, than to XOR the compressed pad with CSRPNG. The reason Python's random.Random is a bad idea is because the adversary has to spend trivial amount of computational energy to rule out the possible key streams that can be mixed in; If I understood correctly, you wanted to use it as fail-safe in case pads are insecure to begin with. Fail-safe is better than no fail-safe, but in cases if /dev/urandom is not a possibility, it's better to gracefully exit with security warning than to allude users into false sense of security.

If it's an easy patch, do you want to have a go at implementing a pad-generation option?

Unfortunately there's no easy patch. You can look at my TFC project for sampling implementation (in Python) of free hardware design HWRNG on Raspberry Pi. You can find the HWRNG schematics here.

If you're not up to implementing HWRNG support, I recommend you use modern symmetric algorithms instead; PyNaCl is a great library.

EDIT: Edited a bunch of times, I hope it didn't fill your inbox.

kfogel commented 7 years ago

Hi, @maqp. Do the comment fixes I've committed recently help clarify what the code is doing? (Best to look at the source directly, not at the diffs of the commits listed above, because I've updated those comments even more since then, in commits that were not really about this issue so they are not marked with this issue number.)

To clarify what's going on:

The purpose of the head fuzz and tail fuzz is to disguise both position and length of the plaintext. Disguising the length is more important of the two: otherwise, since the format is known, the length of the plaintext (i.e., of the compressed plaintext) would be known too. That would be bad, obviously.

But it's good to disguise the position too, on the principle that in a non-KPA situation, as few bytes of actual pad data should be revealed as possible. Since the bzip encoding of the plaintext starts with a known header, if someone knew what position that header were at, they would know a few bytes of pad. In theory that shouldn't be of any use to an attacker, but if OneTime can avoid revealing those bytes, it should do so.

However, the fuzzing length does not provide KPA-security and is not intended to. Instead, the internal digest provides message integrity, so that no plaintext substitution is possible.

Regarding the data coming from RandomEnough: it is not in any way a "fail safe", and it is never used for encryption of plaintext under any circumstances. It's only there to avoid using raw pad data for head fuzz and tail fuzz, because if there were anything even slightly weak about the pad, that would be somewhat easier to detect from raw pad data as opposed to pad data + unrelated [pseudo]random noise.

Pad weakness is not an all-or-nothing situation. A pad can be very weak, or only slightly weak, or, ideally, perfectly random. In the "very weak" case, there is nothing we can do to help. In the "only slightly weak" case, however, OneTime can help a bit by disguising the weakness.

Of course I completely agree with you that a pad should be perfectly random, and I still need to update the instructions (in the program and on the site) to discuss the possible unsuitability of /dev/random. I'm in an extremely network-scarce situation right now (even just pushing recent commits and adding this comment took a while) so I won't be able to do that until I'm back on the real Internet in November.

kfogel commented 7 years ago

(Note I'm not able to update the web site at the moment, so the most recent version of OneTime should be fetched from GitHub.)

kfogel commented 7 years ago

As mentioned in issue #24 (in this comment), I'm considering getting rid of one of the fuzzes, because your argument makes sense to me.

Won't happen immediately, as my day job is pretty busy, but I'll try to take care of this soon and will comment back here when it's done.

aaannndddyyy commented 7 years ago

For cloaking the actual length of the message, tail fuzz is sufficient. But i do see some merit in the head fuzz, too. A 1/512 chance to guess the right position sounds very high. But you must not forget that it is not a key where you just quickly try 512 possible combinations and then you have the answer, and you know you have the answer, and you can send it without detection. In average, the attacker would have to send 255 corrupted messages first. And the receiver would notice that. Yes, it makes code a bit more complicated. But it is still readable. If readability is a concern, it might rather be refactored a bit. And whether you put your total fuzz only on one end, or you split it into two parts, does not negatively affect security. n bytes of tail fuzz are as good as m bytes Head fuzz and n-m bytes tail fuzz when it comes to hiding message length. Personally, I like the position obfuscation. It's an independent authentication method.

aaannndddyyy commented 7 years ago

The points from this issue here, as I see it, were:

maqp commented 7 years ago

"We make it impossible for the attacker to know exactly where the real encrypted text starts"

Not exactly. You make it ~512 times tougher because that's the maximum amount of effort the attacker has to do.

But then again, if there's a crib that breaks any byte from the pad, prepended padding does not help. Adversary simply starts by decrypting from position 256 and realizes they're at the center of a sentence. They proceed to decrypt every earlier byte until they reach capital letter or whatever makes sense.

"This means she can't derive the relevant part of the pad, so she can't substitute another plaintext, even if she knows the original plaintext"

This is better than using no authentication at all, perhaps in cases where OTP is done by hand. But we're talking about a different scenario.

When using a computer program the correct way to authenticate the ciphertext is to sign it using a message authentication code (MAC). You can use any size key to produce any size MAC: 128, 256, 512 bits. These MACs can be unconditionally secure (meaning most effective attack is guessing). The point is, 9-bit "MAC" with prepended padding is insecure by itself and needless logic when used with proper MACs.

"and if she doesn't know the original plaintext, she has no way to derive the length of the plaintext from the encrypted text."

For this the trailing padding is completely sufficient.

RE: RandomEnough:

There are a couple of places (the head fuzz and tail fuzz) where raw pad bytes would otherwise be exposed in the encrypted message, except that they're not exposed because they're XOR'd with bytes coming from this class. That's all this class is for.

I'm not sure why the raw bytes would otherwise be exposed in the encrypted message. The assumption is you're always encrypting some sort of sensitive information, whether it's the ord-value of prepended padding revealing it's length, a header that tells how long padding is used etc.

Since pad is truly random, the ciphertext is truly random too. It reveals no information to attacker. If the pad is not truly random and adversary can break it with super computers, XORing the bits with weak RNG doesn't add security. That's why you make the warning on line 571.

With OTP, each pad byte is independent. That means even if raw bytes would be revealed from padding (say you used null-bytes in padding), they do not compromise secrecy of plaintext. This holds true even if we're talking about a stream cipher.

RE: Checksum

A SHA256 digest, XOR'd with pad, of: the raw pad session hash input bytes, plus the raw head fuzz (that is, the random bytes from before they were XOR'd with the pad to produce the final head fuzz), plus the plaintext message.

I can't evaluate the security of this construction as it's hard to wrap one's head around it.

"raw pad session hash input bytes" is unclear.

raw head fuzz = final pre-pended padding XOR pad

If attacker can still use the original raw head fuzz for existential forgeries, (e.g. if the forged plaintext message can be of the same length than original), it doesn't help.

In the case of known plaintext, if the padding around the plaintext is known too, the attacker is able to produce the input of the SHA256 digest function and derive the raw digest. They can then XOR the digest with the ciphertext to obtain the key that was used to encrypt the hash, and use it to encrypt the hash of encrypted message.

To understand better what's going on here, please consider writing high-level description of the protocol using pseudo-code.

So again, don't use non-standard techniques. Use a keyed hash function. A simple implementation would be something like the following:


#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Dependencies:
# sudo apt install python-pip
# sudo pip install simplesha3

import simplesha3
import os

def sha3_256_sign(mac_key, ciphertext):
    mac = simplesha3.sha3256(mac_key + ciphertext)
    return ciphertext + mac

def sha3_256_mac_verify(mac_key, signed_ciphertext):

    purp_mac = signed_ciphertext[-32:]
    purp_ct = signed_ciphertext[:-32]

    calc_mac = simplesha3.sha3256(mac_key + purp_ct)

    # Constant time comparison    
    if len(purp_mac) != len(calc_mac):
        return False

    result = 0
    for x, y in zip(bytearray(calc_mac), bytearray(purp_mac)):
        result |= x ^ y
    return result == 0

# Simple use case

#################### Sender ######################

pt = "Attack at dawn"

# Pad message
assert len(pt) < 256
padded_pt = pt.ljust(255, chr(255 - len(pt)))

# Generate pad and key from /dev/urandom. Not information theoretically secure!
pad = os.urandom(len(padded_pt)) 
mac_k = os.urandom(32)

# XOR pad with padded pt
ct = ''.join(chr(ord(p) ^ ord(k)) for p, k in zip(padded_pt, pad))

# Encrypt-then-sign is the ideal way
signed_padded_msg = sha3_256_sign(mac_k, ct)

#################### Receiver ######################

# Contact already has mac_k and pad

if not sha3_256_mac_verify(mac_k, signed_padded_msg):
    # MAC fail handling logic
    print ("Message authentication failed.")
    exit()

# XOR pad with ct
ct = signed_padded_msg[:-32]
padded_pt = ''.join(chr(ord(p) ^ ord(k)) for p, k in zip(ct, pad))

#remove padding
pt = padded_pt[:-ord(padded_pt[-1:])]

print 'Message was: "%s"' % pt

You could of course use a second pad and XOR the MAC but I don't think that's needed. Not even post-quantum adversaries have the capability to break the MAC in real time / a matter of days, so computational security is enough. OTP is mainly needed for long term guarantee of message secrecy, so the generation of pad must rely on HWRNGs.


@aaannndddyyy

"In average, the attacker would have to send 255 corrupted messages first. And the receiver would notice that."

If there are thousands of users, the number of compromised ones is most likely not zero and that is unacceptable.

"Personally, I like the position obfuscation. It's an independent authentication method."

Like I said earlier, 9-bit entropy isn't exactly cryptographically secure on itself, and it isn't needed with e.g. 256-bit SHA3 MACs.

"But for preventing KPA a secure message digest is used."

The keying with a dedicated MAC key / section of one-time pad is not clear. It's unclear whether known plaintext would allow attacker to derive the preimage of hash, that would then compromise the keystream used to protect the hash.

"Encrypt-Then-Authenticate is better, but in the case of One-Time pads Authenticate then Encrypt is encouraged."

I'd like a source on this. The Cryptographic Doom Principle isn't excluded from stream ciphers, why would it be excluded from OTPs?

"True, but nothing except complexity speaks against splitting the fuzz part"

I find that to be enough. It's almost always a merit to stick to standard constructions, as they've been carefully vetted first. Even seemingly secure constructions fall under heavy criticism, e.g. see Telegram

If someone wants more than /dev/random, they should care for that themselves

If the generation of secure pads is left for the user, and the system defaults to using CSPRNGs, it doesn't use one-time-pads. The software should not be be called one-time but "Pre-shared-expanded-keystream-from-kernel-CSPRNG".

Using one-time-pad means there's a set of downsides to convenience. Advertising something more convenient (requires no HWRNG) does no good to this project. The only reason one-time-pad is used, is information theoretical security (aka perfect secrecy) and that property does not hold with /dev/(u)random.

If developers of one-time know this and continue to market with "one-time pad method", the infosec community will

  1. Determine the developer doesn't know their field (OTP != pre-expanded keystream)
  2. Label it as snake oil.

Advertising pre-shared keys is fine by me: current public key crypto will eventually run into problems.

But why would you pre-share terabytes of keystream when you can fit it all into 10kB file containing the cipher and a 256-bit key that generates the secure pad as you go? The security is exactly the same.

Moreover, since you've agreed on the cipher by selecting the tool, you only have to share the key. Again, the security is exactly the same (except that SHA-1 hash function used by /dev/(u)random is older and computationally less secure than modern XSalsa20, AES, SHA2/3 etc, so it could be even worse).

aaannndddyyy commented 7 years ago

@maqp I said, good random data is neccesary, and I recommended a HWRNG for this. I do not see how this script can be updated to automatically send the user such a device. At most it can interface to it. But then again, this would suppose a common standard.

What I deem more reasonable is advising the user about getting good randomness.

aaannndddyyy commented 7 years ago

You make it ~512 times tougher because that's the maximum amount of effort the attacker has to do.

That is quite a lot. If I get 100 corrupt messages, I notice that.

Adversary simply starts by decrypting from position 256 and realizes they're at the center of a sentence.

How would that work?? That implies the adversary is in possession of the pad (by theft or be reproducing it), otherwise he has no chance to know if he is in the middle of the sentence or not.

I'm not sure why the raw bytes would otherwise be exposed in the encrypted message. The assumption is you're always encrypting some sort of sensitive information, whether it's the ord-value of prepended padding revealing it's length, a header that tells how long padding is used etc. Since pad is truly random, the ciphertext is truly random too. It reveals no information to attacker. If the pad is not truly random and adversary can break it with super computers, XORing the bits with weak RNG doesn't add security. That's why you make the warning on line 571. With OTP, each pad byte is independent. That means even if raw bytes would be revealed from padding (say you used null-bytes in padding), they do not compromise secrecy of plaintext. This holds true even if we're talking about a stream cipher.

@kfogel Wants to protect the users of his script the best way possible. So in case their random data is not random enough, this procedure is supposed to mitigate that by disguising. It will not convert it into security-guaranteeing randomness, and it will not lure the user into a feeling of safety, as it clearly says random pad must be truly random. In case it is not perfect, it's better to help mitigate that than to do nothing.

In the case of known plaintext, if the padding around the plaintext is known too, the attacker is able to produce the input of the SHA256 digest function and derive the raw digest. They can then XOR the digest with the ciphertext to obtain the key that was used to encrypt the hash, and use it to encrypt the hash of encrypted message.

That is not correct. Even with the plain text and the padding around the plaintext known, the attacker does not know the MAC.

So again, don't use non-standard techniques. Use a keyed hash function.

A keyed sha256 is not standard enough?

If attacker can still use the original raw head fuzz for existential forgeries, (e.g. if the forged plaintext message can be of the same length than original), it doesn't help.

I don't understand what you want to say. Maybe, I'm just not having my brightest moment. The attacker cannot use the head fuzz for forgeries. If they could, they could also use the tail fuzz for the same thing. The thing is that it's symmetric, what you can do on the left, you can also do on the right. So this does not speak against Headfuzz.

maqp commented 7 years ago

@aaannndddyyy

At most [the software] can interface to [HWRNG].

Agreed. That's why it should make a choice on whether to seek HWRNG support or move to computationally secure algorithms. Bernstein adviced against using just urandom here.

How would that work??

Like I said, "if there's a crib that breaks any byte from the pad --" It's not supposed to happen but if it doesn't happen, prepended padding is useless.

That is quite a lot. If I get 100 corrupt messages, I notice that.

Sure, but that's not what strong crypto is about. You don't strive for unbreakable encryption and then leave the authentication to odds of 1:512.

So in case their random data is not random enough, this procedure is supposed to mitigate that by disguising.

I'll try to explain. We can have computational security with urandom and the pad will look random, but if seed value can shomehow be predicted, e.g. if user runs OneTime with Linux live distribution, the generated keystream can be regenerated. One-time pads use HWRNG so that there is never an algorithm, no internal state to brute force.

We can estimate the entropy of HWRNG generated pad with a number of tools: Ent, Dieharder etc. It might turn out that the HWRNG has strong bias towards ones (e.g. 60% of sampled bits are ones, 40% are zeroes). This causes an uneven distribution between bytes in pad making certain pads more likely than others, meaning given some ciphertext, some plaintexts are more likely than others.

Now, assuming the adversary would have to work on such ciphertext. They didn't know the HWRNG was crappy, but they know that it could be and that to protect against direct attack the pad was XORed with insecure RNG. Now, they will have some work to do.

To quote Wikipedia-- "Python uses the Mersenne Twister as the core generator. It produces 53-bit precision floats and has a period of 2**19937-1." (This means that the sequence won't repeat, but) "The algorithm in its native form is not cryptographically secure. The reason is that observing a sufficient number of iterations (624 in the case of MT19937, since this is the size of the state vector from which future iterations are produced) allows one to predict all future iterations."

This means that if for some reason the HWRNG's noise was bad, e.g. sampling started before HWRNG was turned on etc, the randomizing effect of math.random is trivial to remove.

Even if the weak HWRNG was used throughout the pad generation, nation states could eliminate all Mersenne Twister states for every single key, eventually cracking the HWRNG. This might sound computationally expensive, but it's much faster than breaking modern ciphers like Salsa20 or AES.

it will not lure the user into a feeling of safety, as it clearly says random pad must be truly random.

"Use our One-time method by using an application that defaults to cryptographically secure RNG (invalidating security claim unless user provides their own pads). But don't worry, regardless of what pads you end up using, we XOR them using cryptographically insecure, 20 year old PRNG even Python's own reference manual advices against".

I think

"Use our encryption tool that uses a strong, peer-reviewed stream cipher, or alternatively a one-time pad sampled from a HWRNG to protect messages"

sounds better.

In case it is not perfect, it's better to help mitigate that than to do nothing.

It most certainly isn't. It's better to gracefully exit than to endanger the user.

A keyed sha256 is not standard enough?

A keyed HMAC-SHA256 is enough as it's secure against length extension attacks. HMAC is a specific construction But to answer, MAC doesn't have to be built around SHA3 although its IMO, the easiest to do.

I don't understand what you want to say.

If you have

CT1 = pad1 XOR (padding + message + padding) MAC1 = pad2 XOR HMAC-SHA256(padding + message + padding)

and you guess the message and length of prepended padding

CT1 XOR (padding + message + padding) = pad1 MAC1 XOR HMAC-SHA256(padding + message + padding) = pad2

You now know both pads so you can forge a new message.

So what remains to be understood is, is this the case, is there a separate key like this:

CT1 = pad1 XOR (padding + message + padding) MAC1 = (pad2 XOR) HMAC-SHA256(key, padding + message + padding)

or preferably

MAC1 = (pad2 XOR) HMAC-SHA256(key, CT1)

aaannndddyyy commented 7 years ago

If you have

CT1 = pad1 XOR (padding + message + padding) MAC1 = pad2 XOR HMAC-SHA256(padding + message + padding)

and you guess the message and length of prepended padding

CT1 XOR (padding + message + padding) = pad1 MAC1 XOR HMAC-SHA256(padding + message + padding) = pad2

You now know both pads so you can forge a new message. So what remains to be understood is, is this the case, is there a separate key like this:

CT1 = pad1 XOR (padding + message + padding) MAC1 = (pad2 XOR) HMAC-SHA256(key, padding + message + padding)

or preferably

MAC1 = (pad2 XOR) HMAC-SHA256(key, CT1)

In this attack you are assuming that the adversary KNOWS the pad beforehand. In this case nothing can save you, except if you say, you use another pad to doubly encrypt it with, but then again, that pad could be leaked too. One of the major assumptions of all cryptography is that the secret keys remain secret. Under that assumption, it would be practically impossible to run a known plaintext attack on onetime as it is now. Because even if the attacker knows the exact message already and captures the ciphertext, (meaning that he could get the original pad data of the part of the pad that encrypts the message, thus being able to successfully encrypt a forged message) he can NOT get a valid mac for his forged message, as he does not know the the secret key the hash function was fed with, unless he breaks the hash function. He does not even know the the mac for the plaintext message that the original author sent, therefore he cannot acquire the pad data from that part either. So again, your scenario only works if the attacker already knows the pad.

aaannndddyyy commented 7 years ago

Agreed. That's why it should make a choice on whether to seek HWRNG support or move to computationally secure algorithms. Bernstein adviced against using just urandom here.

Right, but nowhere does onetime use /dev/urandom for encryption nor does it recommend /dev/urandom, it does recommend /dev/random though.

"Use our encryption tool that uses a strong, peer-reviewed stream cipher, or alternatively a one-time pad sampled from a HWRNG to protect messages"

sounds better.

Yes. It does. But currently this script is about encrypting and decrypting using one-time pads. It does not create one-time pads. If you want to have a go at implementing a secure way to do this, I'm sure it will be appreciated.

At most [the software] can interface to [HWRNG].

Agreed. That's why it should make a choice on whether to seek HWRNG support or move to computationally secure algorithms.

As I said, if you want to implement this, please do and give it to use.

A keyed HMAC-SHA256 is enough as it's secure against length extension attacks. HMAC is a specific construction But to answer, MAC doesn't have to be built around SHA3 although its IMO, the easiest to do.

So here you seem to agree that a keyed sha256 is enough for authenticating. And that's what onetime uses. You suggest using different pads. Well, if the pad is perfectly random, there is no point in using different pads. next bytes on the same pad could as well have come from a totally different pads and vice versa.

kfogel commented 7 years ago

I just want to reiterate that the authentication is not at 1:512 odds. That's not what that particular bit of (pseudo)randomness is about. I think you're both right that that "feature" is probably not adding much value anyway, and I will probably remove it, but in any case it has never been what's guaranteeing the authentication.

Best, -Karl

aaannndddyyy commented 7 years ago

I does add further authentication though, because it is actually a long message key which contributes to the keyed hash and is securely encrypted. That means it obfuscates message length and is a KPA countermeasure - but not in that it only obfuscates plaintext position, this is just a tiny aspect that does not add much, but also, provenly, does not make it less secure.

maqp commented 7 years ago

In this attack you are assuming that the adversary KNOWS the pad beforehand.

Allow me to rephrase:

Alice computes:

CT1  = pad1 XOR (padding + message + padding)
MAC1 = pad2 XOR HMAC-SHA256(padding + message + padding)

Eve who guesses the plaintext can compute

CT1 XOR (padding + message + padding) = pad1
MAC1 XOR HMAC-SHA256(padding + message + padding) = pad2

So no, you don't need to know the pad. If you intercept the packet and know the message, you get the pad from plaintext and ciphertext alone.

In this case nothing can save you

In case where pad is known, no. But in cases where PT is known, proper MAC calculated from ciphertext prevents making alterations. Unless MAC is keyed, its trivial to forge.


[attacker] can NOT get a valid mac for his forged message, as he does not know the the secret key the hash function was fed with, unless he breaks the hash function.

This has been the issue all along. The developer's questionable MAC design hasn't been clear on whether the hash is keyed to produce a MAC, or whether MAC is just hash of plaintext (and perhaps padding) that gets XORed with ciphertext (not KPA secure).


Right, but nowhere does onetime use /dev/urandom for encryption nor does it recommend /dev/urandom, it does recommend /dev/random though.

The output is essentially equally secure: it's based on preimage resistance of SHA-1. That's the weak link, not reusing bits in random pool.

If you want to have a go at implementing a secure way to do this, I'm sure it will be appreciated.

There are no options outside chaotic processes, i.e. HWRNGs. “Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.” -Von Neumann.

So here you seem to agree that a keyed sha256 is enough for authenticating.

Yes, absolutely. As long as the key goes in the hash function instead of it being XORed with the hash itself.

You suggest using different pads.

What? By referring to pad1, pad2 etc? These are just identifiers for specific bytes that are used only once. Alice and Bob have identical pads that contain one time pre shared keys but as this is obvious we can refer to different parts of the pad with numbers.

That means it obfuscates message length and is a KPA countermeasure

Existential forgeries with proper HMAC-SHA256 are a non-issue but again, as it's only computationally secure, the proper solution when assuming OTP is needed is to use one-time / Carter-Wegman MACs.

All in all, @kfogel is right to reduce complexity of source in favour of security.

aaannndddyyy commented 7 years ago

Onetime uses a keyed-hash. It hashes the message AND a secret key, so judging from tour above comment, this issue is then solved. Re the randomness: what you use for generating your pads is not part of this script. It's only a matter of documentation. But on some devices there are hwrng and they are fed into /dev/random, so it's not bad in all situations to use it.

aaannndddyyy commented 7 years ago

Carter-Wegman is just one way of doing it