samtools / hts-specs

Specifications of SAM/BAM and related high-throughput sequencing file formats
http://samtools.github.io/hts-specs/

Using ChaCha20-Poly1305 for encryption is NOT FIPS140-3 compliant and not justified and is considered unprotected plaintext #780

Open mmokrejs opened 1 month ago

mmokrejs commented 1 month ago

Hi, I wondered for a while why ChaCha20-Poly1305 was selected by GA4GH for data encryption. The closest I could find were notes on a fixed block size making it possible to jump into the middle of a stream and decrypt the content (after indexing), and that TLS would need to re-encrypt the data. To some extent, the concerns were about the maximum file (stream) size transferable during a single network connection.

The document http://samtools.github.io/hts-specs/crypt4gh.pdf is still quite sparse on this, and I wonder whether it could explicitly explain why AES-256-GCM was not selected. The data will be transferred via servers (Intel/AMD based), so the rumor that ChaCha20 is faster on mobile devices is beside the point, IMO. From reading some docs on the internet, the key to speed is actually not the AES instructions: the availability of SSE, SSE2 and AVX instructions and registers already provides most of the benefit to ciphers utilizing tuples with matching sizes [see section 4.1 in Gimli 20190927]. Moreover, my mobile phone is only marginally slower than my laptop, and on that same phone AES-256-GCM is faster than ChaCha20. What am I missing?

I think none of these really justify use of an untested/non-certified algorithm: https://csrc.nist.gov/pubs/fips/140-3/final

Anyway, the major argument is from auditing and compliance perspective. The algorithm is not accepted by FIPS-140, not even in FIPS-140-3.

Let me quote from the FIPS-140-3 Cryptographic Module Validation Program CMVP addendum: Use of Non-validated Cryptographic Modules by Federal Agencies and Departments

Non-validated cryptography is viewed by NIST as providing no protection to the information or data—in effect the data would be considered unprotected plaintext. If the agency specifies that the information or data be cryptographically protected, then FIPS 140-2 or FIPS 140-3 is applicable. In essence, if cryptography is required, then it must be validated. Should the cryptographic module be revoked, use of that module is no longer permitted.

If LocalEGA, FEGA and other tools including samtools and htsget are to be deployed worldwide, then commonly accepted and certified crypto must be enabled by default. Are we even allowed to store data using an uncertified algorithm, which is officially regarded as providing no protection? I propose switching to AES.

Hundreds of software tools, Linux distros, etc., undergo validation. Obviously, tools exposing uncertified ChaCha20 break any eventual certification; see some examples and search for ChaCha20:

https://csrc.nist.gov/CSRC/media/projects/cryptographic-module-validation-program/documents/security-policies/140sp4046.pdf

https://csrc.nist.gov/CSRC/media/projects/cryptographic-module-validation-program/documents/security-policies/140sp4284.pdf

https://csrc.nist.gov/CSRC/media/projects/cryptographic-module-validation-program/documents/security-policies/140sp3820.pdf

More can be found at https://www.nist.gov/search?s=ChaCha20&from=2&index=all-meta-engine&order=r&rpp=10

Same applies to BLAKE2, Salsa20 and successors. BLAKE2 was also selected by GA4GH. I am not certain on compliance of X25519, based on Curve25519.

I am not a crypto expert, but it seems that in multi-user settings AES-256-CBC would be advantageous over -GCM. And the submission hosts will encrypt data concurrently, right?

For ChaCha20 security comparison against AES-GCM, see https://dl.acm.org/doi/abs/10.1145/3460120.3484814 , and from https://dl.acm.org/action/downloadSupplement?doi=10.1145%2F3460120.3484814&file=CCS21-fp593.mp4 I quote two slides:

[Two slide screenshots from the presentation video: mpv-shot0003, mpv-shot0006]

mmokrejs commented 1 month ago

On Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz I get:

$ openssl speed -evp chacha20
Doing chacha20 for 3s on 16 size blocks: 69709801 chacha20's in 3.00s
Doing chacha20 for 3s on 64 size blocks: 30841787 chacha20's in 3.00s
Doing chacha20 for 3s on 256 size blocks: 15912826 chacha20's in 3.00s
Doing chacha20 for 3s on 1024 size blocks: 8240465 chacha20's in 3.00s
Doing chacha20 for 3s on 8192 size blocks: 1074814 chacha20's in 3.00s
Doing chacha20 for 3s on 16384 size blocks: 539367 chacha20's in 3.00s
OpenSSL 1.1.1w  11 Sep 2023
built on: Fri Oct 13 21:05:46 2023 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: x86_64-pc-linux-gnu-gcc -fPIC -pthread -m64 -Wa,--noexecstack -O2 -pipe -march=native -ftree-vectorize -fno-strict-aliasing -Wa,--noexecstack -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG  -DOPENSSL_NO_BUF_FREELISTS
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
chacha20        371785.61k   657958.12k  1357894.49k  2812745.39k  2934958.76k  2945662.98k
$
$ openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 121275087 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 64 size blocks: 43609451 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 256 size blocks: 11103222 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 2781847 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 347985 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 174283 aes-256-cbc's in 3.00s
OpenSSL 1.1.1w  11 Sep 2023
built on: Fri Oct 13 21:05:46 2023 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: x86_64-pc-linux-gnu-gcc -fPIC -pthread -m64 -Wa,--noexecstack -O2 -pipe -march=native -ftree-vectorize -fno-strict-aliasing -Wa,--noexecstack -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG  -DOPENSSL_NO_BUF_FREELISTS
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc     648963.68k   933446.44k   947474.94k   949537.11k   950231.04k   951817.56k
$
$ openssl speed -evp aes-256-gcm
Doing aes-256-gcm for 3s on 16 size blocks: 84184815 aes-256-gcm's in 2.99s
Doing aes-256-gcm for 3s on 64 size blocks: 55949017 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 256 size blocks: 25053937 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 1024 size blocks: 9488193 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 8192 size blocks: 1409041 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 16384 size blocks: 714001 aes-256-gcm's in 3.00s
OpenSSL 1.1.1w  11 Sep 2023
built on: Fri Oct 13 21:05:46 2023 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: x86_64-pc-linux-gnu-gcc -fPIC -pthread -m64 -Wa,--noexecstack -O2 -pipe -march=native -ftree-vectorize -fno-strict-aliasing -Wa,--noexecstack -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG  -DOPENSSL_NO_BUF_FREELISTS
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm     450487.30k  1193579.03k  2137935.96k  3238636.54k  3847621.29k  3899397.46k
$
$ openssl speed -evp aes-128-gcm
Doing aes-128-gcm for 3s on 16 size blocks: 90338227 aes-128-gcm's in 2.99s
Doing aes-128-gcm for 3s on 64 size blocks: 58401996 aes-128-gcm's in 2.99s
Doing aes-128-gcm for 3s on 256 size blocks: 30035335 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 1024 size blocks: 12146898 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 8192 size blocks: 1931301 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 16384 size blocks: 984404 aes-128-gcm's in 3.00s
OpenSSL 1.1.1w  11 Sep 2023
built on: Fri Oct 13 21:05:46 2023 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: x86_64-pc-linux-gnu-gcc -fPIC -pthread -m64 -Wa,--noexecstack -O2 -pipe -march=native -ftree-vectorize -fno-strict-aliasing -Wa,--noexecstack -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG  -DOPENSSL_NO_BUF_FREELISTS
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-gcm     483415.26k  1250076.17k  2563015.25k  4146141.18k  5273739.26k  5376158.38k
$

On Qualcomm Snapdragon 888 2.84GHz (SM8350, Kryo 680) I get:

~ $ openssl speed -evp chacha20
Doing ChaCha20 ops for 3s on 16 size blocks: 47637197 ChaCha20 ops in 2.96s
Doing ChaCha20 ops for 3s on 64 size blocks: 19894025 ChaCha20 ops in 2.96s
Doing ChaCha20 ops for 3s on 256 size blocks: 9541792 ChaCha20 ops in 2.96s
Doing ChaCha20 ops for 3s on 1024 size blocks: 3261224 ChaCha20 ops in 2.96s
Doing ChaCha20 ops for 3s on 8192 size blocks: 411641 ChaCha20 ops in 2.96s
Doing ChaCha20 ops for 3s on 16384 size blocks: 202640 ChaCha20 ops in 2.96s
version: 3.2.1
built on: Tue Feb 20 04:44:24 2024 UTC
options: bn(64,64)
compiler: aarch64-linux-android-clang -fPIC -pthread -Wa,--noexecstack -Qunused-arguments -fstack-protector-strong -Oz -DNO_SYSLOG -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DZLIB -DZLIB_SHARED -DNDEBUG  -I/data/data/com.termux/files/usr/include
CPUINFO: OPENSSL_armcap=0xbd
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
ChaCha20        257498.36k   430141.08k   825236.06k  1128207.22k  1139244.28k  1121639.78k
~ $ openssl speed -evp chacha20
Doing ChaCha20 ops for 3s on 16 size blocks: 47928059 ChaCha20 ops in 2.96s
Doing ChaCha20 ops for 3s on 64 size blocks: 20319820 ChaCha20 ops in 2.96s
Doing ChaCha20 ops for 3s on 256 size blocks: 9590102 ChaCha20 ops in 2.96s
Doing ChaCha20 ops for 3s on 1024 size blocks: 3252023 ChaCha20 ops in 2.96s
Doing ChaCha20 ops for 3s on 8192 size blocks: 418673 ChaCha20 ops in 2.96s
Doing ChaCha20 ops for 3s on 16384 size blocks: 212663 ChaCha20 ops in 2.96s
version: 3.2.1
built on: Tue Feb 20 04:44:24 2024 UTC
options: bn(64,64)
compiler: aarch64-linux-android-clang -fPIC -pthread -Wa,--noexecstack -Qunused-arguments -fstack-protector-strong -Oz -DNO_SYSLOG -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DZLIB -DZLIB_SHARED -DNDEBUG  -I/data/data/com.termux/files/usr/include
CPUINFO: OPENSSL_armcap=0xbd
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
ChaCha20        259070.59k   439347.46k   829414.23k  1125024.17k  1158705.82k  1177118.44k
~ $ openssl speed -evp aes-256-gcm
Doing AES-256-GCM ops for 3s on 16 size blocks: 70888598 AES-256-GCM ops in 2.96s
Doing AES-256-GCM ops for 3s on 64 size blocks: 44086448 AES-256-GCM ops in 2.95s
Doing AES-256-GCM ops for 3s on 256 size blocks: 19245107 AES-256-GCM ops in 2.95s
Doing AES-256-GCM ops for 3s on 1024 size blocks: 6050901 AES-256-GCM ops in 2.96s
Doing AES-256-GCM ops for 3s on 8192 size blocks: 863418 AES-256-GCM ops in 2.95s
Doing AES-256-GCM ops for 3s on 16384 size blocks: 421934 AES-256-GCM ops in 2.95s
version: 3.2.1
built on: Tue Feb 20 04:44:24 2024 UTC
options: bn(64,64)
compiler: aarch64-linux-android-clang -fPIC -pthread -Wa,--noexecstack -Qunused-arguments -fstack-protector-strong -Oz -DNO_SYSLOG -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DZLIB -DZLIB_SHARED -DNDEBUG  -I/data/data/com.termux/files/usr/include
CPUINFO: OPENSSL_armcap=0xbd
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-256-GCM     383181.61k   956451.75k  1670083.86k  2093284.67k  2397667.88k  2343378.53k
~ $ openssl speed -evp aes-256-gcm
Doing AES-256-GCM ops for 3s on 16 size blocks: 68787102 AES-256-GCM ops in 2.96s
Doing AES-256-GCM ops for 3s on 64 size blocks: 43635690 AES-256-GCM ops in 2.95s
Doing AES-256-GCM ops for 3s on 256 size blocks: 18168661 AES-256-GCM ops in 2.95s
Doing AES-256-GCM ops for 3s on 1024 size blocks: 6089534 AES-256-GCM ops in 2.96s
Doing AES-256-GCM ops for 3s on 8192 size blocks: 818217 AES-256-GCM ops in 2.95s
Doing AES-256-GCM ops for 3s on 16384 size blocks: 424565 AES-256-GCM ops in 2.95s
version: 3.2.1
built on: Tue Feb 20 04:44:24 2024 UTC
options: bn(64,64)
compiler: aarch64-linux-android-clang -fPIC -pthread -Wa,--noexecstack -Qunused-arguments -fstack-protector-strong -Oz -DNO_SYSLOG -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DZLIB -DZLIB_SHARED -DNDEBUG  -I/data/data/com.termux/files/usr/include
CPUINFO: OPENSSL_armcap=0xbd
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-256-GCM     371822.17k   946672.60k  1576670.24k  2106649.60k  2272147.00k  2357990.83k
~ $

So, for 64-byte blocks, 20319820 ChaCha20 ops vs. 43635690 AES-256-GCM ops on a mobile phone shows AES is actually faster on the same device. And by the way, the phone is not that much slower than a laptop with AES instructions in the CPU, which does 30841787 chacha20 ops vs. 55949017 aes-256-gcm ops for 64-byte blocks. Anyway, nobody is going to use a mobile phone to download genomic data, and definitely not as a submission host.
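The summary rows printed by `openssl speed` can be reproduced from the raw counts as operations × block size ÷ elapsed seconds. The following is a minimal sketch of that arithmetic for the 64-byte rows quoted in this comment; the helper name `throughput_k` and the dictionary labels are made up for illustration, and the numbers are copied from the runs above.

```python
def throughput_k(ops: int, block_size: int, seconds: float) -> float:
    """Throughput in 1000s of bytes per second, the unit openssl speed prints."""
    return ops * block_size / seconds / 1000.0


# (operations, elapsed seconds) for 64-byte blocks, copied from the runs above.
# For the phone, the second of the two repeated runs is used.
RUNS_64 = {
    "laptop chacha20": (30841787, 3.00),
    "laptop aes-256-gcm": (55949017, 3.00),
    "phone chacha20": (20319820, 2.96),
    "phone aes-256-gcm": (43635690, 2.95),
}

rates = {name: throughput_k(ops, 64, secs) for name, (ops, secs) in RUNS_64.items()}

for name, rate in rates.items():
    print(f"{name:20s} {rate:14.2f}k")

# Ratio > 1 means AES-256-GCM is faster than ChaCha20 on that device.
laptop_ratio = rates["laptop aes-256-gcm"] / rates["laptop chacha20"]
phone_ratio = rates["phone aes-256-gcm"] / rates["phone chacha20"]
print(f"laptop AES-GCM/ChaCha20 ratio: {laptop_ratio:.2f}")
print(f"phone  AES-GCM/ChaCha20 ratio: {phone_ratio:.2f}")
```

One caveat on the comparison itself: `chacha20` benchmarks the bare stream cipher, whereas `aes-256-gcm` includes authentication. A like-for-like run would probably use `openssl speed -evp chacha20-poly1305`.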

mmokrejs commented 1 month ago

Useful overview: https://csrc.nist.gov/Projects/fips-140-3-transition-effort You will NOT find ChaCha20, BLAKE or Salsa ciphers in the Algorithm: dropdown listing at https://csrc.nist.gov/projects/cryptographic-module-validation-program/validated-modules/search under the Advanced tab.

Probably the easiest way is to look at the approved algorithms in the openssl module of some large vendor:

https://csrc.nist.gov/projects/cryptographic-module-validation-program/validated-modules/search?SearchMode=Advanced&Vendor=Red+Hat&CertificateStatus=Active&ValidationYear=0

which leads to https://csrc.nist.gov/projects/cryptographic-module-validation-program/certificate/4746 (openssl) https://csrc.nist.gov/projects/cryptographic-module-validation-program/certificate/4434 (Linux kernel crypto)

and

https://csrc.nist.gov/projects/cryptographic-module-validation-program/validated-modules/search?SearchMode=Advanced&Vendor=Canonical&CertificateStatus=Active&ValidationYear=0

which leads e.g. to https://csrc.nist.gov/projects/cryptographic-module-validation-program/certificate/4292 (openssl) https://csrc.nist.gov/projects/cryptographic-module-validation-program/certificate/4366 (Linux kernel crypto)

or

https://csrc.nist.gov/projects/cryptographic-module-validation-program/validated-modules/search?SearchMode=Advanced&Vendor=Oracle&CertificateStatus=Active&ValidationYear=0

which leads e.g. to https://csrc.nist.gov/projects/cryptographic-module-validation-program/certificate/4739 (Linux kernel crypto)

I think a more interesting read will be which of the block cipher modes of operation should be used for genomic data, given that every single personal genome assembly will start with chr1 and continue with a telomeric repeat, and whether a similar, if not exactly identical, organization/ordering of data in SAM/BAM/FASTQ makes some modes more exploitable than others. And yes, also the TLS file size limits and the security implications enforced by different algorithm modes. I know, it is tough.

daviesrob commented 1 month ago

crypt4gh is a standard for file storage at rest. If you want to transmit files securely, you should use a protocol designed for that (i.e. TLS). Yes, this does mean the data may be encrypted twice if you transmit it.

ChaCha20-Poly1305 was chosen because it was already used in existing standards, has good library support and is relatively easy to use. AES-GCM mode was considered at the time, but was rejected mainly due to the limit on the amount of data that can be encrypted under a single key (around 64 GiB; see also NIST Special Publication 800-38D section 8.3). As genomic data files can often be bigger than this, it would have introduced some complication around the need to encrypt large files using more than one key. There have been calls for an improved cipher that avoids this problem, but I would imagine that it will be a while before anything gets approved.
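For reference, the figure of roughly 64 GiB matches the maximum plaintext length that SP 800-38D allows for a single GCM invocation, 2^39 - 256 bits. The sketch below only works through that arithmetic; the function names and the 200 GiB example file size are hypothetical and not taken from the crypt4gh specification.

```python
# Maximum plaintext length for one GCM invocation, in bits (NIST SP 800-38D).
GCM_MAX_PLAINTEXT_BITS = 2**39 - 256


def gcm_max_plaintext_bytes() -> int:
    """The same limit expressed in bytes (just under 64 GiB)."""
    return GCM_MAX_PLAINTEXT_BITS // 8


def pieces_needed(file_size_bytes: int) -> int:
    """Minimum number of separately encrypted pieces needed to stay under the limit."""
    limit = gcm_max_plaintext_bytes()
    return -(-file_size_bytes // limit)  # ceiling division


if __name__ == "__main__":
    limit = gcm_max_plaintext_bytes()
    print(f"limit: {limit} bytes = {limit / 2**30:.6f} GiB")
    # Hypothetical 200 GiB genomic data file.
    print("pieces for a 200 GiB file:", pieces_needed(200 * 2**30))
```

A file larger than the limit cannot be handled in one invocation, which is the complication described above for large genomic files.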

AES-256-CBC does not provide authentication and is vulnerable to padding oracle attacks. While these problems can be worked around, it is easier to use an AEAD construct that does not have them, like AES-GCM or ChaCha20-Poly1305.
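One common form of the workaround mentioned above is encrypt-then-MAC: a tag is computed over the IV and ciphertext, and the receiver checks it before attempting any decryption, so a padding oracle is never reached with tampered input. The following is a minimal sketch of only the tagging step, using the Python standard library and treating the CBC ciphertext as opaque bytes. The function names `add_tag` and `check_tag` are illustrative assumptions and are not part of crypt4gh or any library; an AEAD such as AES-GCM or ChaCha20-Poly1305 does the equivalent internally.

```python
import hashlib
import hmac
from typing import Tuple

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def add_tag(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Return IV || ciphertext || HMAC-SHA256(IV || ciphertext)."""
    body = iv + ciphertext
    tag = hmac.new(mac_key, body, hashlib.sha256).digest()
    return body + tag


def check_tag(mac_key: bytes, iv_len: int, blob: bytes) -> Tuple[bytes, bytes]:
    """Verify the tag, then return (iv, ciphertext). Raise ValueError if it fails."""
    if len(blob) < iv_len + TAG_LEN:
        raise ValueError("message too short")
    body, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many tag bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return body[:iv_len], body[iv_len:]
```

The MAC key should be independent of the encryption key, and decryption should only run on the ciphertext returned by `check_tag`. Getting these details right is the extra work that an AEAD construct avoids.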

Unfortunately, as noted, ChaCha20-Poly1305 does not have FIPS approval, which does mean users who need FIPS compliance cannot currently use crypt4gh. The specification could be extended to support compliant encryption (it was designed to allow this sort of extension) if there is demand for such an upgrade. This is more likely to happen if anyone who would like to see the change is willing to help implement it.