intel / QAT_Engine

Intel QuickAssist Technology (QAT) OpenSSL Engine (an OpenSSL Plug-In Engine) which provides cryptographic acceleration for both hardware and optimized software using Intel QuickAssist Technology enabled Intel platforms. https://developer.intel.com/quickassist
BSD 3-Clause "New" or "Revised" License

[Gen4] qat_instance_handles potential memory violation under large number of instances in multiple-threaded case #318

Open Kewei-Lu opened 1 month ago

Kewei-Lu commented 1 month ago

On a system with 128 QAT VFs, each configured with 2 CyInstances and LimitDevAccess set to 0, a total of 512 instances (either sym or asym) should be generated. A segmentation fault occurs when running openssl engine -c -t -v qatengine to validate the setup.

Steps to reproduce:

# install qat OOT driver
$ ./configure --enable-icp-debug --enable-icp-trace --enable-icp-sriov=host
$ make -j && make install

$ cat /etc/4xxx_dev0.conf

[GENERAL]
ServicesEnabled = asym;sym
...
[SSL]
NumberCyInstances = 2
NumberDcInstances = 0
NumProcesses = 1
LimitDevAccess = 0

# Crypto - User instance #0
Cy0Name = "SSL0"
Cy0IsPolled = 1
# List of core affinities
Cy0CoreAffinity = 1

# Crypto - User instance #1
Cy1Name = "SSL1"
Cy1IsPolled = 1
# List of core affinities
Cy1CoreAffinity = 2

$ cat /etc/4xxxvf_dev0.conf

[GENERAL]
ServicesEnabled = asym;sym

[SHIM]
NumberCyInstances = 2
NumberDcInstances = 0
NumProcesses = 1
LimitDevAccess = 0

# Crypto - User instance #0
Cy0Name = "SSL0"
Cy0IsPolled = 1
# List of core affinities
Cy0CoreAffinity = 0

# Crypto - User instance #1
Cy1Name = "SSL1"
Cy1IsPolled = 1
# List of core affinities
Cy1CoreAffinity = 1

$ systemctl restart qat

# install qat Engine
$ ./autogen.sh && ./configure --with-qat_hw_dir=path_to_qat_driver && make -j && make install

$ openssl engine -c -t -v qatengine

Segmentation fault (core dumped)

After debugging, the root cause appears to be that QAT_MAX_CRYPTO_INSTANCES is predefined as 256. In our case it needs to be 512 to accommodate all instances.

After adding some logging in qat_hw_init.c:

    for (instNum = 0; instNum < qat_num_instances; instNum++) {
        /* Retrieve CpaInstanceInfo2 structure for that instance */
        /* Debug logging: addresses printed as unsigned decimals, size with %zu */
        printf("addr ptr of pInstanceInfo2: %lu \n", (unsigned long)&qat_instance_details[instNum].qat_instance_info);
        printf("sizeof CpaInstanceInfo2: %zu \n", sizeof(CpaInstanceInfo2));
        printf("addr ptr of qat_instance_handles: %lu\n", (unsigned long)&qat_instance_handles);

        status = cpaCyInstanceGetInfo2(qat_instance_handles[instNum],
                                       &qat_instance_details[instNum].qat_instance_info);

Output and gdb session (hardware watchpoint on qat_instance_handles):

before cpaCySetAddressTranslation; instNum: 255; qat_instance_handle: 0x5555562be9d0
cpaCySetAddressTranslation() - : Called with params (0x5555562be9d0, 0x7ffff60112a0)

cpaCyStartInstance() - : Called with params (0x5555562be9d0)

cpaCyInstanceGetInfo2() - : Called with params (0x5555562be9d0, 0x7fffffffd5e0)

addr ptr of pInstanceInfo2: 140737323112320
sizeof CpaInstanceInfo2: 932
addr ptr of qat_instance_handles: 140737323112376
cpaCyInstanceGetInfo2() - : Called with params (0x5555562d1d10, 0x7ffff6269780)

Hardware watchpoint 1: qat_instance_handles

Old value = (CpaInstanceHandle *) 0x555556d60150
New value = (CpaInstanceHandle *) 0x0
0x00007ffff64d0163 in __memset_avx2_unaligned_erms () from /lib64/libc.so.6
(gdb) where
#0  0x00007ffff64d0163 in __memset_avx2_unaligned_erms () from /lib64/libc.so.6
#1  0x00007ffff5c8c43a in osalMemSet (ptr=0x7ffff6269780 <qat_instance_mutex>, filler=0 '\000', count=932) at /root/QAT20/quickassist/utilities/osal/src/linux/user_space/OsalServices.c:285
#2  0x00007ffff5c78a59 in cpaCyInstanceGetInfo2 (instanceHandle_in=0x5555562d1d10, pInstanceInfo2=0x7ffff6269780 <qat_instance_mutex>) at /root/QAT20/quickassist/lookaside/access_layer/src/common/ctrl/sal_crypto.c:3064
#3  0x00007ffff6011d50 in qat_hw_init (e=e@entry=0x55555582f2b0) at qat_hw_init.c:642
#4  0x00007ffff600eff0 in qat_engine_init (e=0x55555582f2b0) at e_qat.c:607
#5  0x00007ffff75564fd in engine_unlocked_init () from /lib64/libcrypto.so.1.1
#6  0x00007ffff7556658 in ENGINE_init () from /lib64/libcrypto.so.1.1
#7  0x000055555559dd29 in engine_main ()
#8  0x00005555555a3244 in do_cmd ()
#9  0x000055555558bf59 in main ()

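When instNum reaches 256, the address of pInstanceInfo2 is 140737323112320 (0x7ffff6269780, which gdb resolves to qat_instance_mutex), so it is presumably already past the end of qat_instance_details. cpaCyInstanceGetInfo2 then memsets 932 bytes starting at that address. That range covers qat_instance_handles at 140737323112376, only 56 bytes further on, so the pointer is reset to 0x0.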
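
The overlap can be checked with plain address arithmetic. The following is a minimal sketch using only the three numbers from the log above; the helper name write_covers and the macro names are made up for illustration and are not part of the engine or the driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the debug output in this report. */
#define INFO2_ADDR   UINT64_C(140737323112320) /* &qat_instance_details[256].qat_instance_info */
#define INFO2_SIZE   ((size_t)932)             /* sizeof(CpaInstanceInfo2) */
#define HANDLES_ADDR UINT64_C(140737323112376) /* &qat_instance_handles */

/* Hypothetical helper: returns true if a write of `len` bytes starting at
 * `start` touches the byte at address `target`. */
static bool write_covers(uint64_t start, size_t len, uint64_t target)
{
    return target >= start && (target - start) < len;
}

/* Distance in bytes from the start of the write to the target address. */
static uint64_t offset_into_write(uint64_t start, uint64_t target)
{
    return target - start;
}
```

With these values, write_covers(INFO2_ADDR, INFO2_SIZE, HANDLES_ADDR) is true and offset_into_write(INFO2_ADDR, HANDLES_ADDR) is 56, which matches the watchpoint firing inside the memset.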

After changing QAT_MAX_CRYPTO_INSTANCES to 512, the error disappears:

// e_qat.h, line 269

# define QAT_MAX_CRYPTO_INSTANCES 512   /* default is 256 */
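
Raising the constant only moves the limit. As an alternative idea, here is a minimal sketch of clamping the discovered instance count to the table capacity before the loop runs. The function clamp_instance_count is hypothetical and does not exist in the engine; where such a check would belong in qat_hw_init.c is also an assumption.

```c
#include <stdio.h>

#define QAT_MAX_CRYPTO_INSTANCES 256 /* default value mentioned in this report */

/* Hypothetical guard: never let the number of instances that will be iterated
 * exceed the capacity of the statically sized per-instance tables. */
static unsigned int clamp_instance_count(unsigned int reported)
{
    if (reported > QAT_MAX_CRYPTO_INSTANCES) {
        fprintf(stderr,
                "%u instances reported, only the first %d will be used\n",
                reported, QAT_MAX_CRYPTO_INSTANCES);
        return QAT_MAX_CRYPTO_INSTANCES;
    }
    return reported;
}
```

With such a guard, a system reporting 512 instances would run with 256 of them and log a warning instead of writing past the tables. Whether silently dropping instances is acceptable is a design decision for the maintainers.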

venkatesh6911 commented 2 weeks ago

Thanks @Kewei-Lu for reporting the issue. We will look into this.