riveridea closed this issue 2 years ago
This could be a DPDK PMD driver issue. I am looking into it.
Perhaps there is some issue with initialization.
I will close this issue. It seems to be a device management issue in DPDK: the mb_mgr address gets contaminated.
Closing it.
I think my issue is related to this thread: https://review.spdk.io/gerrit/c/spdk/dpdk/+/1056
The AES MB PMD is created successfully in one of my processes, but I found that the mb_mgr address is unexpectedly contaminated after vdev_action runs in that process.
I am currently using DPDK 20.11, and I am wondering whether there is a bug in rte_aesni_mb_pmd.c. In cryptodev_aesni_mb_create(), the code allocates the mb_mgr without checking whether it is running in the primary process or a secondary process. So if a secondary process launches the scan/probe and creates the PMD, internals->mb_mgr can be overwritten with a newly allocated mb_mgr. I changed the code to allocate the mb_mgr only in the primary process, and the crash is gone.
I believe this issue should have been resolved in later DPDK releases, as the mb_mgr is no longer stored in dev->data->dev_private.
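The primary-only allocation described above can be illustrated with a minimal, self-contained sketch. It does not use DPDK itself: `proc_type`, `mb_mgr_stub`, `dev_private_stub`, `create_unguarded` and `create_guarded` are hypothetical names invented for this example, and it is not the actual change made to rte_aesni_mb_pmd.c. In real DPDK code the process role would presumably be checked with `rte_eal_process_type()` against `RTE_PROC_PRIMARY`.

```c
#include <stdlib.h>

/* Hypothetical stand-ins, not DPDK types. In DPDK the process role would
 * come from rte_eal_process_type() compared against RTE_PROC_PRIMARY. */
enum proc_type { PROC_PRIMARY, PROC_SECONDARY };

/* Placeholder for the multi-buffer manager; records which role allocated it. */
struct mb_mgr_stub { int owner; };

/* Placeholder for the device private data that both processes can see. */
struct dev_private_stub { struct mb_mgr_stub *mb_mgr; };

/* Unguarded create: every caller allocates and overwrites the shared pointer,
 * so a secondary process replaces the pointer the primary relies on.
 * The previous allocation is deliberately not freed, mirroring the overwrite. */
static int create_unguarded(struct dev_private_stub *priv, enum proc_type type)
{
    struct mb_mgr_stub *mgr = malloc(sizeof(*mgr));
    if (mgr == NULL)
        return -1;
    mgr->owner = (int)type;
    priv->mb_mgr = mgr;
    return 0;
}

/* Guarded create: only the primary process allocates; a secondary process
 * leaves the shared pointer untouched. */
static int create_guarded(struct dev_private_stub *priv, enum proc_type type)
{
    if (type != PROC_PRIMARY)
        return 0;
    return create_unguarded(priv, type);
}
```

With the guarded variant, a probe from the secondary role leaves `priv->mb_mgr` pointing at the primary's allocation, which is the behavior the workaround relies on.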
If I remember correctly, the primary/secondary DPDK model is intended for failover scenarios only. For multi-core processing, the crypto scheduler PMD should be used. @pablodelara, can you point to some example code covering failover?
Regardless of what the primary/secondary processes are used for, the point is that I explicitly initialize the aesni_mb device only in my primary process. The secondary process implicitly triggers the device probe through the scan/probe procedure. At the same time, the aesni_mb PMD in DPDK 20.11 has the bug I mentioned above. As a result, the primary process crashes as soon as packet processing tries to access that mb_mgr.
Updating the title and closing it, as this is a DPDK issue.
https://github.com/intel/intel-ipsec-mb/blob/a6008dbddf15a397ecc2cafb7f0a00d69ebf8037/lib/avx512/mb_mgr_avx512.c#L1870
I find that my code always crashes when accessing sha1_one_block, whose value is assigned here. Since sha1_one_block_avx512 is a function name, I am wondering whether we should change this line to state->sha1_one_block = &sha1_one_block_avx512;
I am verifying whether this is an issue. What confuses me is that the test app does not crash. I am not sure whether it is platform dependent.
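On the question of adding `&`: in C, a function name used in an expression converts to a pointer to that function, so assigning `func` and `&func` stores the same address. The sketch below shows this with hypothetical names (`one_block_fn`, `mgr_sketch`, `sha1_one_block_stub`, `same_pointer`); none of them come from intel-ipsec-mb. If this holds for the line in question, the proposed change would not alter behavior, and the crash would more likely come from the manager pointer itself being invalid.

```c
#include <stddef.h>

/* Hypothetical function-pointer type and struct, for illustration only. */
typedef void (*one_block_fn)(const void *data, void *digest);

struct mgr_sketch { one_block_fn sha1_one_block; };

/* Stub with a matching signature; it does no hashing. */
static void sha1_one_block_stub(const void *data, void *digest)
{
    (void)data;
    (void)digest;
}

/* Returns 1 if assigning the bare function name and its address
 * yields the same function pointer. */
static int same_pointer(void)
{
    struct mgr_sketch a, b;
    a.sha1_one_block = sha1_one_block_stub;   /* name converts to a pointer */
    b.sha1_one_block = &sha1_one_block_stub;  /* explicit address-of */
    return a.sha1_one_block == b.sha1_one_block;
}
```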