Intel® Data Mover Library (Intel® DML)
https://intel.github.io/DML/
MIT License

Unknown buffer size limitation for CRC operation #36

Open bartlomiejgrzeskowiak opened 1 year ago

bartlomiejgrzeskowiak commented 1 year ago

What is the acceptable input buffer size for the CRC operation?

I experimented with different CRC buffer sizes. DML accepts them, but in some cases the job fails with an error or the program even crashes with a segmentation fault.

Example execution:

[bgrzesko@fl31ca105bs0411 build]$ ./examples/low-level-api/ll_crc_example_1KB hardware_path
The example will be run on the hardware path.
Starting CRC job example.
Caclulating CRC for region of size 1KB.
Calculated CRC is: 0x2cdf6e8f
Finished successfully.
[bgrzesko@fl31ca105bs0411 build]$ ./examples/low-level-api/ll_crc_example_4MB hardware_path
The example will be run on the hardware path.
Starting CRC job example.
Caclulating CRC for region of size 4MB.
An error (15) occured during job execution.
[bgrzesko@fl31ca105bs0411 build]$ ./examples/low-level-api/ll_crc_example_16MB hardware_path
Segmentation fault (core dumped)
[bgrzesko@fl31ca105bs0411 build]$ 

How to reproduce (apply this diff and compile the example):

[bgrzesko@fl31ca105bs0411 build]$ git diff
diff --git a/examples/low-level-api/crc_example.c b/examples/low-level-api/crc_example.c
index 3c12df2..ad03704 100644
--- a/examples/low-level-api/crc_example.c
+++ b/examples/low-level-api/crc_example.c
@@ -9,7 +9,8 @@
 #include "dml/dml.h"
 #include "examples_utils.h"

-#define BUFFER_SIZE 1024 // 1 KB
+//#define BUFFER_SIZE 4 * 1024 * 1024 // 4 MB
+#define BUFFER_SIZE 16 * 1024 * 1024 // 16 MB

 /*
 * This example demonstrates how to create and run a crc operation.

mzhukova commented 12 months ago

Hi @bartlomiejgrzeskowiak, This is for hardware_path only, correct? Could you please check what is the max_transfer_size setting that you have set for the WQ? (accel-config list | grep transfer)

bartlomiejgrzeskowiak commented 12 months ago

> Hi @bartlomiejgrzeskowiak, This is for hardware_path only, correct?

It is most probably the software path, since I am using crc_example.c, where DML_PATH_SW is set: https://github.com/intel/DML/blob/6d71051c405c2318d06aad96d3b0244ce8c4bcbe/examples/low-level-api/crc_example.c#L20

> Could you please check what is the max_transfer_size setting that you have set for the WQ? (accel-config list | grep transfer)

[bgrzesko@fl31ca105bs0411 ~]$ accel-config list | grep transfer
    "max_transfer_size":2147483648,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
    "max_transfer_size":2147483648,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,

mzhukova commented 12 months ago

Hi @bartlomiejgrzeskowiak, sorry, I was not clear. I believe you were running with hardware_path (meaning DSA is used for execution), since the command line is [bgrzesko@fl31ca105bs0411 build]$ ./examples/low-level-api/ll_crc_example_4MB hardware_path. I was just double-checking whether you observed a similar error on software_path/DML_PATH_SW as well.

Let me try to reproduce on my side and I'll get back to you.

bartlomiejgrzeskowiak commented 12 months ago

Hi @mzhukova,

You're totally right. I was executing the HW path. Sorry for misleading you; it was some time ago and I did not notice that the command-line argument overrides the path set in the code.

Please let me know if you're able to reproduce the issue.

BR Bartek

abdelrahim-hentabli commented 12 months ago

Hey @bartlomiejgrzeskowiak, 16 MB is too large to be allocated on the stack. If you want to run a 16 MB example, you need to allocate the buffer on the heap, for example with malloc().

Simple godbolt example for a large allocation: https://godbolt.org/z/Ts9xndqcq

Quick reference I found for the default stack size on Linux being somewhere between 8 and 10 MB: https://unix.stackexchange.com/questions/473416/why-on-modern-linux-the-default-stack-size-is-so-huge-8mb-even-10-on-some-di
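
The heap-based approach suggested here could look roughly like the following. This is a minimal sketch, not the code of crc_example.c: the helper name alloc_crc_buffer and the fill value are hypothetical, and the macro is parenthesized (unlike in the diff above) so that it expands safely inside larger expressions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parenthesized so the macro is safe inside expressions such as x / BUFFER_SIZE. */
#define BUFFER_SIZE (16u * 1024u * 1024u) /* 16 MB */

/* Hypothetical helper (not part of DML): returns a heap buffer of `size`
 * bytes filled with `fill`, or NULL if the allocation fails.
 * The caller owns the buffer and must free() it. */
static uint8_t *alloc_crc_buffer(size_t size, uint8_t fill)
{
    uint8_t *buf = malloc(size);
    if (buf == NULL) {
        return NULL; /* report the failure instead of crashing later */
    }
    memset(buf, fill, size);
    return buf;
}
```

A caller would obtain the source region with alloc_crc_buffer(BUFFER_SIZE, 0xA5), check the result for NULL, hand the pointer and BUFFER_SIZE to the CRC job, and call free() once the job has completed.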

abdelrahim-hentabli commented 12 months ago

Hey @bartlomiejgrzeskowiak, it seems that your work queue's max_transfer_size is 2 MB (2097152 bytes). A 4 MB buffer exceeds that limit, which would explain the error in the 4 MB example.
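
One way to stay under such a limit is to process a large buffer in pieces and carry the running CRC state from one piece to the next. The sketch below only illustrates that idea in plain software: it does not call DML, the function names are hypothetical, and it uses the common CRC-32 polynomial (0xEDB88320, reflected), which is not claimed to match the CRC that DML or DSA computes. Whether and how a DML CRC job can continue from a previous result should be checked in the DML documentation.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 update (reflected polynomial 0xEDB88320).
 * `state` is the running register: start from 0xFFFFFFFF and
 * XOR the final value with 0xFFFFFFFF to get the checksum. */
static uint32_t crc32_update(uint32_t state, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; ++i) {
        state ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            /* Shift right; apply the polynomial when the low bit was set. */
            state = (state >> 1) ^ (0xEDB88320u & (0u - (state & 1u)));
        }
    }
    return state;
}

/* Computes the CRC of `data` in pieces of at most `max_chunk` bytes,
 * passing the running state from each piece to the next.
 * A `max_chunk` of 0 is treated as "no limit". */
static uint32_t crc32_chunked(const uint8_t *data, size_t len, size_t max_chunk)
{
    uint32_t state = 0xFFFFFFFFu;
    size_t offset = 0;

    if (max_chunk == 0) {
        max_chunk = len;
    }
    while (offset < len) {
        size_t n = len - offset;
        if (n > max_chunk) {
            n = max_chunk;
        }
        state = crc32_update(state, data + offset, n);
        offset += n;
    }
    return state ^ 0xFFFFFFFFu;
}
```

Because the state is carried across pieces, the result does not depend on the chunk size, so a 4 MB region split into two 2 MB pieces yields the same checksum as a single pass.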

bartlomiejgrzeskowiak commented 11 months ago

Hi @abdelrahim-hentabli,

Ok, but:

  1. max_transfer_size can be configured by the system admin, so I cannot know it in advance. How can I get this value in my code? Which API function returns max_transfer_size?
  2. What about 16 MB? I would expect that neither the library nor the example should ever crash.

abdelrahim-hentabli commented 11 months ago

Hey @bartlomiejgrzeskowiak

  1. Currently DML does not have an API to get the max_transfer_size. You would need to use libaccel-config's API to get this value: accfg_wq_get_max_transfer_size().
  2. Please see my comment from above: https://github.com/intel/DML/issues/36#issuecomment-1758324846
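
A rough sketch of such a query with libaccel-config follows. Only accfg_wq_get_max_transfer_size() is named in this thread; the other calls (accfg_new, accfg_device_foreach, accfg_wq_foreach, accfg_unref) and the header path are assumptions based on the accel-config C API and should be checked against the installed header. The helper names are hypothetical. Taking the minimum over all work queues is a conservative assumption, since the thread does not say which work queue DML selects. The header check lets the sketch compile on machines without the library, in which case the query returns 0; with the library present, link with -laccel-config.

```c
#include <stdint.h>

#ifdef __has_include
#  if __has_include(<accel-config/libaccel_config.h>)
#    include <accel-config/libaccel_config.h>
#    define HAVE_ACCEL_CONFIG 1
#  endif
#endif

/* Hypothetical helper: returns the smallest max_transfer_size over all
 * work queues, or 0 if none was found or libaccel-config is unavailable. */
static uint64_t min_wq_max_transfer_size(void)
{
    uint64_t smallest = 0;
#ifdef HAVE_ACCEL_CONFIG
    struct accfg_ctx *ctx = NULL;
    struct accfg_device *device;
    struct accfg_wq *wq;

    if (accfg_new(&ctx) != 0) {
        return 0;
    }
    accfg_device_foreach(ctx, device) {
        accfg_wq_foreach(device, wq) {
            uint64_t size = accfg_wq_get_max_transfer_size(wq);
            if (smallest == 0 || size < smallest) {
                smallest = size;
            }
        }
    }
    accfg_unref(ctx);
#endif
    return smallest;
}

/* Hypothetical helper: returns 1 if `size` bytes fit within `limit`.
 * A limit of 0 means "unknown" and is reported as not fitting. */
static int transfer_fits(uint64_t size, uint64_t limit)
{
    return limit != 0 && size <= limit;
}
```

With the values reported earlier in this thread, transfer_fits(4 * 1024 * 1024, 2097152) is 0, so an application could reject or split the 4 MB buffer before submitting the job instead of receiving an error from it.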