intel / DML

Intel® Data Mover Library (Intel® DML)
https://intel.github.io/DML/
MIT License

An issue about Multi-Socket sample code #33

Open Sean58238 opened 9 months ago

Sean58238 commented 9 months ago

Issue description: The Multi-Socket sample code sets the socket count to 4 by default (#define SOCKET_COUNT 4u). Running it on different configs/SKUs gives different results.

Config 1:
CPU: Intel(R) Xeon(R) Platinum 8490H
Sockets: 2
DSA devices per socket: 4
Enabled 1 device: dsa0 (on socket0)

The sample runs successfully with SOCKET_COUNT set to any value from 1 to 4.

Config 2:
CPU: Intel(R) Xeon(R) Platinum 8470
Sockets: 2
DSA devices per socket: 1

setup1: // error: failed to submit to node0
Enabled 1 device: dsa0 (on socket0)
SOCKET_COUNT=4

setup2: // error: failed to submit to node1
Enabled 1 device: dsa0 (on socket0)
Enabled 1 device: dsa1 (on socket1)
SOCKET_COUNT=4

setup3: // successful
Enabled 1 device: dsa0 (on socket0)
Enabled 1 device: dsa1 (on socket1)
SOCKET_COUNT=2

setup4: // successful
Enabled 1 device: dsa0 (on socket0)
SOCKET_COUNT=4
Commented out code: current_job->numa_id = i

 for (uint32_t i = 0; i < SOCKET_COUNT; ++i)
    {
        const uint32_t chunk_size = transfer_size / SOCKET_COUNT;

        dml_job_t* current_job = (dml_job_t*)((uint8_t*)jobs + (job_size * i));

        current_job->operation             = DML_OP_MEM_MOVE;
        current_job->source_first_ptr      = src + (chunk_size * i);
        current_job->destination_first_ptr = dst + (chunk_size * i);
        current_job->source_length         = chunk_size;
        current_job->flags                 = DML_FLAG_PREFETCH_CACHE;
        //current_job->numa_id               = i;
    }

Why do these results differ? Is there a logic issue in DML's NUMA node check?

mzhukova commented 9 months ago

Hi @Sean58238, could you please clarify: did you comment out this portion of the code for all of the setup1-4 runs with Config 2?

mzhukova commented 9 months ago

While we're waiting for the clarification, I'll try to provide some guidance; maybe this will help to resolve the misunderstanding.

Here is the documentation chapter w.r.t NUMA support in DML. It says:

The library is NUMA aware and respects the NUMA node id of the calling thread. If a user needs to use a device from a specific node, it can be done in two ways:

  • Pin thread which performs submissions to the specific NUMA, the library will use devices only from this node.
  • Set NUMA id parameter of the job to the specific node id, then devices will be selected only from this node.

In the example, SOCKET_COUNT is set to 4 by default, and a job is assigned to every NUMA node by setting current_job->numa_id = i (where i runs from 0 to SOCKET_COUNT - 1). Therefore, the example should be expected to succeed if (a) 4 sockets are available on the system (otherwise you need to adjust SOCKET_COUNT) and (b) there is a DSA instance present on each socket. This is in accordance with "Set NUMA id parameter of the job to the specific node id, then devices will be selected only from this node".
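As a rough illustration of adjusting SOCKET_COUNT to the machine, the sketch below counts the NUMA nodes that Linux reports as online by parsing /sys/devices/system/node/online. It is not part of DML or of the sample: the helper names count_ids_in_list and online_numa_node_count are hypothetical, and it assumes a Linux system that exposes that sysfs file.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not a DML API): count the ids in a Linux-style
 * id list such as "0-1" or "0,2-3". Returns 0 if the list is empty or
 * malformed. */
unsigned count_ids_in_list(const char *list)
{
    unsigned count = 0;
    const char *p = list;

    while (*p != '\0' && *p != '\n') {
        char *end;

        if (!isdigit((unsigned char)*p))
            return 0;                      /* a number must start here */
        unsigned long first = strtoul(p, &end, 10);
        unsigned long last  = first;
        p = end;

        if (*p == '-') {                   /* range "first-last" */
            ++p;
            if (!isdigit((unsigned char)*p))
                return 0;
            last = strtoul(p, &end, 10);
            if (last < first)
                return 0;
            p = end;
        }

        count += (unsigned)(last - first + 1);

        if (*p == ',')
            ++p;                           /* move on to the next entry */
        else if (*p != '\0' && *p != '\n')
            return 0;                      /* unexpected character */
    }
    return count;
}

/* Hypothetical helper (not a DML API): number of online NUMA nodes
 * according to sysfs, or 0 if the file cannot be read. */
unsigned online_numa_node_count(void)
{
    char  buf[256];
    FILE *f = fopen("/sys/devices/system/node/online", "r");

    if (f == NULL)
        return 0;
    char *line = fgets(buf, sizeof buf, f);
    fclose(f);
    return line != NULL ? count_ids_in_list(buf) : 0;
}
```

The sample could then loop up to the returned count instead of a fixed SOCKET_COUNT. Two caveats: the number of NUMA nodes does not always equal the number of sockets (for example with sub-NUMA clustering enabled), and a node that has no enabled DSA device would still be expected to fail when a job explicitly targets it.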

If current_job->numa_id is not set, DML would use the information about the NUMA node of the calling thread for the job submission (meaning, your thread/process is tied to CPU core from NUMA node N and only devices from N would be used during execution). You could pin the process to a specific node via numactl --cpunodebind=<node_id> for instance.

Hope this helps, let me know.

mzhukova commented 9 months ago

Hi @Sean58238, does this clarify things for you?

Sean58238 commented 9 months ago

Yes, setup1-4 were all run with Config 2; the issue happens on this config.