intel / qpl

Intel® Query Processing Library (Intel® QPL)
https://intel.github.io/qpl/
MIT License

Question about qpl_status return code #15

Closed marin-ma closed 1 year ago

marin-ma commented 1 year ago

I'm trying to enable IAA through the QPL API to accelerate compression/decompression workloads. When calling qpl_execute_job repeatedly, most jobs complete successfully, but a few return an error. The return code is sometimes 431 and sometimes 303. Neither is listed in https://intel.github.io/qpl/documentation/dev_ref_docs/c_ref/c_status_codes.html

What do these return codes mean? How can I avoid them so that all jobs complete successfully?

Operating system info Linux 5.19.0-32-generic #33~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Jan 30 17:03:34 UTC 2

OS name Ubuntu 22.04

Kernel version Linux 5.19.0-32-generic

accel-config library version 3.5.3.git343d0a9d

CPU model Intel(R) Xeon(R) Platinum 8457C

Intel QPL version 1.1.0

User-specified CMake options and parameters -DCMAKE_BUILD_TYPE=Debug -DQPL_BUILD_TESTS=OFF -DLOG_HW_INIT=ON

Execution path qpl_path_hardware

Execution type (asynchronous or synchronous, threading, numa) synchronous

API used, incl. function name and a list of input parameters

    job->op = qpl_op_compress;
    job->next_in_ptr = source;
    job->next_out_ptr = dest;
    job->available_in = source_size;
    job->available_out = dest_size;
    job->level = qpl_default_level;
    job->flags = QPL_FLAG_FIRST | QPL_FLAG_DYNAMIC_HUFFMAN | QPL_FLAG_LAST | QPL_FLAG_OMIT_VERIFY;

    status = qpl_execute_job(job);

mcao59 commented 1 year ago

Hi @marin-ma, the QPL status codes do not currently cover all IAA status codes. Status codes 303 and 431 both correspond to IAA Operation Status Code 0x03, which means partial completion due to a page fault when the Block on Fault (BoF) flag in the descriptor is 0. 303 means the faulting access was a read, and 431 means it was a write. Did you turn BoF off with accel-config? If so, is there a specific reason, such as performance? QPL does not handle unresolved page faults, so users see the error directly if a page fault happens and IAA does not block to resolve it. For more about BoF with QPL and how to set it with accel-config, see the Note here: https://intel.github.io/qpl/documentation/get_started_docs/installation.html?highlight=block%20fault#accelerator-configuration
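For callers who hit these values before they have named constants, the mapping described above could be captured in a small helper. This is only a sketch: the enum names and function names are hypothetical and not part of QPL, and the numeric values 303 and 431 are simply the raw codes reported in this thread.

```c
#include <assert.h>

/* Raw status values reported in this thread (hypothetical names, not QPL's). */
enum {
    RAW_STS_READ_PAGE_FAULT  = 303, /* faulting access was a read  */
    RAW_STS_WRITE_PAGE_FAULT = 431  /* faulting access was a write */
};

typedef enum { FAULT_NONE, FAULT_ON_READ, FAULT_ON_WRITE } fault_kind;

/* Classify a raw status value returned by a job submission. */
static fault_kind classify_page_fault(int status) {
    switch (status) {
    case RAW_STS_READ_PAGE_FAULT:  return FAULT_ON_READ;
    case RAW_STS_WRITE_PAGE_FAULT: return FAULT_ON_WRITE;
    default:                       return FAULT_NONE;
    }
}

/* Short label for logging. */
static const char *fault_kind_name(fault_kind kind) {
    switch (kind) {
    case FAULT_NONE:     return "none";
    case FAULT_ON_READ:  return "read page fault";
    case FAULT_ON_WRITE: return "write page fault";
    }
    assert(0 && "unknown fault_kind");
    return "unknown";
}
```

A caller might log `fault_kind_name(classify_page_fault(status))` next to the raw status to tell at a glance whether the source or the destination buffer faulted.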

We will add a QPL status code to c_status_codes.html to map this IAA page fault error. We are also planning to review the BoF setting and possibly add logic to QPL that handles unresolved page faults by touching the page and resubmitting the job, so that users will not see this error unless there is severe memory pressure. We are not working on this right now because of other priorities. If you like, I can give you the Jira number to track it.
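Until such logic exists in the library, the touch-and-resubmit idea could be approximated on the application side. The sketch below is not QPL's implementation. `submit_fn`, `touch_pages`, `submit_with_retry`, `fake_submit`, and the `STS_*` names are hypothetical; the callback stands in for whatever re-initializes the job fields and calls qpl_execute_job, and 303 and 431 are the raw codes from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical names for the status values discussed in this thread. */
enum {
    STS_OK               = 0,
    STS_READ_PAGE_FAULT  = 303,
    STS_WRITE_PAGE_FAULT = 431
};

/* Hypothetical callback: resets the job's buffers and submits it once. */
typedef int (*submit_fn)(void *ctx);

/* Touch one byte in every page of [buf, buf + len) so the pages are mapped.
   For output buffers the byte is written back so the page becomes writable. */
static void touch_pages(uint8_t *buf, size_t len, int for_write) {
    if (len == 0) return;
    assert(buf != NULL);
    long reported = sysconf(_SC_PAGESIZE);
    size_t page = reported > 0 ? (size_t)reported : 4096u;
    volatile uint8_t *p = buf;
    for (size_t off = 0; off < len; off += page) {
        uint8_t b = p[off];
        if (for_write) p[off] = b;
    }
    /* The buffer may not be page-aligned, so touch the last byte as well. */
    uint8_t last = p[len - 1];
    if (for_write) p[len - 1] = last;
}

/* Submit; on a page-fault status, touch the faulting side and resubmit,
   up to max_retries times. Any other status is returned unchanged. */
static int submit_with_retry(submit_fn submit, void *ctx,
                             uint8_t *src, size_t src_len,
                             uint8_t *dst, size_t dst_len,
                             int max_retries) {
    int status = submit(ctx);
    for (int attempt = 0; attempt < max_retries; ++attempt) {
        if (status == STS_READ_PAGE_FAULT)       touch_pages(src, src_len, 0);
        else if (status == STS_WRITE_PAGE_FAULT) touch_pages(dst, dst_len, 1);
        else break;
        status = submit(ctx);
    }
    return status;
}

/* Stand-in submission for demonstration: reports a write fault on the
   first call and success afterwards. ctx points to an int call counter. */
static int fake_submit(void *ctx) {
    int *calls = ctx;
    return (*calls)++ == 0 ? STS_WRITE_PAGE_FAULT : STS_OK;
}
```

Touched pages can still be evicted before the device reads them, so under memory pressure the retry may fail again. That is why the loop is bounded rather than open-ended.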

marin-ma commented 1 year ago

@mcao59 Thanks for the explanation! I loaded the configuration from https://github.com/intel/qpl/blob/v1.1.0/tools/configs/2n4d8e1w-s.conf, and BoF was indeed set to 0. After changing it to 1, the error went away.
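To catch a configuration like this before submitting jobs, an application could check the work queue's setting at startup. The helper below is a minimal sketch with a hypothetical name, `read_flag_file`. It only reads a single integer from a file, and it assumes the kernel driver exposes the setting as a per-work-queue sysfs attribute, so the path in the usage note is an assumption to verify on the target system.

```c
#include <assert.h>
#include <stdio.h>

/* Read a single integer flag from a text file.
   Returns the value, or -1 if the file is missing or unparsable. */
static int read_flag_file(const char *path) {
    assert(path != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL) return -1;
    int value = -1;
    if (fscanf(f, "%d", &value) != 1) value = -1;
    fclose(f);
    return value;
}
```

For example, `read_flag_file("/sys/bus/dsa/devices/wq1.0/block_on_fault")` returning 0 would suggest that page faults on that queue surface as errors rather than being resolved by the device. The work queue name `wq1.0` here is illustrative only.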

mcao59 commented 1 year ago

@marin-ma Glad it works for you now! We also added the page fault errors to the QPL status codes in this commit, so in the future users will see QPL_STS_INTL_PAGE_FAULT or QPL_STS_INTL_W_PAGE_FAULT instead of 303 or 431. Could you please close this issue if you don't have other questions?