intel / llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.

[opencl plugins] If no out of order queue support, then temporary queues get created that never get deleted. #11156

Open coldav opened 1 year ago

coldav commented 1 year ago

Describe the bug

The OpenCL plugin for DPC++ tries to create a queue with the property CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE. If this fails (such as in the oneAPI Construction Kit), then it may create temporary queues to enforce dependencies. It never deletes these temporary queues.

To Reproduce

Use the following script. It assumes that LD_LIBRARY_PATH and PATH are already set up correctly for DPC++. The script also references $LLVM_INSTALL_DIR, which must be set beforehand.

git clone git@github.com:codeplaysoftware/oneapi-construction-kit.git
cd oneapi-construction-kit
git checkout 142c2d4964e44480871bd68ba8d454e6fc1051e2
export ONEAPI_CON_KIT_INSTALL_DIR=$PWD/build/install

cmake -Bbuild -DCA_LLVM_INSTALL_DIR=$LLVM_INSTALL_DIR -DCA_ENABLE_API=cl -DCMAKE_INSTALL_PREFIX=$ONEAPI_CON_KIT_INSTALL_DIR $PWD
cd build
make -j8 install
cd ../..

git clone git@github.com:oneapi-src/oneAPI-samples.git 
cd oneAPI-samples/DirectProgramming/C++SYCL/DenseLinearAlgebra/vector-add
export OCL_ICD_FILENAMES=$ONEAPI_CON_KIT_INSTALL_DIR/lib/libCL.so
export ONEAPI_DEVICE_SELECTOR="*:cpu"
export SYCL_CONFIG_FILE_NAME=null.cfg
clang++ -fsycl src/vector-add-buffers.cpp -o vector-add-buffers
SYCL_PI_TRACE=-1 ./vector-add-buffers 2>&1 | tee /tmp/run.txt
grep Queue /tmp/run.txt

The grep output shows four piextQueueCreate calls but only one piQueueRelease call.
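To make the create/release mismatch easier to check across runs, the trace can be counted rather than read by eye. The following is a minimal sketch, not part of the original report: the helper names `count_calls` and `report_queue_balance` are hypothetical, and it assumes each call is logged on a single line that contains the entry point name. It counts matching lines, so a name that is a prefix of another entry point would be counted too.

```shell
#!/bin/sh
# Hypothetical helpers for summarising a SYCL_PI_TRACE log.
# Assumption: each PI call appears on one line containing its name.

# Usage: count_calls <trace-file> <entry-point-name>
count_calls() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # there are no matches, so '|| true' keeps this usable under 'set -e'.
    grep -c "$2" "$1" || true
}

# Usage: report_queue_balance <trace-file>
# Prints how many queues were created, released, and left unreleased.
report_queue_balance() {
    trace=$1
    created=$(count_calls "$trace" piextQueueCreate)
    released=$(count_calls "$trace" piQueueRelease)
    echo "created=$created released=$released unreleased=$((created - released))"
}
```

After the run above, `report_queue_balance /tmp/run.txt` would summarise the trace. A non-zero `unreleased` value is only a hint of a leak, since this does not account for any retain calls.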


coldav commented 1 year ago

This is a major issue for us. Is there any idea when it might be addressed?

bader commented 1 year ago

@coldav, do you observe this issue with other back-ends (e.g. Level Zero)? @kbenzie, could you take a look, please?

coldav commented 1 year ago

I don't have Level Zero as a route to testing the oneAPI Construction Kit, and I'm not sure how to reproduce this any other way.

bader commented 1 year ago

Potentially this might be a memory leak in the runtime as well. Tagging @bso-intel and @intel/llvm-reviewers-runtime for awareness.

coldav commented 1 year ago

To be clear, I haven't been able to show this since the UR changeover, because it didn't have the out-of-order aspect written, but the thinking seemed to be that the failure was elsewhere. (If we do claim to support out-of-order, then I don't believe the temporary queues are created.)

kbenzie commented 1 year ago

> @kbenzie, could you take a look, please?

I've been chatting with @coldav about this internally, and we don't think there's much to go wrong in the OpenCL adapter around this. It seems more likely to me that an interaction in the SYCL RT is causing the issue.

coldav commented 1 year ago

I can confirm I still see this even with the yet to be merged https://github.com/oneapi-src/unified-runtime/pull/975/files (which means that a lot of tests pass).

coldav commented 11 months ago

I'm guessing this won't be looked at before the final oneAPI release?

github-actions[bot] commented 2 months ago

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be automatically closed in 30 days.