Open wangzy0327 opened 8 months ago
Such information can be found in the specification documents:
https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_API.html#_memory_model
https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:memory.model
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#memory-model
Hope that gets you started.
Thanks
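For orientation, the three memory models linked above use different names for broadly corresponding device memory regions. The lookup below is a minimal sketch of that correspondence for three regions that all of them describe. The RegionNames struct and the cudaNameFor helper are hypothetical names made up for this illustration, and the specifications remain the authoritative source for the exact semantics.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical record: the name each programming model uses for one region.
struct RegionNames {
  const char *opencl;
  const char *sycl;
  const char *cuda;
};

// Rough correspondence only; details (visibility, lifetime) are in the specs.
static const RegionNames kRegions[] = {
    {"global", "global", "global"},  // visible to all work-items / threads
    {"local", "local", "shared"},    // shared within a work-group / thread block
    {"private", "private", "local"}, // per work-item / per thread
};

// Returns the CUDA name for an OpenCL region name, or nullptr if unknown.
const char *cudaNameFor(const char *openclName) {
  assert(openclName != nullptr);
  for (const RegionNames &r : kRegions) {
    if (std::strcmp(r.opencl, openclName) == 0)
      return r.cuda;
  }
  return nullptr;
}
```

For example, cudaNameFor("local") yields "shared", which is the naming difference that most often causes confusion when reading the CUDA guide next to the OpenCL or SYCL text.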
@wangzy0327 hi, did Arvind's answer help you?
While extending the SYCL code, I ran into the following error. It looks like an address space mapping problem. Could you suggest how to analyze or debug it?
@KornevNikita
The failing line is PI_CHECK_ERROR(cnQueueSync(s)); in pi_cnrt.cpp:
pi_result cnrt_piQueueRelease(pi_queue command_queue) {
  assert(command_queue != nullptr);

  // Other references remain, so there is nothing to release yet.
  if (command_queue->decrement_reference_count() > 0) {
    return PI_SUCCESS;
  }

  try {
    // Take ownership so the queue object is deleted when this scope exits.
    std::unique_ptr<_pi_queue> queueImpl(command_queue);
    ScopedContext active(command_queue->get_context());
    command_queue->for_each_queue([](CNqueue s) {
      PI_CHECK_ERROR(cnQueueSync(s)); // the reported error is raised here
      PI_CHECK_ERROR(cnDestroyQueue(s));
    });
    return PI_SUCCESS;
  } catch (pi_result err) {
    return err;
  } catch (...) {
    return PI_ERROR_OUT_OF_RESOURCES;
  }
}
This is the test program for the extended device, test_demo.cpp:
#include <CL/sycl.hpp>
#include <iostream>
#include <vector>
#include <sys/time.h>
using namespace sycl;

constexpr int N = 256;

// Wall-clock time in microseconds.
long long getTime() {
  struct timeval tv;
  gettimeofday(&tv, NULL);
  return (tv.tv_sec*1000000 + tv.tv_usec);
}

int main(){
  sycl::queue q;
  auto dev = q.get_device();
  float *a = (float *)malloc(sizeof(float) * N);
  float *b = (float *)malloc(sizeof(float) * N);
  float *c = (float *)malloc(sizeof(float) * N);
  float *c_host = (float *)malloc(sizeof(float) * N);
  for(int i = 0;i < N;i++){
    a[i] = 0.5f;b[i] = 0.5f;c[i] = 0.0f;c_host[i] = 1.0f;
  }
  range<1> arr_range(N);
  sycl::buffer<float,1> bufferA((float*)a,arr_range);
  sycl::buffer<float,1> bufferB((float*)b,arr_range);
  sycl::buffer<float,1> bufferC((float*)c,arr_range);
  auto startTime = getTime();
  q.submit([&](handler &h){
    sycl::accessor aA{bufferA,h,read_only};
    sycl::accessor aB{bufferB,h,read_only};
    sycl::accessor aC{bufferC,h,write_only};
    // Local (work-group) memory staging buffers.
    sycl::accessor<float, 1, sycl::access::mode::read_write, sycl::access::target::local> localAccA(N,h);
    sycl::accessor<float, 1, sycl::access::mode::read_write, sycl::access::target::local> localAccB(N,h);
    // A single work-item copies through local memory and adds all N elements.
    h.parallel_for<>(1,[=](sycl::id<1> i){
      for(int j = 0;j < N;j++){
        localAccA[j] = aA[j];
        localAccB[j] = aB[j];
        aC[j] = localAccA[j] + localAccB[j];
      }
    });
  });
  // Creating the host accessor waits for the kernel to finish.
  sycl::host_accessor host_accC(bufferC,read_only);
  std::cout << "Result: " << host_accC[0] << " .. " << host_accC[N - 1] << std::endl;
  auto endTime = getTime();
  std::cout << "Time : " << endTime - startTime <<" us "<< std::endl;
  free(a);
  free(b);
  free(c);
  free(c_host);
  return 0;
}
Hi @wangzy0327
I tried to compile your code using 'clang++ -fsycl test.cpp'. Hope that is the right way. I ran into a few issues. When I looked closer at your code, I saw a few issues:
Thanks
I compiled the sample code above with both the CUDA version and the extended-hardware version of SYCL, based on the 2022-06 release. The device-side LLVM IR produced by the SYCL CUDA build is as follows.
The device-side LLVM IR produced for the extended hardware is as follows.
In the IR generated for the extended hardware, the handler's parameters do not carry address space 1. How can this be fixed? How are parameters in address space 1 defined and used? @KornevNikita @sommerlukas @elizabethandrews For reference, see NVPTXAddrSpaceMap in the source code (clang/lib/Basic/Targets/NVPTX.h).
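One possible reading of this symptom: LLVM IR prints pointers in address space 0 without any addrspace qualifier, so if the new target's address space map sends the global region to 0, kernel pointer parameters would show up without addrspace(1). The sketch below is a hypothetical sanity check over a reduced map. AddrSpaceMap and diagnose are illustrative names, not part of clang, and whether a zero entry is actually wrong depends on how the target backend numbers its address spaces.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical reduced map: the target address space number per memory region.
struct AddrSpaceMap {
  unsigned global;
  unsigned local;
  unsigned constant;
  unsigned priv;
};

// Collects warnings for entries that would make global or local pointers
// indistinguishable from default (address space 0) pointers in the IR.
std::vector<std::string> diagnose(const AddrSpaceMap &m) {
  std::vector<std::string> warnings;
  if (m.global == 0)
    warnings.push_back("global maps to 0: no addrspace qualifier in the IR");
  if (m.local == 0)
    warnings.push_back("local maps to 0: no addrspace qualifier in the IR");
  if (m.global != 0 && m.global == m.local)
    warnings.push_back("global and local share one target address space");
  return warnings;
}
```

With NVPTX-style numbers such as AddrSpaceMap{1, 3, 4, 0} the check reports nothing, while an all-zero map reports both the global and the local entry. Comparing the new target's table against NVPTXAddrSpaceMap entry by entry is a reasonable first debugging step under this assumption.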
Hi! There have been no updates for at least the last 60 days, though the issue has assignee(s).
@asudarsa, could you please take one of the following actions:
Thanks!
How should the address space mapping be implemented when adding support for new hardware? Could you give some specific suggestions and guidance? @asudarsa
Is your feature request related to a problem? Please describe
We plan to add support for new hardware on top of SYCL, but have found no guidance on developing the address space mapping. Could you provide instructions or documentation on address mapping for developers to refer to? This is the relevant address mapping code, based on the 2022-06 version. What do the contents of the NVPTXAddrSpaceMap variable mean? Which source files are involved in address space handling, and which APIs are called? @AlexeySachkov @elizabethandrews
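As a rough model of what such a table expresses: an address space map is an array indexed by the language-level address space, and each entry is the address space number the target backend uses for that region. The sketch below assumes a reduced, hypothetical enum (LangAddrSpace) in place of clang's full list, and fills in the numbers LLVM documents for NVPTX (0 generic, 1 global, 3 shared, 4 constant, 5 local). The real NVPTXAddrSpaceMap has more entries and may differ between versions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, reduced stand-in for the language-level address spaces.
enum class LangAddrSpace : std::size_t {
  Default = 0,
  Global,
  Local,
  Constant,
  Private,
  Count // number of entries, not an address space
};

// One target address space number per LangAddrSpace value, in enum order.
static const unsigned kNvptxLikeMap[] = {
    0, // Default  -> generic
    1, // Global   -> global
    3, // Local    -> shared
    4, // Constant -> constant
    0, // Private  -> left generic here (assumption for this sketch)
};

static_assert(sizeof(kNvptxLikeMap) / sizeof(kNvptxLikeMap[0]) ==
                  static_cast<std::size_t>(LangAddrSpace::Count),
              "map must have one entry per language address space");

// Translates a language address space into the target's number.
unsigned toTargetAddrSpace(LangAddrSpace as) {
  const std::size_t index = static_cast<std::size_t>(as);
  assert(index < static_cast<std::size_t>(LangAddrSpace::Count));
  return kNvptxLikeMap[index];
}
```

Under this model, a pointer to global memory is lowered with target number toTargetAddrSpace(LangAddrSpace::Global), which is 1 here. A new target would supply its own table with the numbers its backend expects.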
Can you give me some help?
Describe the solution you would like
We plan to add support for new hardware on top of SYCL and need guidance on developing device memory access.
Describe alternatives you have considered
No response
Additional context
No response