@SimonWang9610 Could you please provide a short reproducer?
Yeah, in my older solution, no matter how I created `dz`, it was always `0`. I tried printing (with `std::cout`) every member of `Linear`. The output showed `dz = 0` for every instance except the last, while the other members had distinct addresses. For example (here I made `dz` and `dw` members of `Linear`):
[0]: weight: 000002B9B9F40C00, bias: 000002B9B9F40C80, dz: 0000000000000000, dw: 000002B9B9F40D80
[1]: weight: 000002B9B9F40E00, bias: 000002B9B9F40F00, dz: 0000000000000000, dw: 000002B9B9F40F80
[2]: weight: 000002B9B9F41080, bias: 000002B9B9F41100, dz: 000002B9B9F41200, dw: 000002B9B9F41280
Today I created a new solution with the same code in VS 2019, and it works: `dz` is allocated correctly and no longer points to `0`. However, I am still confused about why the old solution failed. I also found that the program can still access the elements of `v = malloc_device<T>(SIZE, Q)` after `free(v, Q)`. This may be because I have little understanding of USM in SYCL. I will provide more details if I find anything new that could help fix this issue.
I found what makes the allocation fail, although I still don't understand why it happens.
In my old solution, `Linear` had a member `bool end`, like this:
template<typename T>
class Linear {
private:
T* weight;
T* input;
T* result;
T* bias;
T* dz;
const int M;
const int N;
const int K;
bool end;
public:
Linear(T* x, T* r, int m, int n, int k, bool e, queue& Q): input(x), result(r), M(m), N(n), K(k), end(e) {
weight = malloc_device<T>(M * N, Q);
bias = malloc_device<T>(M, Q);
dz = malloc_device<T>(N * K, Q);
}
...
When I delete `end` from `Linear`, all allocations work correctly. It also works if I change `bool end` to `const int end` in `Linear`.
Hope this helps.
@SimonWang9610 Thanks, but could you please provide a small self-contained .cpp file which shows the problem so we can compile and run it on our side?
@romanovvlad my code
I have no idea how to attach my files here, so I posted the link above.
All the code in the link above compiles and runs as I expected on Ubuntu 18.04 with an Nvidia GPU. (I failed to configure the CPU device on Ubuntu, so currently I can only use the GPU.)
I am sure the wrong allocation happens because I declared `bool end` and use it in SYCL kernel functions. If I change it to `int` or `const int`, the problem disappears.
I also have another problem on Windows. When I run the code from the link above on an Intel CPU under Windows, the `dz`, `dw`, and `diff` pointers (allocated by `malloc_device`) always show undefined behaviour: they point to values I don't expect.
In other words, exactly the same code (in the link above) gives different results on Ubuntu 18.04 with an Nvidia GPU (works correctly) and on Windows with an Intel CPU using the Intel oneAPI Base Toolkit (works incorrectly).
This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be automatically closed in 30 days.
Hi Team, I am facing the same error; the device I am testing on is an Arc A770. I have attached test.zip, which contains a test.cpp file that replicates the issue. Could you please advise how to fix it? No exception is raised by `sycl::malloc_device`.
Thank you, Preethi Raksha
Response received via https://github.com/intel/llvm/issues/15691; unable to close the issue.
@SimonWang9610, the allocation function you're using returns a `nullptr` if there are not enough resources to allocate the requested memory. If you're using the L0 adapter, you can use this SYCL extension to query the available memory. Alternatively, you can try reducing the size of your allocations. Could you please verify whether this solves your issue?
I could not reproduce the original issue reported here, so I will close this tracker for now. If any of these issues persist, please open a new tracker and we will take a look.
x = malloc_device<T>(N * K, Q);
r = malloc_device<T>(M * K, Q);
In my code, when I create multiple `Linear` instances in sequence, all of them allocate `weight` and `bias` successfully. However, only the last `Linear` instance allocates `dz` successfully; the others fail to allocate `dz` and get `0` (`dz == nullptr` is `true`). I use `dz` to store temporary results in each `Linear`.
Furthermore, if I change my code and put `dz` in a member function of `Linear`, like below:
I call `update` in a `for` loop:
If I change the above code as:
then call it as:
in all cases only the last `Linear` allocates `dz` successfully, and the others get `0` (`nullptr`).
This problem has driven me crazy! I hope someone can explain why it happens. Thanks! I tested this code on Windows 10 on an Intel CPU using the oneAPI toolkit and on Ubuntu 18.04 on an Nvidia GPU. Both gave me the same errors.