mrakgr closed 8 years ago
Unfortunately your sample doesn’t run, so I can’t test it (error in line 38). But I’m missing the PointerMode switch of your cublas context: the simplest way to do it is to pass PointerMode.Device to the CudaBlas constructor in order to use device memory pointers for alpha and beta. I think host pointers are the default, but I’m not sure without checking. Could you check whether your code is missing the mode switch?
Thanks for the reply. I actually had no idea that the PointerMode switch even existed, as in the past I just let Alea take care of that in the background.
Ok, I tried replacing let cublas = CudaBlas() with let cublas = CudaBlas(PointerMode.Device)
With this, it does not crash when I pass alpha and beta into the call, but it still does not produce the correct output. It still gives an invalid value error when I try setting m to 2 and n to 4. To be honest I am surprised that you can't run the example. Do you maybe have an older version of VS? I am running on VS2015.
What is the error at line 38? Do you mean let q = (num_rows*num_cols) |> SizeT ?
Also, I tried to make it run with ManagedCuda 7.0 and I am having the same errors there.
I'm using VisualStudio 2013 and it gives me the error:
Error 1 Invalid use of a type name and/or object constructor. If necessary use 'new' and apply the constructor to its arguments, e.g. 'new Type(args)'. D:\...\Program.fs 45 41 ConsoleApplication1
for the line
let q = (num_rows*num_cols) |> SizeT
Nevertheless, as I'm not really familiar with F#, I copy/pasted it to C# and get the following:
//set to device pointer mode:
CudaBlas cublas = new CudaBlas(PointerMode.Device);
//another context set to host pointer mode:
CudaBlas cublashost = new CudaBlas(PointerMode.Host);
//In F# code this is the inverse?!?
Operation nT = Operation.NonTranspose;
Operation T = Operation.Transpose;
float[] t1 = new float[] { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 };
float[] t2 = new float[] { 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 };
CudaDeviceVariable<float> d1 = t1;
CudaDeviceVariable<float> d2 = t2;
CudaDeviceVariable<float> d3 = new CudaDeviceVariable<float>(10);
int m = 10;
int n = 1;
CudaDeviceVariable<float> alpha = 2.0f;
CudaDeviceVariable<float> beta = 2.0f;
cublas.Geam(nT, nT, m, n, alpha, d1, m, d2, m, beta, d3, m);
float[] t3_1 = d3;
cublashost.Geam(nT, nT, m, n, 2.0f, d1, m, d2, m, 2.0f, d3, m);
float[] t3_2 = d3;
First thing I noticed: in the F# code the variable nT is set to Operation.Transpose and T is set to Operation.NonTranspose? Transposing the matrix here would give a 1x10 matrix, so it makes sense that only the first element is returned.
Second, if I change this and use a context either with host or device pointers, everything is running just fine.
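To make the effect of the swapped flags more concrete, here is a rough CPU-only sketch of what geam computes on column-major data, C = alpha * op(A) + beta * op(B). It is not managedCuda or cuBLAS code: the names GeamSketch, MinLeadingDim and Offset are made up for this illustration, and the leading-dimension check reflects my reading of the cuBLAS documentation, so treat it as an assumption. The original F# sample is not shown here, so this is only a plausible explanation of the symptoms, assuming the leading dimensions were left at m.

```csharp
using System;

public static class GeamSketch
{
    // Smallest leading dimension accepted for an input of an m x n result
    // (assumption based on the cuBLAS docs): the stored matrix has m rows
    // when it is not transposed and n rows when it is transposed.
    public static int MinLeadingDim(bool transposed, int m, int n)
        => Math.Max(1, transposed ? n : m);

    // Column-major offset of element (i, j) of op(X).
    // For a transposed input, op(X)(i, j) is the stored element X(j, i).
    public static int Offset(bool transposed, int i, int j, int ld)
        => transposed ? j + i * ld : i + j * ld;

    // CPU reference: C = alpha * op(A) + beta * op(B), where C is m x n.
    public static void Geam(bool transA, bool transB, int m, int n,
                            float alpha, float[] a, int lda,
                            float beta, float[] b, int ldb,
                            float[] c, int ldc)
    {
        if (lda < MinLeadingDim(transA, m, n) ||
            ldb < MinLeadingDim(transB, m, n) ||
            ldc < Math.Max(1, m))
        {
            throw new ArgumentException("invalid leading dimension");
        }

        for (int j = 0; j < n; j++)
            for (int i = 0; i < m; i++)
                c[i + j * ldc] = alpha * a[Offset(transA, i, j, lda)]
                               + beta * b[Offset(transB, i, j, ldb)];
    }

    public static void Main()
    {
        float[] t1 = { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 };
        float[] t2 = { 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 };
        float[] t3 = new float[10];
        int m = 10, n = 1;

        // Non-transposed inputs: every entry becomes 2*1 + 2*2 = 6.
        Geam(false, false, m, n, 2.0f, t1, m, 2.0f, t2, m, t3, m);
        Console.WriteLine(string.Join(" ", t3));

        // With the transpose flag set and lda = m = 10, row i = 1 of op(A)
        // maps to offset 10, which is already past a 10-element buffer.
        Console.WriteLine(Offset(true, 1, 0, m));

        // With m = 2, n = 4 and transposed inputs the minimum leading
        // dimension is 4, so passing lda = m = 2 would be rejected.
        Console.WriteLine(MinLeadingDim(true, 2, 4));
    }
}
```

Under these assumptions, only element 0 of a transposed 10-element input is read from valid memory when m = 10 and n = 1, which would fit the "only the first element" symptom, and the m = 2, n = 4 case fails the leading-dimension check, which would fit the invalid value error.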
Well, in contrast to Alea, managedCuda is meant to be as basic as possible, with as little overhead as possible. Trying to introduce some comfort with some object-oriented thinking (CudaDeviceVariable, etc.) does not hide all the confusing parts of the original library...
Oh, I am totally retarded. I am so sorry to bother you over this. It was that I switched Transposed and NonTransposed. Thank you very much.
At first I thought I messed up something with context creation, but after I simply changed the call to geam from passing alpha and beta from host to the version that uses device pointers, it started throwing the AccessViolation exception. I have a minimal example here. I can't get geam (or gemm) to work at all. In the example provided it only adds up the first element of each matrix.