Open lanzithinking opened 4 years ago
@lanzithinking there is no specific 'plan', so feel free to contribute test cases or even codes! If it's not urgent, I can take a look at it in my free time after you give me some test cases (e.g., 3x3 matrix and expected result). It would probably take weeks to code.. Otherwise, you can checkout other working magma functions (e.g., geev) to see how the wrapper works and code it up. Let me know if you have any questions. I am glad to help in either way.
Dear Kit Lee,
Thanks for your reply! I worked a little bit as you suggested and have attached my codes to the email:
I am working on NVIDIA TESLA V100, Ubuntu 18.04.
Put those three files in the same folder and run test5. I got the following error:
Traceback (most recent call last):
File "test5.py", line 30, in
It would be nice if you could help me to see what went wrong. Thank you so much!
Best regards,
Shiwei Lan
On Jan 15, 2020, at 9:56 AM, Kit Lee notifications@github.com wrote:
@lanzithinking https://github.com/lanzithinking there is no specific 'plan', so feel free to contribute test cases or even codes! If it's not urgent, I can take a look at it in my free time after you give me some test cases (e.g., 3x3 matrix and expected result). It would probably take weeks to code.. Otherwise, you can checkout other working magma functions (e.g., geev) to see how the wrapper works and code it up. Let me know if you have any questions. I am glad to help in either way.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lebedov/scikit-cuda/issues/289?email_source=notifications&email_token=AAXGXDAMZGQ5VFJZDAXUCGTQ5454RA5CNFSM4KHBFZP2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJBAUFY#issuecomment-574753303, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAXGXDFFYAK4JOLXWOUMNXTQ5454RANCNFSM4KHBFZPQ.
Hi @lanzithinking, thanks for the attempt! A few things to note:
ssyevdx_m
is a "range" type. In line 117 of the original magma.py:
_libmagma.magma_range_const.argtypes = [ctypes.c_char]
which means the input (in python) should be a single character.Hope this help! PS. I just found I didn't install scikit-cuda on my current machine.. so for now I may not able to test anything..
Hi @wingkitlee0 , thanks for the quick response. I insert 'eigs' function I modified from your gpu-linalg.py here to see if you can spot any error (I set rnge to be 'I', which is a single character):
` import numpy as np import magma
typedict = {'s': np.float32, 'd': np.float64, 'c': np.complex64, 'z': np.complex128} typedict_= {v: k for k, v in typedict.items()}
def eigs(a_gpu, k=None, which='LM', imag=False, return_eigenvectors=True):
if len(a_gpu.shape) != 2:
raise ValueError("M needs to be a rank 2 square array for eig.")
magma.magma_init()
ngpu=1
dtype = a_gpu.dtype.type
t = typedict_[dtype]
N = a_gpu.shape[1]
if k is None: k=int(np.ceil(N/2))
if 'L' in which:
a_gpu = -a_gpu
if return_eigenvectors:
jobz = 'V'
else:
jobz = 'N'
rnge = 'I'; uplo = 'U'
vl = 0; vu = 1e10
il = 1; iu = k; m = 0
w_gpu = np.empty((N,), dtype, order='F') # eigenvalues
if t == 's':
nb = magma.magma_get_ssytrd_nb(N)
elif t == 'd':
nb = magma.magma_get_dsytrd_nb(N)
else:
raise ValueError('unsupported type')
lwork = N*(1 + 2*nb)
if jobz:
lwork = max(lwork, 1+6*N+2*N**2)
work = np.empty(lwork, dtype)
liwork = 3+5*N if jobz else 1
iwork = np.empty(liwork, dtype)
if t == 's':
status = magma.magma_ssyevdx_m(ngpu, jobz, rnge, uplo, N, a_gpu.ctypes.data, N,
vl, vu, il, iu, m,
w_gpu.ctypes.data, work.ctypes.data, lwork, iwork.ctypes.data, liwork)
elif t == 'd':
status = magma.magma_dsyevdx_m(ngpu, jobz, rnge, uplo, N, a_gpu.ctypes.data, N,
vl, vu, il, iu, m,
w_gpu.ctypes.data, work.ctypes.data, lwork, iwork.ctypes.data, liwork)
else:
raise ValueError('unsupported type')
if 'L' in which:
w_gpu = -w_gpu
magma.magma_finalize()
if jobz:
return w_gpu, a_gpu
else:
return w_gpu
`
Did you call the following in your ssyevdx_m
function?
rnge = _range_conversion[rnge]
Hi @lanzithinking , I think I figured out about the TypeError complained by ctypes
. In magma.py, all of range
, jobz
, uplo
need to have ctypes argument of c_int_type
. You may try the following code.
Can you come up with some test cases (numerical values)? I can add the codes and make a pull request next week.
# SSYEVDX_M, DSYEVDX_M, CHEEVDX_M, ZHEEVDX_M
_libmagma.magma_ssyevdx_m.restype = int
_libmagma.magma_ssyevdx_m.argtypes = [c_int_type, # ngpu
c_int_type, # jobz
c_int_type, # rnge
c_int_type, # uplo
c_int_type, # n
ctypes.c_void_p, # A
c_int_type, # lda
ctypes.c_float, # vl
ctypes.c_float, # vu
c_int_type, # il
c_int_type, # iu
ctypes.c_void_p, # m
ctypes.c_void_p, # w
ctypes.c_void_p, # work
c_int_type, # lwork
ctypes.c_void_p, # iwork
c_int_type, # liwork
ctypes.c_void_p] # info
def magma_ssyevdx_m(ngpu, jobz, rnge, uplo, n, A, lda,
vl, vu, il, iu, m,
w, work, lwork, iwork, liwork):
"""
Compute eigenvalues of real symmetric matrix.
Multi-GPU, data on host
"""
# _XXX_conversion[] returns integer according to magma_types.h
jobz = _vec_conversion[jobz]
rnge = _range_conversion[rnge]
uplo = _uplo_conversion[uplo]
info = c_int_type()
status = _libmagma.magma_ssyevdx_m(ngpu, jobz, rnge, uplo, n, int(A), lda,
vl, vu, il, iu, int(m),
int(w), int(work), lwork, int(iwork), liwork,
ctypes.byref(info))
magmaCheckStatus(status)
Hi @wingkitlee0, thank you so much! I was actually writing the comment while getting your message! Yes, there are two lines missing from the magma.py in the master branch: ` jobz = _vec_conversion[jobz]
rnge = _range_conversion[rnge] ` Similarly this is the case for many other functions involving 'jobz' and 'rnge'. Now it is working without bug. I am working to verify with Scipy. Thanks again for your quick action and great help!
The testing result is very disappointing... 4 functions: Neig - NumPy.linalg.eig; Seigs - Scipy.sparse.linalg.eigs; Keig - skcuda.linalg.eig; Xeig - coded function using magma_ssyevdx_m. N: size of PD matrix; k: the first k eigen-pairs for 'eigs'
The output eigenvalues and eigenvectors are cross-checked among 4 functions. However, the speed is shown below:
N=10, k=4: Neig: 0.00030 Seigs: 0.00075 Keig: 0.00093 Xeigs: 0.00047
N=100, k=50: Neig: 0.01453 Seigs: 0.01896 Keig: 0.00653 Xeigs: 0.01343
N=1000, k=100: Neig: 1.28382 Seigs: 0.53301 Keig: 0.34703 Xeigs: Segmentation fault (core dumped)
I found: (1) Xeigs can output all N eigen-pairs -- this is not partial eigensolver! (2) Xeigs generally fails (Segmentation fault (core dumped)) for matrices of size above 150 -- why?
First, don't get discouraged...
The result you are showing is for speed. I would first make sure they all give the same and correct answer. You may use np.allclose to do the comparison
I believe there are also I realize that ssyev does not have the non-d version.. On the other hand, the magma_ssyevdx_gpu
for single gpu and the non-x versions. Maybe you need to test all of them to see which one is the best_gpu
version requires some pycuda
stuff and it's generally not well-tested here..
Segmentation fault with large N sounds like the gpu memory is not enough. Close other programs? It seems that in Line 281 of ssyedx_m.cpp
falls back to Lapack on CPU if the matrix size is small. This means that we haven't been able to make this function work on GPU yet!
May I ask what does partial eigensolver do? just the first few eigenvalues according to its magnitude? if it does things iteratively, I think it's still possible to compute all of them but slower...
Thanks for your patience.
4 functions output eigenvalues/eigenvectors equal upto 5~6 digits -- this is understandable given their different realizations.
I tried 'magma_ssyevdx' and got the similar results. I did not try 'magma_ssyevdx_gpu' since _gpu version takes in PyCUDA gpuarray data, rather than ndarray data. This potentially could be faster since it avoids repeated host-GPU data transfer?
Thanks for digging into the cpp file and that helps me explain the seg-fault.
Partial eigensovler is important for dimension reduction. Big matrix can be approximated by low-rank matrix using the first k (to be specified) principal (based on the absolute value/magnitude of the eigenvalues) eigen-pairs. This could be realized by Krylov-subspace method or randomized linear algebra algorithms. I thought 'magma_ssyevdx' and its variants could do this job on GPU. However they seem not not based on partial eigensolver. It makes a significant difference in computation time to obtain the first few versus all eigen-pairs, otherwise I could have use skcuda.linalg.eig (which uses cuBALS/CULA) and output the first k pairs without bothering all the stuff. I am not sure whether MAGMA has implemented the partial eigensolver but I did not find it.
Anyway, thanks for all your kind support and I will move on for alternatives for my project.
No problem. Just want you to know that the testing routine from Magma itself (version 2.5.2, built from source) does work. I ran ./testing_ssyevd_gpu --lapack --version 2 --irange 10
in the testing
directory of Magma which calls the ssyevdx_gpu
function. It does run faster than cpu as listed below. It should just be an interface issue.
% MAGMA 2.5.2 compiled for CUDA capability >= 3.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 10000, driver 10020. OpenMP threads 12.
% device 0: GeForce GTX 1050 Ti with Max-Q Design, 1290.5 MHz clock, 4040.1 MiB memory, capability 6.1
% Sat Jan 18 19:15:28 2020
% Usage: ./testing_ssyevd_gpu [options] [-h|--help]
% jobz = No vectors, uplo = Lower, version = 2 (ssyevdx_gpu)
% N CPU Time (sec) GPU Time (sec) |S-S_magma| |A-USU^H| |I-U^H U|
%============================================================================
irange (1, 10)
1088 0.1465 0.1789 0.00e+00 --- --- ok
irange (1, 10)
2112 0.5888 0.6362 0.00e+00 --- --- ok
irange (1, 10)
3136 1.6521 1.1389 1.12e-10 --- --- ok
irange (1, 10)
4160 3.9832 1.8834 7.10e-11 --- --- ok
irange (1, 10)
5184 7.7379 2.9335 9.81e-10 --- --- ok
irange (1, 10)
6208 13.1770 4.5590 1.39e-11 --- --- ok
irange (1, 10)
7232 20.3966 6.8185 2.71e-10 --- --- ok
irange (1, 10)
8256 29.4747 9.5467 3.87e-10 --- --- ok
irange (1, 10)
9280 41.4837 12.6002 6.80e-11 --- --- ok
irange (1, 10)
10304 55.2372 16.6859 3.68e-11 --- --- ok
Thank you very much for the information! I also tested ‘ssyevd_gpu’ as you did and got about 3~4 speed up (interesting~) I will see if I can make magma_ssyevd_gpu work and let you know. Thanks!
On Jan 18, 2020, at 10:31 PM, Kit Lee notifications@github.com wrote:
No problem. Just want you to know that the testing routine from Magma itself (version 2.5.2, built from source) does work. I ran ./testing_ssyevd_gpu --lapack --version 2 --irange 10 in the testing directory of Magma which calls the ssyevdx_gpu function. It does run faster than cpu as listed below. It should just be an interface issue.
% MAGMA 2.5.2 compiled for CUDA capability >= 3.0, 32-bit magma_int_t, 64-bit pointer. % CUDA runtime 10000, driver 10020. OpenMP threads 12. % device 0: GeForce GTX 1050 Ti with Max-Q Design, 1290.5 MHz clock, 4040.1 MiB memory, capability 6.1 % Sat Jan 18 19:15:28 2020 % Usage: ./testing_ssyevd_gpu [options] [-h|--help]
% jobz = No vectors, uplo = Lower, version = 2 (ssyevdx_gpu) % N CPU Time (sec) GPU Time (sec) |S-S_magma| |A-USU^H| |I-U^H U| %============================================================================ irange (1, 10) 1088 0.1465 0.1789 0.00e+00 --- --- ok irange (1, 10) 2112 0.5888 0.6362 0.00e+00 --- --- ok irange (1, 10) 3136 1.6521 1.1389 1.12e-10 --- --- ok irange (1, 10) 4160 3.9832 1.8834 7.10e-11 --- --- ok irange (1, 10) 5184 7.7379 2.9335 9.81e-10 --- --- ok irange (1, 10) 6208 13.1770 4.5590 1.39e-11 --- --- ok irange (1, 10) 7232 20.3966 6.8185 2.71e-10 --- --- ok irange (1, 10) 8256 29.4747 9.5467 3.87e-10 --- --- ok irange (1, 10) 9280 41.4837 12.6002 6.80e-11 --- --- ok irange (1, 10) 10304 55.2372 16.6859 3.68e-11 --- --- ok — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lebedov/scikit-cuda/issues/289?email_source=notifications&email_token=AAXGXDAHABXKOIE3GWIY573Q6PQTRA5CNFSM4KHBFZP2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJKJO6I#issuecomment-575969145, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAXGXDDVTFER4NO2EIH745TQ6PQTRANCNFSM4KHBFZPQ.
@lanzithinking I think there was one incorrect argument in our interface above (and caused the seg fault): m
is an (pure) output and needs to use ctypes.byref()
like the info
variables. I have fixed it in my issue branch
You can also check out my demo
Below is a sample output for a 4000x4000 matrix (python demo_ssyedx_m.py 4000
):
k = 2000, iu = 2000
First 10 eigenvalues
GPU: [-36.561012 -36.410717 -36.17795 -36.082924 -35.971615 -35.826862
-35.766315 -35.69314 -35.603767 -35.56919 ]
CPU: [-36.560974 -36.41073 -36.178 -36.082973 -35.97165 -35.826904
-35.76632 -35.69318 -35.60378 -35.569195]
Time
GPU: 2.268606185913086
CPU: 9.111494064331055
I will probably step down from here and slowly add back some codes and tests. Thanks for brining this up, I learned a few things too.
@wingkitlee0 Thanks! Great! After fixing that bug. Both ssyevdx and dsyevdx (and also ssyevdx_m and dsyevdx_m) work! New testing results:
Neig - NumPy.linalg.eig (general, full); Neigh - Numpy.linalg.eigh (symmetric, full) Seigs - Scipy.sparse.linalg.eigs (general, partial); Seigsh - Scipy.sparse.linalg.eigsh (symmetric, partial)
Keig - scuda.linalg.eig (general, full, cuBLAS/CULA); Xeigs - dsyevdx (symmetric, full, MAGMA)
N=4000, k=100: Neig (CPU) time: 36.76619; Neigh (CPU) time: 4.49761 Seigs (CPU) time: 3.23999; Seigsh (CPU) time: 2.69280 (inconsistent result) Keig (GPU) time: 27.20364 Xeigs (GPU) time: 5.06100
N=4000, k=1000: Neig (CPU) time: 37.00435; Neigh (CPU) time: 4.29299 Seigs (CPU) time: 222.82142; Seigsh (CPU) time: 60.62667 (inconsistent result) Keig (GPU) time: 27.96665 Xeigs (GPU) time: 4.98696
N=4000, k=2000: Neigh (CPU) time: 4.02288 Seigsh (CPU) time: 208.90013 (inconsistent result) Keig (GPU) time: 27.79001 Xeigs (GPU) time: 5.35786
N=10000, k=2000: Neigh (CPU) time: 47.77205 Xeigs (GPU) time: 12.94744
interesting~
magma.py seems not well maintained. Is it used in any function in skcuda?
The magma bindings are not really used in skcuda's high-level math functions at the present. This is primarily due to very limited dev time on my part (my current work doesn't require skcuda) and the fact that there still doesn't seem to be a sufficiently good way to automate maintenance of the increasing number of low-level library wrappers in skcuda.
@wingkitlee: Do you have plan to support partial eigensolvers for symmetric/hermitian matrix, routines under 'heevdx' of MAGMA? I see there is only one function 'magma_ssyevdx_m' supported; however, several parameters, 'vl, vu, il, iu, m', are missing when you call MAGMA library in the master branch. Do you plan to include more functions, e.g. 'magma_dsyevdx' etc., in this category? Thanks!