Open n-gao opened 1 week ago
@n-gao Thank you for your feedback. The bug is fixed in https://github.com/pyscf/gpu4pyscf/pull/212. However, the GPU implementation is not necessarily faster than the CPU implementation. For the most part, we are still using int2c2e on CPU. With further optimization, it can probably be much faster than the CPU version for large systems.
I also moved int2c2e to avoid possible confusion. Here is a test script:
import cupy
import pyscf
from pyscf.df.addons import make_auxmol
from gpu4pyscf.scf.int2c2e import get_int2c2e
m = pyscf.gto.M(atom='''
C -1.1367 0.0103 0.0000
C 0.1372 -0.0024 0.0000
C 1.0258 1.2064 0.0000
C 2.3997 1.1549 0.0000
O 2.9930 0.0003 0.0000
O 3.0209 2.3217 0.0000
C 1.0136 2.4426 0.0000
C -0.3603 2.4961 0.0000
H -2.0364 -0.6442 0.0000
H 1.7805 -0.5956 0.0000
H -2.0216 0.6637 0.0000
H 0.0660 -1.0129 0.0000
O -0.5710 3.7755 0.0000
C 3.9200 0.9435 0.0000
H -1.2845 3.9715 0.0000
H 4.5367 1.2207 0.0000
''', basis='6-31G(d)')
aux_mol = make_auxmol(m, 'weigend')
start_event = cupy.cuda.Event()
end_event = cupy.cuda.Event()
start_event.record()
for i in range(100):
    int2c2e = get_int2c2e(aux_mol)
end_event.record()
end_event.synchronize()
elapsed_time = cupy.cuda.get_elapsed_time(start_event, end_event)
print(f'{elapsed_time/1000:.3f} s', 'with GPU')
start_event = cupy.cuda.Event()
end_event = cupy.cuda.Event()
start_event.record()
for i in range(100):
    int2c2e = aux_mol.intor('int2c2e_sph')
    int2c2e = cupy.asarray(int2c2e)
end_event.record()
end_event.synchronize()
elapsed_time = cupy.cuda.get_elapsed_time(start_event, end_event)
print(f'{elapsed_time/1000:.3f} s', 'with CPU')
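The timing loops above count every call, including the first one, which on a GPU can carry one-time setup cost. A minimal sketch of a reusable timing helper with warm-up calls is given below. The name `benchmark` and its parameters are hypothetical and not part of gpu4pyscf or pyscf; it uses wall-clock time from the standard library rather than CUDA events, and the workload in the demo is only a stand-in so the sketch runs without a GPU.

```python
import time


def benchmark(fn, repeats=100, warmup=3, sync=None):
    """Return the mean wall-clock seconds per call of fn().

    sync is an optional zero-argument callable invoked after every call,
    intended for waiting on asynchronous (e.g. GPU) work to finish.
    """
    def run_once():
        result = fn()
        if sync is not None:
            sync()
        return result

    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        run_once()

    start = time.perf_counter()
    for _ in range(repeats):
        run_once()
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    # Stand-in CPU workload; replace with the function under test.
    per_call = benchmark(lambda: sum(range(10_000)), repeats=50)
    print(f"{per_call * 1e3:.3f} ms per call")
```

With the script above, one might call `benchmark(lambda: get_int2c2e(aux_mol), sync=cupy.cuda.Device().synchronize)` for the GPU path and `benchmark(lambda: cupy.asarray(aux_mol.intor('int2c2e_sph')))` for the CPU path, assuming both are set up as in the test script.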
It appears that the int2c2e function is not working properly. The following code doesn't work.
Python 3.12.5, gpu4pyscf-cuda12x 1.0.2, gpu4pyscf-libxc-cuda12x 0.5, pyscf 2.6.2, numpy 1.26.4, cupy-cuda12x 13.3.0, cudatoolkit 12.5.0
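A version list like the one above can be collected programmatically when filing a report. The sketch below uses only the standard library; the helper name `collect_versions` is hypothetical. It assumes the packages were installed with pip under the names shown, so a conda-only package such as cudatoolkit would likely be reported as not installed.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages):
    """Map each distribution name to its installed version string."""
    report = {"Python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not visible to importlib.metadata.
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    names = [
        "gpu4pyscf-cuda12x",
        "gpu4pyscf-libxc-cuda12x",
        "pyscf",
        "numpy",
        "cupy-cuda12x",
    ]
    for name, ver in collect_versions(names).items():
        print(name, ver)
```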