grimme-lab / dxtb

Efficient And Fully Differentiable Extended Tight-Binding
https://dxtb.readthedocs.io
Apache License 2.0

Performance Improvements #110

Closed marvinfriede closed 1 year ago

marvinfriede commented 1 year ago

Some problems and code pieces with potential for improvement have already been identified:

marvinfriede commented 1 year ago

I tried to use torch.nn.functional.pad but did not find a way to circumvent the costly loop, because the padding function only accepts a plain tuple per call and therefore cannot be vectorized either.

Code

Current implementation (this is the slow loop):

```python
for r, pair in enumerate(upairs):
    ovlp[
        pair[0] : pair[0] + norbi,
        pair[1] : pair[1] + norbj,
    ] = stmp[r]
return ovlp
```

Version with padding instead of indexing, which is much slower as it does not remove the loop:

```python
l = ovlp.shape[0]
padlist = (
    upairs[:, 1],
    l - upairs[:, 1] - norbj,
    upairs[:, 0],
    l - upairs[:, 0] - norbi,
)
for r in range(upairs.shape[0]):
    pad = (
        padlist[0][r],
        padlist[1][r],
        padlist[2][r],
        padlist[3][r],
    )
    ovlp += torch.nn.functional.pad(
        stmp[r],
        pad,
        "constant",
        0.0,
    )
return ovlp
```
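A possible way to drop the Python loop entirely is advanced indexing: build the row/column index grids for all blocks at once and scatter `stmp` into `ovlp` in a single assignment. This is only a minimal sketch under the assumptions that all blocks share the same shape `(norbi, norbj)` and do not overlap; `scatter_blocks` is a hypothetical helper, not part of dxtb.

```python
import torch


def scatter_blocks(
    ovlp: torch.Tensor,
    stmp: torch.Tensor,    # shape (npairs, norbi, norbj)
    upairs: torch.Tensor,  # shape (npairs, 2), top-left offset of each block
    norbi: int,
    norbj: int,
) -> torch.Tensor:
    """Write all blocks into `ovlp` with one advanced-indexing assignment."""
    # row indices of every block element: (npairs, norbi, 1)
    rows = upairs[:, 0].view(-1, 1, 1) + torch.arange(
        norbi, device=upairs.device
    ).view(1, -1, 1)
    # column indices of every block element: (npairs, 1, norbj)
    cols = upairs[:, 1].view(-1, 1, 1) + torch.arange(
        norbj, device=upairs.device
    ).view(1, 1, -1)
    # broadcasting yields (npairs, norbi, norbj) index grids matching stmp
    ovlp[rows, cols] = stmp
    return ovlp
```

If `ovlp` itself requires gradients, the out-of-place `ovlp.index_put((rows, cols), stmp)` avoids the in-place write; whether either variant actually beats the loop would still need benchmarking.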
hoelzerC commented 1 year ago

- loop over primitives in overlap gradient (https://github.com/grimme-lab/xtbML/pull/108#discussion_r1126274610)
- loop over position vector in overlap gradient (https://github.com/grimme-lab/xtbML/pull/108#discussion_r1126281350)
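For illustration only, the usual pattern for removing such primitive loops is to broadcast over an extra tensor dimension so that all primitive pairs are handled at once. The names below (`ai`, `aj` for exponents, `ci`, `cj` for contraction coefficients, `r2` for the squared center distance) are assumptions for this sketch, not the actual dxtb/xtbML variables:

```python
import torch


def primitive_pair_prefactors(
    ai: torch.Tensor,  # exponents of shell i, shape (nprim_i,)
    ci: torch.Tensor,  # contraction coefficients of shell i, shape (nprim_i,)
    aj: torch.Tensor,  # exponents of shell j, shape (nprim_j,)
    cj: torch.Tensor,  # contraction coefficients of shell j, shape (nprim_j,)
    r2: torch.Tensor,  # squared distance between the shell centers (scalar tensor)
) -> torch.Tensor:
    """Gaussian product prefactors for all primitive pairs without a Python loop."""
    # pairwise reduced exponents mu = ai*aj / (ai+aj), shape (nprim_i, nprim_j)
    mu = ai.unsqueeze(-1) * aj.unsqueeze(-2) / (ai.unsqueeze(-1) + aj.unsqueeze(-2))
    # weight each pair by its coefficients and the Gaussian prefactor exp(-mu * r2)
    return ci.unsqueeze(-1) * cj.unsqueeze(-2) * torch.exp(-mu * r2)
```

The same broadcasting idea applies to the loop over the position vector: keeping the three Cartesian components as a leading dimension lets the kernel evaluate all of them in one call.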

Done [1].