This PR contains some small speed fixes for the Dataset in Sparsechem.
It avoids creating a temporary sparse matrix for every element of the minibatch in the `__getitem__` method of the `Dataset`, as used by PyTorch's `DataLoader`.
The binary labels are now transformed from {-1, 1} to {0, 1} once, when the `Dataset` is created, instead of being transformed again for every minibatch.
These fixes provide a performance improvement of up to 4x for folded inputs. 99% of the gain is from the elision of sparse temporaries.
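
The two changes described above can be illustrated with a minimal sketch. It is not the code from this PR: the class `SparseRowDataset`, its argument names, and the returned dictionary keys are hypothetical, and plain Python lists in CSR layout (`indptr`, `indices`, `data`) stand in for the real sparse matrices. The point is only that `__getitem__` slices the underlying arrays directly, and that the label mapping happens once in `__init__`.

```python
class SparseRowDataset:
    """Hypothetical stand-in for the Dataset; not Sparsechem's actual class."""

    def __init__(self, x_indptr, x_indices, x_data, y_indptr, y_indices, y_data):
        # Inputs and labels are both kept in CSR layout.
        self.x_indptr, self.x_indices, self.x_data = x_indptr, x_indices, x_data
        self.y_indptr, self.y_indices = y_indptr, y_indices
        # Map labels {-1, 1} -> {0, 1} once, at construction time,
        # so no per-minibatch transform is needed later.
        self.y_data = [(v + 1) // 2 for v in y_data]

    def __len__(self):
        return len(self.x_indptr) - 1

    def __getitem__(self, idx):
        # Slice the CSR arrays directly; no per-row sparse matrix is built.
        xs, xe = self.x_indptr[idx], self.x_indptr[idx + 1]
        ys, ye = self.y_indptr[idx], self.y_indptr[idx + 1]
        return {
            "x_ind": self.x_indices[xs:xe],
            "x_data": self.x_data[xs:xe],
            "y_ind": self.y_indices[ys:ye],
            "y_data": self.y_data[ys:ye],
        }


# Two rows: row 0 has features {0, 3}, row 1 has feature {1}.
ds = SparseRowDataset(
    x_indptr=[0, 2, 3], x_indices=[0, 3, 1], x_data=[1.0, 1.0, 1.0],
    y_indptr=[0, 1, 3], y_indices=[2, 0, 1], y_data=[-1, 1, -1],
)
row = ds[1]
```

Here `row["y_data"]` already holds the {0, 1} labels, so a collate function would only need to concatenate the per-row slices.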