torchmd / torchmd-net

Training neural network potentials

MIT License

Access SPICE charges #146

Closed PhilippThoelke closed 2 years ago

PhilippThoelke commented 2 years ago

Is there currently a way to access charges from the SPICE dataset class? Since the code already supports passing charges to the model (through the q parameter), wouldn't it make sense to use that for SPICE? As I understand it, it doesn't make much sense to train models on SPICE without encoding the charge. @raimis @peastman

peastman commented 2 years ago

I wouldn't say that it doesn't make sense. For certain purposes, having access to charge information can be useful. The dataset includes two types of charges: MBIS charges (produced by the QC calculations) and formal charges (encoded in the SMILES strings). Some models might make use of formal charges as an input. MBIS charges generally aren't useful as an input, since you won't know them for new molecules or conformations, but they might be useful as a prediction target. Of course, some models won't use either.

PhilippThoelke commented 2 years ago

OK, so in the dataset class we could potentially decode formal charges from the supplied SMILES strings.
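For illustration, here is a minimal sketch of what decoding could look like if only the total molecular charge is needed. It relies on the fact that SMILES can carry a formal charge only inside bracket atoms, so a regular expression is enough and no cheminformatics toolkit is required. The function names are hypothetical and not part of TorchMD-NET, and the per-atom list follows the order of bracket atoms in the string, which is not necessarily the atom order used in the dataset.

```python
import re

# Bracket atoms such as [O-], [NH3+] or [N+:1] are the only place
# where SMILES can encode a formal charge.
_BRACKET_ATOM = re.compile(r"\[([^\]]*)\]")
# A charge is a run of '+' or '-' optionally followed by a magnitude,
# e.g. "+", "-2" or the older "++" form.
_CHARGE = re.compile(r"(\++|-+)(\d*)")


def formal_charges_from_smiles(smiles):
    """Return the formal charge of each bracket atom, in order of appearance.

    Hypothetical helper: atoms written without brackets (e.g. the carbons
    in "CCO") are neutral by definition and are not listed.
    """
    charges = []
    for body in _BRACKET_ATOM.findall(smiles):
        match = _CHARGE.search(body)
        if match is None:
            charges.append(0)
            continue
        signs, digits = match.groups()
        # "+2" gives magnitude 2; "++" gives magnitude 2; "+" gives 1.
        magnitude = int(digits) if digits else len(signs)
        charges.append(magnitude if signs[0] == "+" else -magnitude)
    return charges


def total_charge_from_smiles(smiles):
    """Return the net molecular charge encoded in a SMILES string."""
    return sum(formal_charges_from_smiles(smiles))
```

Called on an acetate SMILES such as `"CC(=O)[O-]"`, `total_charge_from_smiles` sums a single bracket atom with charge -1. Mapping per-atom charges back to the dataset's atom order would additionally need the atom-map indices, which this sketch ignores.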

raimis commented 2 years ago

As suggested (https://github.com/openmm/spice-dataset/issues/42), the formal charges should be in the HDF5 files directly, so the loader can load them without parsing SMILES. Otherwise, TorchMD-NET would have a dependency on RDKit, which we would like to avoid.
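A rough sketch of how a loader could read charges straight from the file with h5py, assuming one HDF5 group per molecule. The dataset keys `formal_charges` and `mbis_charges` and the function name are assumptions for illustration; the actual layout depends on what the SPICE files end up providing.

```python
import h5py


def load_charges(path, formal_key="formal_charges", mbis_key="mbis_charges"):
    """Collect charge arrays per molecule from an HDF5 file.

    Hypothetical helper: assumes each top-level group is one molecule and
    that charges, when present, are stored under the given keys.
    """
    charges = {}
    with h5py.File(path, "r") as f:
        for name, group in f.items():
            if not isinstance(group, h5py.Group):
                continue  # skip any top-level datasets
            entry = {}
            if formal_key in group:
                formal = group[formal_key][()]  # read the whole array
                entry["formal"] = formal
                # The net charge is what a model could take as its q input.
                entry["total_charge"] = int(formal.sum())
            if mbis_key in group:
                entry["mbis"] = group[mbis_key][()]
            charges[name] = entry
    return charges
```

Molecules that lack either key simply get a smaller entry, so the same loader would work on files written before formal charges were added.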