Closed by PhilippThoelke 2 years ago
I wouldn't say that it doesn't make sense. For certain purposes, having access to charge information can be useful. The dataset includes two types of charges: MBIS charges (produced by the QC calculations) and formal charges (encoded in the SMILES strings). Some models might make use of formal charges as an input. MBIS charges generally aren't useful as an input, since you won't know them for new molecules or conformations, but they might be useful as a prediction target. Of course, some models won't use either.
OK, so we could potentially decode the formal charges from the SMILES strings supplied with the dataset, inside the dataset class.
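For illustration, here is a minimal sketch of what that decoding could look like without a cheminformatics toolkit. It assumes the SMILES strings are fully atom-mapped, i.e. every atom (hydrogens included) is written as a bracket atom with a map index, such as [N+:1]. The function name is hypothetical and is not part of TorchMD-NET or SPICE, and this is not a general SMILES parser: atoms written outside brackets are silently ignored.

```python
import re

# Contents of every bracket atom, e.g. "N+:1" from "[N+:1]".
_BRACKET_ATOM = re.compile(r"\[([^\]]+)\]")
# Charge suffix just before the map index: "+", "--", "+2", "-1", ...
_CHARGE = re.compile(r"(\++|-+)(\d*)$")


def formal_charges_from_mapped_smiles(smiles):
    """Return per-atom formal charges, ordered by atom map index.

    Hypothetical helper. Assumes every atom is a bracket atom that
    carries a map index (":n"); raises ValueError otherwise.
    """
    charges = {}
    for body in _BRACKET_ATOM.findall(smiles):
        head, sep, map_index = body.partition(":")
        if not sep or not map_index.isdigit():
            raise ValueError(f"bracket atom without map index: [{body}]")
        match = _CHARGE.search(head)
        if match is None:
            charge = 0  # no sign present means a neutral atom
        else:
            signs, digits = match.groups()
            sign = 1 if signs[0] == "+" else -1
            # "+2" means 2, while "++" means one unit per sign character
            charge = sign * (int(digits) if digits else len(signs))
        charges[int(map_index)] = charge
    return [charges[i] for i in sorted(charges)]


# Ammonium: the nitrogen carries +1, the hydrogens are neutral.
ammonium = formal_charges_from_mapped_smiles("[N+:1]([H:2])([H:3])([H:4])[H:5]")
print(ammonium)       # [1, 0, 0, 0, 0]
print(sum(ammonium))  # total molecular charge: 1
```

Summing the list gives the total molecular charge, which is the kind of scalar that could be passed to a model as a charge input. Whether the map indices match the atom order of the stored conformations would need to be checked against the dataset itself.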
As suggested (https://github.com/openmm/spice-dataset/issues/42), the formal charges should be in the HDF5 files directly, so the loader can read them without parsing SMILES. Otherwise, TorchMD-NET would have a dependency on RDKit, which we would like to avoid.
Is there currently a way to access charges from the SPICE dataset class? As the code already supports passing charges to the model (through the `q` parameter), wouldn't it make sense to utilize that for SPICE? As I understand it, it doesn't make much sense to train models on SPICE without encoding the charge? @raimis @peastman