torchmd / torchmd-net

Training neural network potentials
MIT License
326 stars 73 forks source link

update hdf5 dataset #292

Closed guillemsimeon closed 7 months ago

guillemsimeon commented 7 months ago

Update hdf5 dataset such that now charges, spin and partial charges can be included at the same time. Also, renamed partial charges to 'pq' to be consistent with other datasets such as Ace.

RaulPPelaez commented 7 months ago

LGTM thanks.

peastman commented 7 months ago

This PR just broke the Coulomb prior.

https://github.com/torchmd/torchmd-net/blob/ea8664f09876c683bc174a8aae42b256e29eb489/torchmdnet/priors/coulomb.py#L33-L34

peastman commented 7 months ago

Also, as much as possible I'd like to avoid renaming fields from the hdf5 file. One of the goals of #289 is that you can add arbitrary extra fields to the dataset and then specify ones to pass on to the model. Supporting that is one of the next things I'll do once it's merged. Having special fields that unexpectedly get renamed makes it more difficult for the user. Whenever possible the field in the dataset should have the same name as the entry in the file. It's probably too late to change that for energy and forces, but let's at least not add any new fields with magic names.

guillemsimeon commented 7 months ago

Sorry Peter, I did not realize. You can fix the name for partial charges again if you want, but in Ace we used ‘pq’, which I think it is more consistent with abbreviate names such as ‘q’ and ‘s’.

On Fri, 23 Feb 2024 at 19:17, Peter Eastman @.***> wrote:

This PR just broke the Coulomb prior.

https://github.com/torchmd/torchmd-net/blob/ea8664f09876c683bc174a8aae42b256e29eb489/torchmdnet/priors/coulomb.py#L33-L34

— Reply to this email directly, view it on GitHub https://github.com/torchmd/torchmd-net/pull/292#issuecomment-1961786979, or unsubscribe https://github.com/notifications/unsubscribe-auth/ANJMOAZCDXEPVXABYJUG6XDYVDMNJAVCNFSM6AAAAABDWRC7U6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNRRG44DMOJXHE . You are receiving this because you authored the thread.Message ID: @.***>

peastman commented 7 months ago

Thanks, I'll make a PR. I'd actually prefer to get away from abbreviations like that. I'm currently thinking about training models where I feed in multiple types of charges (for example, formal charges and Gasteiger partial charges). Add in total charge which you've said you want, and now you have three different types of charge. Which one is q? Using explicit names like partial_charge, formal_charge, and total_charge seems clearer.

guillemsimeon commented 7 months ago

Yes, I agree. I am biased towards the way things are written now because I have always been exposed to this names, but it is true that for an external user it is more difficult. Feel free to rename also q to total charge and s to spin in the PR.

On Fri, 23 Feb 2024 at 19:36, Peter Eastman @.***> wrote:

Thanks, I'll make a PR. I'd actually prefer to get away from abbreviations like that. I'm currently thinking about training models where I feed in multiple types of charges (for example, formal charges and Gasteiger partial charges). Add in total charge which you've said you want, and now you have three different types of charge. Which one is q? Using explicit names like partial_charge, formal_charge, and total_charge seems clearer.

— Reply to this email directly, view it on GitHub https://github.com/torchmd/torchmd-net/pull/292#issuecomment-1961811299, or unsubscribe https://github.com/notifications/unsubscribe-auth/ANJMOA34JWIK4IQEYIVKESTYVDOUDAVCNFSM6AAAAABDWRC7U6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNRRHAYTCMRZHE . You are receiving this because you authored the thread.Message ID: @.***>

RaulPPelaez commented 7 months ago

I agree completely with Peter, also good luck looking for "s" in a file.