ulissigroup / amptorch

AMPtorch: Atomistic Machine Learning Package (AMP) - PyTorch
GNU General Public License v3.0

Unable to use training dataset with images of varying numbers of atoms #89

Closed: EricMusa closed this issue 3 years ago

EricMusa commented 3 years ago

I am attempting to train a model using images collected from an MD run of a bare slab and an MD run of a cluster supported on the same slab. When the fingerprints are scaled by the FeatureScaler, a RuntimeError is thrown saying that the sizes of the fingerprint tensors must match, and it indicates that the number of fingerprints per atom in the bare-slab images is smaller than in the supported-cluster images (see error message below). ACSF descriptors for elements not present in an image are not calculated, leading to this mismatch in fingerprint sizes between images.

My current workaround is to swap the elements of some of the slab atoms in the fixed bottom layer with the elements that appear in the supported cluster. This is a messy fix, but it allows a model to be trained on images of varying compositions. I think padding the fingerprints to fill in the uncalculated descriptors would be the best solution.
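For reference, a minimal sketch of the setup that triggers the mismatch, assuming the two MD runs are stored as ASE trajectory files (the file names are placeholders):

```python
import ase.io

# Hypothetical trajectory files from the two MD runs (names are placeholders).
bare_slab_images = ase.io.read("bare_slab_md.traj", index=":")
cluster_images = ase.io.read("supported_cluster_md.traj", index=":")
images = bare_slab_images + cluster_images

# The bare-slab images contain only the slab element(s), while the
# supported-cluster images also contain the cluster element(s).  If the
# descriptor is built from a per-trajectory element list, the per-atom
# fingerprint vectors end up with different lengths across the two sets.
elements_per_image = [set(image.get_chemical_symbols()) for image in images]
all_elements = sorted(set.union(*elements_per_image))
print(all_elements)  # the full element set spanning both trajectories
```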

fingerprint_mismatch_error.log fingerprint_mismatch_script.log

mshuaibii commented 3 years ago

This shouldn't be an issue. The mismatch in fingerprints comes from the way you define your elements. If you define all of the elements in the descriptor, the fingerprints will be padded for elements that don't exist in a given image. For instance: low_res_elements = hi_res_elements = ["Au", "Pt", "Ag"]. At least this is the case for the default fingerprinting; I don't imagine it's any different for the GaussianDescriptorSet. Give it a try and let me know!
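For concreteness, here is a minimal sketch of a training setup with a single element list covering both trajectories, patterned after the example config in the amptorch README; the exact config keys and hyperparameter values may differ between versions, and the numbers below are illustrative only:

```python
import numpy as np
from amptorch.trainer import AtomsTrainer

# Symmetry-function hyperparameters for the default Gaussian fingerprints
# (illustrative values only).
Gs = {
    "default": {
        "G2": {
            "etas": np.logspace(np.log10(0.05), np.log10(5.0), num=4),
            "rs_s": [0],
        },
        "G4": {"etas": [0.005], "zetas": [1.0, 4.0], "gammas": [1.0, -1.0]},
        "cutoff": 6.0,
    },
}

# One element list covering *both* trajectories, so descriptors for elements
# absent from a given image are padded rather than dropped.
elements = ["Au", "Pt", "Ag"]

config = {
    "model": {"get_forces": True, "num_layers": 3, "num_nodes": 5},
    "optim": {
        "device": "cpu",
        "force_coefficient": 0.04,
        "lr": 1e-2,
        "batch_size": 32,
        "epochs": 100,
    },
    "dataset": {
        "raw_data": images,    # combined bare-slab + supported-cluster images
        "val_split": 0,
        "elements": elements,  # full element set, not per-trajectory subsets
        "fp_params": Gs,
        "save_fps": True,
    },
    "cmd": {
        "debug": False,
        "run_dir": "./",
        "seed": 1,
        "identifier": "mixed_composition",
        "verbose": True,
        "logger": False,
    },
}

trainer = AtomsTrainer(config)
trainer.train()
```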

EricMusa commented 3 years ago

You are correct, it does work; I'd completely forgotten that I had intentionally given the support atoms fewer descriptors... Thank you!