lanl / hippynn

python library for atomistic machine learning
https://lanl.github.io/hippynn/

Added database options to remove outlier data #30

Closed bnebgen-LANL closed 1 year ago

bnebgen-LANL commented 1 year ago
def remove_high_property(self, key, perAtom, species_key=None, cut=None, std_factor=10):
    """
    Remove outlier systems from the dataset.
    Must be called before the dataset is split.
    "key": the property key in the dataset to check for high values
    "perAtom": True if the property is defined per atom along axis 1; otherwise the property is treated as a full-system quantity
    "std_factor": systems with values larger than this multiple of the standard deviation of all data are removed. None to skip this step.
    "cut": systems with values larger than this number are removed. None to skip this step. This step is done first.
    """