Closed ipcamit closed 7 months ago
I have addresses all the conversations. Once we have resolved the remaining ones, I can push a commit addressing the above.
I have two more comments regarding docs and annotation.
parameter.py
what the purpose of some of the functions is, particularly those involved with parameters in different spaces. It is a bit clearer to me now that:
original
refers to the space without transformation transformed
refers to the space after applying a transform object default
means whatever space, if no transform object is provided, it will always be the original space; if a transform object is applied, it can be a bit complicated - the obtained parameter can be in either the original or the transformed space, depending on the sequence of methods a user calledIt would be very help to add such an explanation to the docstring of the class. At least the meanings of original
, transformed
, and default
. And then specify in the docs of the corresponding method what space the parameter is expected to be.
np.ndarray
. It would be ideal if we could annotate the shape of the array, using something like jaxtyping. I feel it is overkill for now. A good comprise could be specifying the expected shape for the array in the docstring. Ok. I am finishing all tests and required changes thoughout the project, but before I do here is what I am proposing:
We define two spaces:
Below the the skeleton of changes I made. Please comment. now all needed functions wil have suffixes _model_space
and _param_space
to indicate which space they work on.
class Parameter(np.ndarray):
"""Parameter class for containing physics-based model parameters.
This class provides utilities for managing model parameters between the "model space"
and the "transformed space". See the glossary below for the definition of these spaces.
Modeled on `torch.nn.Parameters`, it inherits from `numpy.ndarray`. It is a numpy
array with additional attributes such as name, transform, etc. For maintaining compatibility,
use `get_numpy_array` (for getting a numpy array of parameters) and `get_opt_numpy_array`
(transformed numpy array but with only the optimizable values). `get_numpy_array` array
also helps in getting an instance of the Parameter to mutate safely, and pass it to
any impure function.
Glossary:
- Model space: The space in which the model expects the parameters to be in. Currently,
all models in OpenKIM expect the parameters to be in the affine cartesian space.
- Parameter space: Parameter space is the space in which you want to sample/optimize
yout parameters. Most often parameters are transformed using bijective transformations
of the model inputs for ease of optimization/sampling. For example, the log
transform is used in searching for the optimal parameters of sloppy model parameters.
There can be cases where transformation functions are not bijective, e.g. ceiling function for
mapping continuous parameters to discrete values. Parameter space is mostly
used for optimization, and not the model itself. If no transform_function is
provided then parameter space and model space are same.
All functions that needs input/output in the model space, will use the suffix
`_model_space` and `_param_space` for the transformed parameter space.
Attributes:
name : Name of the parameter.
transform : Instance of ``ParameterTransform`` object to be applied to the parameter.
index : Index of the parameter in the parameter vector. used for setting the parameter
in the KIMPY.
bounds : Bounds for the parameter, must be numpy array of shape n x 2, with [n,0] as
lower bound, and [n,1] as the upper bound. If None, no bounds are applied.
is_mutable : If True, the parameter is trainable, and will be passed to scipy optimizer.
If False, the parameter is not trainable, and kept constant.
opt_mask : A boolean array of the same shape as the parameter. If True, the
parameter is optimized. If False, the parameter is not optimized.
"""
def __new__(
cls,
input_array: Union[np.ndarray, float, int],
name: str = None,
transform_function: "ParameterTransform" = None,
bounds: np.ndarray = None,
is_mutable: bool = False,
index: int = None,
opt_mask: np.ndarray = None,
):
"""Initializes and returns a new instance of Parameter.
Args:
input_array: Input numpy array to initialize the parameter with.
name: Name of the parameter
transform_function: Instance of ``ParameterTransform`` object to be applied to the parameter.
bounds: n x 2 array of lower and upper bounds for the parameter. If None, no
bounds are applied
is_mutable: boolean if the parameter is mutable. This implies that either
the parameter will be passed to the scipy optimizer, or the parameter values will be edited.
index: Index of the parameter in the parameter vector. Used for setting the
parameter in the KIMPY.
opt_mask: Boolean array of the same shape as the parameter. The values
marked ``True`` are optimized, and ``False`` are not optimized.
Returns:
A new instance of Parameter.
"""
def transform(self):
"""Apply the transform to the parameter.
This method simple applies the function ~:kliff.transforms.ParameterTransform.__call__
to the parameter (or equivalently, ~:kliff.transforms.ParameterTransform.transform).
"""
def inverse_transform(self):
"""Apply the inverse transform to the parameter.
def copy_from_param_space(self, arr: np.ndarray):
"""Copy array to self in the parameter space.
Array can be a numpy array or a Parameter object.
This method assumes that the array is of the same type and shape as self,
compensated for opt_mask. If not, it will raise an error.
This method also assumes that the incoming array is in the same space, as the parameter
currently (i.e. "Parameter space", see glossary above for detail).
Args:
arr: Array to copy to self.
"""
def copy_from_model_space(self, arr: np.array):
"""Copy arr from model space.
Array can be a numpy array or a Parameter object. This method assumes that the
incoming array is in the model space and would need transformation to the parameter
space before copying. It is a safer method to use in most cases. If the parameter
is not transformed, it will transform it first for maintaining consistency.
Args:
arr: Array to copy to self.
"""
def get_numpy_array_model_space(self) -> np.ndarray:
"""Get a numpy array of parameters in the model space.
This method should be uses for getting the numpy array of parameters where the
``Parameters`` class might not work, for feeding values to the model.
Returns:
A numpy array of parameters in the original space.
"""
def get_numpy_array_param_space(self):
"""Applies the transform to the parameter, and returns the transformed array."""
def get_opt_numpy_array_param_space(self) -> np.ndarray:
"""Get a masked numpy array of parameters in the transformed space.
This method is similar to :get_numpy_array_param_space but additionally does apply the
opt_mask, and returns the array. This ensures the correctness of the array for
optimization/other applications. *This should be the de-facto method for getting
the numpy array of parameters.*
Returns:
A numpy array of parameters in the original space.
"""
def copy_at_param_space(self, index: Union[int, List[int]], arr: Union[int, float, np.ndarray]):
"""Copy values at a particular index or indices in parameter space.
This method directly copies the provided data, and does not perform any checks.
Args:
index: Index or indices to copy the values at.
arr: Array to copy to self.
"""
def add_transform(self, transform: "ParameterTransform"):
"""Save a transform object with the parameter.
Args:
transform: Instance of ``ParameterTransform`` object to be applied to the parameter.
"""
def add_bounds_model_space(self, bounds: np.ndarray):
"""Add bounds to the parameter.
Bounds should be supplied in the model space. The bounds will be transformed if
the transform_function is provided to the parameter.
Args:
bounds: numpy array of shape (n, 2)
"""
def add_bounds_param_space(self, bounds: np.ndarray):
"""Add bounds to the parameter.
Add bounds to the parameter in parameter space. It does not do any additional checks
or perform any transformations.
Args:
bounds: numpy array of shape (n, 2)
"""
def add_opt_mask(self, mask: np.ndarray):
"""Set mask for optimizing vector quantities.
It expects an input array of shape (n,), where n is the dimension of the vector
quantity to be optimized. This array must contain n booleans indicating which
properties to optimize.
Args:
mask: boolean array of same shape as the vector quantity to be optimized
"""
def get_bounds_param_space(self) -> List[Tuple[int, int]]:
"""Returns bounds array that is used by scipy optimizer.
Returns:
A list of tuples of the form (lower_bound, upper_bound)
"""
def get_bounds_model_space(self) -> np.ndarray:
"""Get the bounds in the original space.
Returns:
A numpy array of bounds in the original space.
"""
def has_bounds(self) -> bool:
"""Check if bounds are set for optimizing quantities
Returns:
True if bounds are set, False otherwise.
"""
def as_dict(self):
"""Return a dictionary containing the state of the object."""
def save(self, filename):
"""Save the parameter to disk."""
@classmethod
def from_dict(cls, state_dict):
"""Update the object's attributes based on the provided state dictionary.
Args:
state_dict (dict): The dictionary containing the state of the object.
This dictionary should include the "value" key.
"""
@classmethod
def load(cls, filename):
"""Load a parameter from disk.
TODO: non classmethod version
"""
def get_opt_param_name_value_and_indices(
self,
) -> Tuple[str, Union[float, np.ndarray], int]:
"""Get the name, value, and indices of the optimizable parameters.
Returns:
A tuple of lists of names, values, and indices of the optimizable parameters.
"""
@property
def lower_bound(self):
"""Get the lower bounds of the parameter.
Always returns values in parameter space.
Returns:
A numpy array of lower bounds.
"""
@property
def upper_bound(self):
"""Get the upper bounds of the parameter.
Always returns values in parameter space.
Returns:
A numpy array of upper bounds.
This is much more clean!
A couple of questions/comments:
OpenKIM expect the parameters to be in the affine cartesian space
. What do you mean by affine cartesian space
? is_mutable
and opt_mask
: if is_mutable=False
, all components of the parameter should not be changed. However, is_mutable=True
, the components that can be changed are determined by opt_mask
... Well, I think we might want to remove is_mutable
as the argument of the class and overload opt_mask
like:
opt_mask: Union[np.ndarray, bool] = None,
if opt_mask=True
, all components are optimizable, and internally and internally we convert it to an array of bools. If opt_mask=False
, we do it the other way around. We may still want to keep a function (we can decorate with @property
, I'm ok with either):
def is_mutable(self):
return np.all(self.opt_mask)
def copy_at_param_space
, can we change the order of the arguments arr
and index
? Also, do we need a def copy_at_model_space
?
- OpenKIM expect the parameters to be in the affine cartesian space. What do you mean by affine cartesian space?
I just meant simple real 3D euclidian space. I thought affine cartesian space would be more unambiguous. May should just write eucledian instead.
if opt_mask=True, all components are optimizable, and internally and internally we convert it to an array of bools.
def is_mutable(self): return np.all(self.opt_mask)
That sounds like a good idea, will give it a try. Just a minor comment, you meant np.any(self.opt_mask)
right? For partially optimizable arrays.
For def copy_at_param_space, can we change the order of the arguments arr and index? Also, do we need a def copy_at_model_space ?
That was added to give ease of adding data in any order. Otherwise user have to know which space to copy data in, and it will become non-commutative operation with add_transform
, i.e. copying same value before adding the transform and afterwards will result two different outcomes. Therefore two separate methods provide unambiguous way of copying the data.
See also our previous discussions on similar issue regarding bounds
here
I just meant simple real 3D euclidian space.
What does it mean for a parameter to be in 3D euclidean space? I would think it is just a vector space, and the dimension depends on the number of components. Anyway, probably we can simply use the model space
and parameter space
, without these adjectives?
you meant
np.any(self.opt_mask)
right? For partially optimizable arrays.
ah, right!
For def copy_at_param_space, can we change the order of the arguments arr and index? Also, do we need a def copy_at_model_space ?
To clarify, I saw copy_at_param_space
included in your proposed new Parameter
class, but not copy_at_model_space
. I was asking to add the latter.
Let me gather my thoughts on that and get back to you. I just meant distinguish between transformed space model space. By Euclidean or affine space I meant that all the parameters have dimensions and values that are combination of quantities in si or metal units etc. While transformed space will be combination of them in some other space with dimension less quantities ideally. Does that make sense?
Regarding copy_at_model
I guess that was not there as it is not used
anywhere in the code and we always work in transformed space. But will
include it in next draft.
On Wed, Dec 13, 2023, 23:16 Mingjian Wen @.***> wrote:
I just meant simple real 3D euclidian space. What does it mean for a parameter to be in 3D euclidean space? I would think it is just a vector space, and the dimension depends on the number of components. Anyway, probably we can simply use the model space and parameter space, without these adjectives?
you meant np.any(self.opt_mask) right? For partially optimizable arrays. ah, right!
For def copy_at_param_space, can we change the order of the arguments arr and index? Also, do we need a def copy_at_model_space ?
To clarify, I saw copy_at_param_space included in your proposed new Parameter class, but not copy_at_model_space. I was asking to add the latter.
— Reply to this email directly, view it on GitHub https://github.com/openkim/kliff/pull/140#issuecomment-1855146419, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABTZ2DY4MMW5DTYIL5BGEX3YJKDUPAVCNFSM6AAAAABAGRVDV2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNJVGE2DMNBRHE . You are receiving this because you authored the thread.Message ID: @.***>
Alright, clear now.
Added changes as per above discussion. Happy holidays!
Thanks @ipcamit !
The explicit use of _model_space
and _param_space
makes it much clear to know which is which.
I don't think any of the comments are major - most of them are clarifying questions.
Some of the comments (the early ones) are from my responses in the previous review. Please ignore them (I don't know why they appear).
I did not take a look at kliff_trainer.py
. I think it might be included by accident (we will do it in the next PR)?
Pushed latest changes as per comments. Please ignore the kliff_trainer
it got tracked by accident.
@ipcamit I've left two or three small comments; can you please address these?
Also, can you remove the kliff_trainer.py
file?
Then, I think it is ready to be merged.
Removed, trainer paths, and fixed opt_mask
annotations. Let me know if I missed any comments, as I could not see any newer comments.
I saw the tests are failing; these are the UQ tests. Are we planning to deal with UQ stuff later, and don't worry about it now? If that is the case, I think we can merge it now.
I think a lot of UQ stuff needs to be rewritten. So ok for me to ignore the UQ tests.
Yes. UQ I would deal with later. I have made it work earlier, but with changes in function names, members, and certain functionalities, I would need to redo it. My plan is to integrate coordinate transforms next, including interface with libdescriptor and graph computing etc. as I need it immediately for other works. Right now I need to maintain two parallel versions of KLIFF for the same. After coordinate transforms, I could switch full time to KLIFF 1.0 "beta" version. UQ would then be next, followed by trainers.
This particular set of changes is the most vats and intrusive one, so I think easier will be for you to break it down in smaller incomplete commits and review them before merging in one big PR. I have split the total changes in 3 categories
2, and 3 will be submitted post review of 1.
Changes in this commit
OptimizingParameters
, now parameters haveis_mutable
flag which indicates if this parameter can be changed during optimization.transformations
module is introduced, which will also hold coordinate and property transformations in future.__call__
method, which can directly acceptConfiguration
object and returns a dictionary of energy forces and stresses.Parameters
will transform parameters now.transforms
: New module for collecting all transformations. Currently it only contains ParameterTransformsmodels
to reflect the newer API.Let me know of suggestions. Once we have no actionable item on this, I will submit the part 2 of this PR.