EwoutH opened 2 months ago
My initial idea from playing with this is something like:
```python
from dataclasses import dataclass
from typing import Any


# Define a dataclass somewhere (called ParameterSpec in this case)
@dataclass
class ParameterSpec:
    default: Any
    min_val: Any = None
    max_val: Any = None
    step: Any = None


# Modify the Mesa Model to initialize each ParameterSpec's default value,
# if no other value is passed
class Model:
    def __init__(self, **kwargs):
        ...
        # Iterate through each attribute defined on the class
        for key, value in self.__class__.__dict__.items():
            if isinstance(value, ParameterSpec):
                # Set the value from kwargs or use the default from ParameterSpec
                setattr(self, key, kwargs.get(key, value.default))


# Allow users to define class-level variables in ParameterSpec form
class MyModel(Model):
    wealth = ParameterSpec(100, min_val=50, max_val=150)
    area = ParameterSpec(10, min_val=5, max_val=20)

    def __init__(self, **kwargs):
        super().__init__(**kwargs)  # Calls Model.__init__
        # Additional initialization code here


# The model can now be initialized without any input values
model_default = MyModel()
print(model_default.wealth)  # Output will be 100
print(model_default.area)    # Output will be 10

# But the defaults can also easily be overridden
model_custom = MyModel(wealth=120, area=15)
print(model_custom.wealth)  # Output will be 120
print(model_custom.area)    # Output will be 15
```
The only thing now required is that `super().__init__(**kwargs)` is called with `**kwargs`.

Edit: `**kwargs` only needs to be passed if one or more `ParameterSpec` instances are defined, so for existing models nothing changes.
I really like the basic idea! I think one of the challenges is to catch all basic parameter types. You outlined numeric values, but we should also handle strings (maybe a default plus a list of options) and Boolean values.

Of course, in theory any Python object can be an input parameter. Maybe there is an elegant way of handling this?

Also, for numerical values (not the highest priority, but maybe worth thinking about already): how could we handle non-linear scaling, say a range between 1e3 and 1e6 or the like?

But this will be super useful!
Thanks for your insights! I was also churning on some similar things.
> You outlined numeric values, but we should also handle strings (maybe default and list of options) and Boolean values.
We could add a parameter `other_values` for strings, booleans, objects, etc. Then that could just be an (unordered) list of options.

Maybe an explicit `parameter_type` would also be useful. Or subclass `ParameterSpec` into `ParameterSpecInt`, `ParameterSpecBool`, etc. (better names needed). We might be able to take some inspiration from the EMA Workbench's parameters.
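A minimal sketch of what that subclassing could look like (all names here, such as `NumericSpec` and the `other_values` field, are placeholders rather than an agreed API):

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ParameterSpec:
    default: Any = None


@dataclass
class NumericSpec(ParameterSpec):
    min_val: float | None = None
    max_val: float | None = None


@dataclass
class CategoricalSpec(ParameterSpec):
    # Unordered list of options: strings, booleans, objects, etc.
    other_values: list[Any] = field(default_factory=list)
```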
> how we could handle non-linear scaling
Also thinking about this. Ideally, `step_size` could not only be a fixed step, but also a multiplier or even an exponent. Maybe a `scaling` parameter could be useful, with options for linear, logarithmic, exponential, quadratic, etc.
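As a rough illustration of what such a `scaling` option could do when generating slider or sweep values (the `slider_values` helper is hypothetical):

```python
import numpy as np


def slider_values(min_val, max_val, n_steps, scaling="linear"):
    """Generate candidate values for a slider or sweep under a given scaling."""
    if scaling == "linear":
        return np.linspace(min_val, max_val, n_steps)
    if scaling == "log":
        # A range between 1e3 and 1e6 gets evenly spaced decades
        return np.logspace(np.log10(min_val), np.log10(max_val), n_steps)
    raise ValueError(f"Unknown scaling: {scaling}")


print(slider_values(1e3, 1e6, 4, scaling="log"))  # [1.e+03 1.e+04 1.e+05 1.e+06]
```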
Finally, how do you see this integrating with visualisation? It might remove a lot of the boilerplate you have to write in `app.py`. Any problems that might occur with this approach?

For non-numeric parameters, the input could just be a drop-down menu or something like that.
I like the idea. Drawing on my experience with the workbench, you would need something like the following:

- real-valued parameters (`RealParameter` in the workbench)
- integer parameters (`IntegerParameter`)
- categorical parameters (`CategoricalParameter`)

Booleans can be useful, but in the workbench they are subclassed from `IntegerParameter`.
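For reference, defining these in the EMA Workbench looks roughly like this (a sketch from memory; check the workbench documentation for the exact import paths and signatures):

```python
from ema_workbench import RealParameter, IntegerParameter, CategoricalParameter

uncertainties = [
    RealParameter("wealth", 50, 150),        # real-valued, with lower and upper bound
    IntegerParameter("n_agents", 5, 20),     # integer, with lower and upper bound
    CategoricalParameter("strategy", ["greedy", "random"]),  # unordered categories
]
```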
Subclassing from ParameterSpec might be the best idea. It forces the user to be explicit about the nature of each parameter and how it can be handled. In particular, ordered vs. unordered sets is an important distinction.
I would be hesitant to include the `step_size`. At least from an experimental design point of view, this is not a property of the parameter space but a choice by the analyst in how she wants to sample points from the parameter space.
What I’m also curious about: if we encounter this problem, and the workbench does too, is it encountered by other simulation libraries as well? How do they solve it? Should there be a general (Python) solution?
https://scientific-python.org/specs/ might be the place if we want to take the high-effort route.
Let's not wait for that if we think this is a useful idea for Mesa. If a default SPEC emerges for this, we might choose to start following it. I doubt, however, that one will come, because the nature of ABMs is quite different from many other simulation problems: for those you typically only need real-valued parameters.
We could start making sure it’s compatible with the workbench.
> I would be hesitant to include the `step_size`.

Decoupling sampling strategies from the parameter ranges indeed seems like a good idea.
This is a great idea!

For reference, Optuna (a hyperparameter optimization framework) uses a similar approach for defining parameter search spaces. See an example here: Optuna Pythonic Search Space.

What if we allowed SciPy distributions for numerical parameters? This could be particularly beneficial when running `batch_run`: you could get more informative results if you don't do a full sweep and instead specify a number of runs, especially if you know certain parameters follow a specific distribution.
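A sketch of what that sampling could look like; the `param_distributions` mapping and `sample_params` helper are hypothetical, not existing `batch_run` API:

```python
from scipy import stats

# Hypothetical mapping from model parameter to a SciPy distribution
param_distributions = {
    "wealth": stats.norm(loc=100, scale=15),  # normally distributed wealth
    "area": stats.randint(5, 21),             # discrete uniform on [5, 20]
}


def sample_params(n_runs):
    """Draw one kwargs dict per run instead of doing a full-factorial sweep."""
    return [
        {name: dist.rvs() for name, dist in param_distributions.items()}
        for _ in range(n_runs)
    ]


runs = sample_params(100)  # each entry could then be passed as MyModel(**runs[i])
```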
Okay, I thought about this a bit more.
Basically we have four main data types:

- continuous (`float`)
- discrete (`int`)
- ordered categorical (`Any`)
- unordered categorical (`Any`)

You could add more detail with Stevens's typology of levels of measurement, but (for now) I don't think that's necessary. Boolean is a bit weird, because it can basically be a special case of either ordered or categorical data.
So what's the bare minimum you need to sample each?

For the two categorical types, you need a list of categories. Taking these two, we can already observe they align quite well. They can probably be one class, with a boolean attribute for `ordered`.

For the two numerical types, you need minimum and maximum values (`float`). These two are also similar, and can probably be grouped in one class.

That means we would have something like a `NumericalParameters` and a `CategoricalParameters` subclass:
```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class ParameterSpec:
    """Base class for parameter specifications."""
    description: str = ""


@dataclass
class NumParamSpec(ParameterSpec):
    """Represents numerical parameters, including continuous and discrete types."""
    min_val: float | None = None
    max_val: float | None = None
    is_discrete: bool = False  # False means continuous (sampler returns float), True means discrete (int)
    distribution_type: str | None = None  # Optional field to specify the distribution type
    distribution_parameters: dict | None = None  # Parameters for the distribution (e.g., mean, std)


@dataclass
class CatParamSpec(ParameterSpec):
    """Represents categorical parameters, both ordered and unordered sets."""
    # default_factory is needed: a field without a default may not follow
    # the base class's defaulted `description` field
    categories: List[Any] = field(default_factory=list)
    probabilities: List[float] | None = None  # Probabilities for each category, if any
    is_ordered: bool = False  # False means unordered, True means ordered
```
So, I think this suffices for sampling. Note that `distribution_parameters` could contain things like `loc`, `scale` and `shape`, or other necessary variables.
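For illustration, here is one way those distribution fields could be consumed, building on the dataclasses above and assuming `distribution_type` names a `scipy.stats` distribution (that naming convention is an assumption, not part of the proposal):

```python
from scipy import stats


def sample(spec: NumParamSpec, size: int = 1):
    """Draw values from a NumParamSpec's distribution (sketch)."""
    dist = getattr(stats, spec.distribution_type)(**(spec.distribution_parameters or {}))
    draws = dist.rvs(size=size)
    return draws.astype(int) if spec.is_discrete else draws


growth = NumParamSpec(description="growth rate", min_val=0.0, max_val=1.0,
                      distribution_type="beta",
                      distribution_parameters={"a": 2, "b": 5})
print(sample(growth, size=3))  # three draws from Beta(2, 5)
```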
Then there's the "practical modelling" side. I see basically three important use cases here:

- Defaults: a `default` key could be useful in all cases.
- Clipping: either `clip` values using the min and max value, or allow them outside this range. This could be a boolean (see the sketch after this list).
- Discretization and display: `bins`, `bin_size` or `step_size` might be useful to easily create sliders (for input) and plots (for output). Knowing whether the parameter is meant to scale linearly, logarithmically, exponentially or otherwise might also be useful.

My thoughts so far. Curious what everybody thinks!
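To make the default and clipping use cases concrete, a rough sketch (the `resolve` helper and the `default` and `clip` fields are proposed additions, not part of the dataclasses above):

```python
def resolve(spec, value=None):
    """Fall back to the spec's default and optionally clip to [min_val, max_val]."""
    if value is None:
        value = spec.default
    if getattr(spec, "clip", False):
        if spec.min_val is not None:
            value = max(value, spec.min_val)
        if spec.max_val is not None:
            value = min(value, spec.max_val)
    return value
```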
Some resources linked from @tupui and @ConnectedSystems over in https://github.com/SALib/SALib/issues/634:
Might be interesting to see if we can learn something from them!
Okay, to move forward:
What does everybody think? What am I missing, or what should be different?
CC @tupui and @ConnectedSystems
Let's centralize the discussion to Scientific Python, so we can get all ideas in one place:
I will make a little introduction there with my thoughts on the problem from the perspective of Mesa. I would love the same from other maintainers from other libraries!
@EwoutH Since I saw you are working on a new batch-runner, let me very quickly outline some of the possibilities with `param`.

First of all, this is how you could specify parameters with `param`:

```python
import param

class MyModel(param.Parameterized):
    n = param.Integer(default=10, bounds=(0, None), softbounds=(0, 100), step=5, doc="Number of agents")
```
That is, we define the number of agents with a default value (10), some hard bounds (it must be positive; an exception is raised if not), some soft bounds (it should be between 0 and 100, which can e.g. be picked up by a slider), a step size (not enforced, but usable for parameter sweeps or again a GUI), and a short help text.
Now the interesting part relating to batch running: besides setting this value directly (`model.n = 50`), we can also set it to a function that resolves to an integer (e.g. `model.n = lambda: random.randint(0, 100)`). This of course makes it easy to do parameter sweeps. They even provide a whole range of `numbergen` functions that sample from different distributions, so that might be worth considering: https://param.holoviz.org/user_guide/Dynamic_Parameters.html
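A minimal sketch of that dynamic behaviour, based on the Dynamic Parameters guide linked above (`numbergen.UniformRandom` and its `lbound`/`ubound` arguments come from that guide; verify the exact names against the docs):

```python
import param
import numbergen as ng


class SweepModel(param.Parameterized):
    density = param.Number(default=0.5, bounds=(0, 1), doc="Initial agent density")


m = SweepModel()
m.density = 0.8                                   # set a concrete value directly
m.density = ng.UniformRandom(lbound=0, ubound=1)  # re-drawn on each access
print(m.density, m.density)                       # two (likely different) draws
```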
Thanks, I think `param` is almost exactly what this issue was intended to produce. Turns out it already exists, which saves so much time.

I’m going to try to integrate it and see how it works.
Python has a `typing.ParamSpec` class, which as of Python 3.13 can also have a default value. Might be interesting to look into further, and maybe subclass.
Currently, I need to specify default values and sometimes ranges in multiple places:

- for visualisation sliders: a `min`, `max`, `step` and a default `value`
- in `batch_run`: I'm defining discrete ranges of variables

It seems like Mesa could benefit from a better way to define default values and/or ranges that can be used throughout its different components.