A system for managing Model/Agent default values and ranges

EwoutH commented 2 months ago

Currently, I need to specify default values and sometimes ranges on multiple places:

As default model variables (for examples etc.)
If I want to visualize, I need to define a min, max, step and a default value
In batch_run I'm defining discrete ranges of variables
Our benchmarks also have custom parameters.
If I want to run in something like the EMAworkbench I'm again defining ranges

If seems like Mesa could benefit from some way to define default values and/or ranges in a better way that can be used throughout different components.

EwoutH commented 2 months ago

My initial idea playing with this is something like:

# Define dataclass somewhere (called ParameterSpec in this case)
from dataclasses import dataclass

@dataclass
class ParameterSpec:
    default: any
    min_val: any = None
    max_val: any = None
    step: any = None

# Modify the Mesa Model to initialize this ParameterSpec's default value,
# if no other value is passed
class Model():
    def __init__(self, **kwargs):
        ....
        # Iterate through each attribute defined in the class
        for key, value in self.__class__.__dict__.items():
            if isinstance(value, ParameterSpec):
                # Set the value from kwargs or use the default from ParameterSpec
                setattr(self, key, kwargs.get(key, value.default))

# Allow the users to define class-level variables in ParameterSpec form
class MyModel(BaseModel):
    wealth_spec = ParameterSpec(100, min_val=50, max_val=150)
    area_spec = ParameterSpec(10, min_val=5, max_val=20)

    def __init__(self, **kwargs):
        super().__init__(**kwargs)  # Calls BaseModel.__init__
        # Additional initialization code here

# The model can now be initialized without any input values
model_default = MyModel()
print(model_default.wealth)  # Output will be 100
print(model_default.area)    # Output will be 10

# But they can also be easily overwritten
model_custom = MyModel(wealth=120, area=15)
print(model_custom.wealth)  # Output will be 120
print(model_custom.area)    # Output will be 15

The only thing we now required is the super().__init__(**kwargs) to be called with **kwargs.

Now:

The visualisation can use all 4 parameter spec numbers
batch_run can use min, max and step (by default)
The examples can use a default value
The EMAworkbench can use the min and max value

Edit: **kwargs only needs to be inputted if one or more ParameterSpec instances are defined. So for existing models nothing changes.

Corvince commented 2 months ago

I really like the basic idea! I think one of the challenges is to catch all basic parameter types. You outlined numeric values, but we should also handle strings (maybe default and list of options) and Boolean values.

Of course theoretical any python can be an input parameter. Maybe there is an elegant way of handling this?

Also for numerical values, not the highest priority, but maybe already think about how we could handle non-linear scaling (maybe you have a range between 1e3 and 1e6 or the like)

But will be super useful

EwoutH commented 2 months ago

Thanks for you insights! I was also churning om some similar things

You outlined numeric values, but we should also handle strings (maybe default and list of options) and Boolean values.

We could add a parameter other_values for strings, booleans, objects, etc. Then that could just be an (unordered) list of options.

Maybe an explicit parameter_type would also be useful. Or subclass ParameterSpec to ParameterSpecInt, ParameterSpecBool, etc. (better names needed). We might be able to take some inspiration from the EMAworkbench's parameters.

how we could handle non-linear scaling

Also thinking about this. Ideally, step_size could not only be a fixed step, but also a multiplier or even exponent. Maybe a scaling parameter could be useful that has options for linear, logarithmic, exponential, quadratic, etc.

Finally, how do you see this integrating with visualisation? It might remove a lot of the boilerplate you have to write in app.py. Any problems that might occur with this approach?

For non-numeric, it might just be able to be a drop-down menu or something like that.

quaquel commented 2 months ago

I like the idea. Drawing on my experience with the workbench, you would need something like the following:

continuous ranges, so, e.g., 1.0 - 12.5; this is a RealParameter in the workbench
integer ranges, so, e.g., 1 - 10; this is an IntegerParameter
ordered sets, so, e.g., A, B, C; this does not exist in the workbench but can be handled by IntegerParameter
categorical variables/unordered sets, so, e.g., A, 5, some_object, this is a CategoricalParameter

Booleans can be useful, but in the workbench they are are subclassed from IntegerParameter.

Subclassing from ParameterSpec might be the best idea. It forces the user to be explicit about the nature of each parameter and how it can be handled. In particular, ordered vs. unordered sets is an important distinction.

I would be hesitant to include the step_size. At least from an experimental design point of view, this is not a property of the parameter space but a choice by the analyst in how she wants to sample points from the parameter space.

EwoutH commented 2 months ago

What I’m also curious about, if we encounter this problem, and the workbench does, is it encountered by other simulation libraries? How do they solve it? Should there be a general (Python) solution?

EwoutH commented 2 months ago

https://scientific-python.org/specs/ might be the place if we want to take the high-effort route.

quaquel commented 2 months ago

Let's not wait for that if we think this is a useful idea of MESA. If a default SPEC emerges for this, we might choose to start following that. I doubt, however, that it will come because the nature of ABMs is quite different from many other simulation problems in that for those you typically only need real parameters.

EwoutH commented 2 months ago

We could start making sure it’s compatible with the workbench.

I would be hesitant to include the step_size.

Decoupling sampling strategies from the parameter ranges seems to be a good idea indeed.

adamamer20 commented 2 months ago

This is a great idea! For reference, Optuna (a hyperparameter optimization framework) uses a similar approach for defining parameter search spaces. See an example here: Optuna Pythonic Search Space What if we allowed SciPy distributions for numerical parameters? This could be particularly beneficial when running batch_run. You could get more informative results if you don't do a full sweep search and instead specify a number of runs, especially if you know certain parameters follow a specific distribution.

EwoutH commented 2 months ago

Okay, I thought about this a bit more.

Basically we have the four main for data types:

Continuous values (return: float)
Discrete values (return: int)
Ordered sets (return: Any)
Categorical (unordered) sets (return: Any)

You could add more details with Stevens's typology of level of measurement, but (for now) I don't think that's necessary. Boolean is a bit weird, because it can basically be a special case of either ordered or categorial data.

So what's the bare minimum you need to sample each?

Categorical (unordered) sets

categories
probabilities

Ordered sets

categories (sorted)
probabilities

Taking these two, we can already observe they align quite well. They can probably be one class, with an boolean attribute for ordered.

Discrete (interval) values

Range (min and max)
Interval
Distribution type
- Distribution parameters

Continuous values (return: `float`)

Range (min and max)
Distribution type
- Distribution parameters

These two also are similar, and probably can be grouped in a class. That means we would have something like a NumericalParameters and CategoricalParameters subclass.

@dataclass
class ParameterSpec:
    """Base class for parameter specifications."""
    description: str = ""

@dataclass
class NumParamSpec(ParameterSpec):
    """Represents numerical parameters, including continuous and discrete types."""
    min_val: float | None = None
    max_val: float | None = None
    is_discrete: bool = False  # False means continuous (sampler should return float), True means discrete (should return int)
    distribution_type: str | None = None  # Optional field to specify the distribution type
    distribution_parameters: dict | None = None  # Parameters for the distribution (e.g., mean, std)

@dataclass
class CatParamSpec(ParameterSpec):
    """Represents categorical parameters, both ordered and unordered sets."""
    categories: List[Any]
    probabilities: List[float] | None = None  # Probabilities for each category, if any
    is_ordered: bool = False  # False means unordered, True means ordered

So, I think this satisfices sampling. Note that distribution_parameters could contain things like loc, scale and shape, or other variables necessary.

Then there's the "practical modelling" size. I see basically three important use cases here:

A default value would be really useful for model development, keeping stuff reproducible in the beginning. There are 2.5 ways a sampler could handle this:
- Implicit: There's some convention about what's the default value. The first value in a list, the middle of the range, etc.
- Explicit: There has to be a default defined.
- Hybrid: Implicit if no default is passed, explicit if it is. Since samplers might want to decide this for themselves, I think a default key could be useful in all cases.
For numerical parameters, when sampling from a distribution, you might either want to do a hard clip using the min and max value, or allow then outside this range. This could be a boolean.
For numerical parameters, to visualize them properly, something like bins, bin_size step_size might be useful to easily create sliders (for input) and plots (for output). Knowing if the parameters is meant to scale linear, logarithmic, exponential or otherwise might also be useful to know.
Booleans can be super convenient and thus should be supported also in some way. Maybe as an separate class.

My thoughts so far. Curious what everybody thinks!

EwoutH commented 2 months ago

Some resources linked from @tupui and @ConnectedSystems over in https://github.com/SALib/SALib/issues/634:

ModelParameters.jl (Julia)
parameters (R)

Might be interesting if we can learn something from them!

EwoutH commented 1 month ago

Okay, to move forward:

Centralize discussion to one place. discuss.scientific-python.org might be the most fitting, but GitHub might be more visible.
- Maybe set up a call schedule or something
Get an initial set of requirements, using the insights from various libraries
- If needed at this stage, get more maintainers / core developers of libraries involved
Get consensus on a conceptual-level solution
Come up with implementation solution (how/where, API, etc.)
Roll out and start testing
Iterate
(optionally) make it a formal SPEC

What does everybody think? What am I missing or should be different?

CC @tupui and @ConnectedSystems

EwoutH commented 1 month ago

Let's centralize the discussion to Scientific Python, so we can get all ideas in one place:

https://discuss.scientific-python.org/t/standardized-system-for-parameter-management-across-scientific-python-libraries/1422

I will make a little introduction there with my thoughts on the problem from the perspective of Mesa. I would love the same from other maintainers from other libraries!

Corvince commented 1 month ago

@EwoutH Since I saw you are working on a new batch-runner, let me just very quickly outline some of the possibilities with param. So first of all, this is how you could specifiy parameters with param

class MyModel:
     n = param.Integer(10, bounds=(0, None), soft_bounds=(0, 100), step=5, help="Number of agents")

That is we define the number of agents with a default value (10), some hard bounds (must be positive, raises an exception if not), some soft bounds (should be betwenn 0 and 100, e.g. can be picked up by a slider), some step size (not enforced, but can be used for parameter sweeps or again a GUI), and a short help text.

Now the interesting part relating to batch running is that besides setting this value directly (model.n = 50), we can also set it to a function that resolves to an integer (model.n = random.randint(0, 100)). This of course allows to easily do parameter sweeps. They even provide a whole ranger of numbergen functions that sample from different distributions. So that might be worth considering

https://param.holoviz.org/user_guide/Dynamic_Parameters.html

EwoutH commented 1 month ago

Thanks, I think param is almost exactly what this issue was intended to produce. Turns out it already exists, which saves so much time.

I’m going to try to integrate it and see how it works.

EwoutH commented 3 weeks ago

Python has a typing.ParamSpec class, which as of Python 3.13 now can also have a default value. Might be interesting to look further into, and maybe subclass.

projectmesa / mesa