projectmesa / mesa

Mesa is an open-source Python library for agent-based modeling, ideal for simulating complex systems and exploring emergent behaviors.
https://mesa.readthedocs.io
Apache License 2.0

Batch run: DataCollector parameters as timeseries #186

Closed philippschw closed 3 years ago

philippschw commented 8 years ago

Thank you for your great efforts to develop an intuitive framework for agent-based modeling in Python. A few days ago I discovered the mesa project, and I have been experimenting with it since then, trying to find out whether it is useful for my own research.

My question for this forum is summarized in the title. I would like to use the batch run and, instead of saving only the final output of the parameters, store the complete timeseries in a pandas DataFrame. The documentation says: "To get step by step data, simply have a reporter store the model's entire DataCollector object." I tried several different syntaxes but did not manage to get it working: if I just put "DataCollector": DataCollector inside the reporter dictionary, I get an AttributeError: ...object has no attribute 'items'

I would very much appreciate it if you could add a small coding example, maybe in one of the existing IPython notebooks for the forest fire model, showing how to do it.

Thanks for your reply!

dmasad commented 8 years ago

That's a solid suggestion, and there should be an example for that. Just collecting the data collectors seems like the intuitive way to do it now, but it should also be possible to modify the BatchRunner class to collect that internally.
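
A minimal sketch of that approach (assuming the model keeps its collector in a datacollector attribute; the reporter must return the model's DataCollector instance, not the DataCollector class, which is what the AttributeError above points to):

model_reporters = {
    # return the model's DataCollector instance, not the DataCollector class
    "Data Collector": lambda m: m.datacollector
}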

In the meantime I'll try to take a look and put together an example / see if we can find the bug.

philippschw commented 8 years ago

Thank you, David! I am not sure whether this is a bug or just a defect in my own model code, but the DataCollector seems to start collecting data only from tick one and does not record the initial state.

I built my own simple SIR model (Susceptible-Infected-Recovered) based on the fire model in the examples folder. If you are interested, I could also share it.

The reason I am using mesa is that I aim to build a hybrid System Dynamics / agent-based model. Mesa seems to be a great fit for the agent-based component.

jackiekazil commented 8 years ago

@philippschw PLEASE do add your model! We love to have more submissions. Can you make sure that your model has a README as well, so people understand what they are looking at?

philippschw commented 8 years ago

Regarding the DataCollector: I can record the initial state if I change the order of operations in the model's step function:

def step(self):
    self.datacollector.collect(self)
    self.schedule.step()

But as a result I miss the final value of the simulation. In my opinion, and in my experience with modeling and simulation, if the simulation time is 10, the resulting pandas DataFrame should have 11 rows.
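
For reference, a sketch of one way to get both endpoints (a hypothetical arrangement, assuming the usual Mesa model layout): collect once at the end of __init__ and again after every scheduler step, so a 10-step run yields 11 rows:

def __init__(self):
    # ... set up the schedule, agents, and self.datacollector ...
    self.datacollector.collect(self)  # record the initial state (row 0)

def step(self):
    self.schedule.step()
    self.datacollector.collect(self)  # record the state after each tick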

@jackiekazil I will share the model once it is a little more interesting. Since I am using Python 2.7 and do not want to change, I am also working on making mesa backward compatible with older versions of Python.

xiaofanliang commented 5 years ago

I have the exact same question as philippschw. How do I record time series data with BatchRunner? I also tried several variations based on the sentence "To get step by step data, simply have a reporter store the model's entire DataCollector object", but none has worked so far. Can anyone give an example of how to embed the model's entire DataCollector object in BatchRunner?

xiaofanliang commented 5 years ago

Here I found an example that collects step-by-step data with the BatchRunner: https://github.com/projectmesa/mesa/blob/master/examples/bank_reserves/batch_run.py
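
The gist of that example (paraphrased from that script; names like BankReservesModel and br_params come from it and may differ slightly): the model's own DataCollector collects every step, a model reporter hands back the whole collector, and the per-run frames are concatenated afterwards:

import pandas as pd

br = BatchRunner(
    BankReservesModel,   # model class from that example
    br_params,           # dict of variable parameters
    iterations=1,
    max_steps=1000,
    model_reporters={"Data Collector": lambda m: m.datacollector},
)
br.run_all()
br_df = br.get_model_vars_dataframe()
# stitch each run's step-by-step model data into one dataframe
step_frames = [dc.get_model_vars_dataframe() for dc in br_df["Data Collector"]]
br_step_data = pd.concat(step_frames, ignore_index=True)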

jackiekazil commented 5 years ago

@xiaofanliang I don't believe this was ever addressed. @dmasad, am I wrong?

AndrewC19 commented 4 years ago

I have this question too. I can't find any documentation for a step-by-step data collector. I would like to be able to run my model with the batch runner and record the value of reported variables at each time step, rather than only at the end of the run.

AndrewC19 commented 4 years ago

I managed to modify the example at https://github.com/projectmesa/mesa/blob/master/examples/bank_reserves/batch_run.py suggested by @xiaofanliang for use in my own model.

A continuous/time-series data collector would still be a helpful addition, as it took some digging to find a solution to this.

tpike3 commented 4 years ago

@AndrewC19 Somehow the sphinx/readthedocs build got corrupted, but the tutorial shows how to get the agent data by step for a batch run.

BatchRun is the last section in the tutorial.

jackiekazil commented 4 years ago

@tpike3 It has been a long time since I generated docs. When you make updates to the tutorial notebooks, you have to re-export them to rst, I think, or there is a sphinx command. We should probably figure that out and document it in the wiki -- https://github.com/projectmesa/mesa/wiki

tpike3 commented 3 years ago

Closing based on the Mesa 0.8.8.1 update (Nov 27, 2020) and the separate issue for the batchrunner docs update, #948.

toohuman commented 3 years ago

Hello,

I'm still having trouble understanding how to get the data collectors from individual runs and merge them with the model vars dataframe so that the time series data is associated with the batch's (fixed and variable) parameters.

From #984, the following sentence in the BatchRunner docs on the master branch is still quite vague:

Note that by default, the reporters only collect data at the end of the run. To get step by step data, simply have a reporter store the model's entire DataCollector object.

tpike3 commented 3 years ago

@toohuman Just to confirm: you are able to get the model data (macro information) and the agent collectors (micro actions of agents at each time step), and you want to merge them?

Tom

toohuman commented 3 years ago

Hi @tpike3, pretty much, yes. I am actually trying to get the model collectors and merge those with the macro-level model data, as the model collectors don't record all of the parameters used for that particular batch. I'd like to retrieve the model collectors' dataframes and merge them with each batch's macro info, so that I get a full series of runtime data associated with the variables set for each batch.

I hope that makes sense, thanks. I'm still trying to modify the bank_reserves example, as I realised I had an older version of batchrunner.py and I'm hoping this is possible.
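
For reference, a hypothetical sketch of that merge (merge_runs is a made-up helper name; it assumes each run's DataCollector is stored under a "Data Collector" column, as in the bank_reserves example, and that param_cols lists the batch's parameter columns):

import pandas as pd

def merge_runs(batch_df, param_cols, dc_col="Data Collector"):
    # attach each run's parameter values to its step-by-step model data,
    # then stack all runs into one long dataframe
    frames = []
    for _, row in batch_df.iterrows():
        run_data = row[dc_col].get_model_vars_dataframe()
        for p in param_cols:
            run_data[p] = row[p]  # broadcast this run's parameter value
        frames.append(run_data)
    return pd.concat(frames, ignore_index=True)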

adrien-perello commented 2 years ago

In case it interests anyone, I wrote a script to collect the entire time series for each variable in model_reporters (which I also mentioned in #865).

Final result: an xarray

I'm using the Boltzmann wealth model (with an additional dummy variable).

The result looks like this:

[screenshot of the resulting xarray omitted]

Script

First, in BatchRunner, I collect the data as dictionaries:

batch_run = BatchRunner(
    BoltzmannWealthModel,
    variable_parameters={"N": range(10, 500, 10)},
    fixed_parameters={"width": 10, "height": 10},
    iterations=5,
    max_steps=100,
    model_reporters={
        # note the plural key: generate_df below expects "datacollectors"
        "datacollectors": lambda model: model.datacollector.get_model_vars_dataframe().to_dict()
    },
)

And then, I use the following functions:

def generate_df(batch_run):
    # get dataframe from batch_run
    run_data = batch_run.get_model_vars_dataframe()
    # drop fixed parameters (unnecessary)
    fixed_params_idx = list(batch_run.fixed_parameters.keys())
    run_data.drop(fixed_params_idx, axis=1, inplace=True)
    # convert 'Run' number (= unique id) to iteration number
    # (for use as a coordinate of the future xarray)
    run_data["Run"] = run_data["Run"] % batch_run.iterations
    # keep track of columns that will become indexes
    # (= all but datacollectors)
    indexes = list(run_data.drop("datacollectors", axis=1).columns)
    # unnest data contained in datacollectors into new columns
    run_data = unnest_records(run_data, "datacollectors")
    # keep track of the new columns (= variable names)
    variables = list(set(run_data.columns) - set(indexes))
    # set the multi-index
    run_data.set_index(indexes, inplace=True, drop=True)
    # unnest values inside the new columns (= dicts)
    return unnest_columns(run_data, variables)

def generate_xr(batch_run):
    # generate cleaned dataframe
    df = generate_df(batch_run)
    # convert df to xarray
    da = df.to_xarray()
    # set fixed params as attributes rather than coords
    da.attrs = batch_run.fixed_parameters
    return da

Also, generate_df uses the following two helper functions:

import pandas as pd

def unnest_records(df, column):
    # expand a column of dicts into one column per key
    df_expanded = pd.DataFrame.from_records(df[column])
    df_final = pd.concat([df, df_expanded], axis=1)
    return df_final.drop(column, axis=1)

def unnest_columns(df, columns):
    # expand each column of per-step values into rows, indexed by step
    return (pd.concat(
            [pd.DataFrame(df[x].tolist(), index=df.index)
                for x in columns],
            axis=1,
            keys=columns,
        )
        .stack()
        .rename_axis(index={None: "Step"})
    )
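
Hypothetical usage, once the batch has finished (run_all() is the standard BatchRunner entry point):

batch_run.run_all()
df = generate_df(batch_run)  # long-format dataframe indexed by params + Step
da = generate_xr(batch_run)  # same data as an xarray, fixed params as attrs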

Limits

The process is relatively slow. Reading #798, I suspect it has to do with converting the data collectors to a dataframe at each step, but I have not profiled my code to check.

It might be faster to collect only the data collectors (without converting them to dataframes), but then I got stuck unnesting them (I'm still a bit of a newbie with pandas/xarray).

Also, here is an example where a batch of simulations was run manually (i.e. not using batch_runner). For 1000 steps, it is about 25% faster than my solution (but to be fair, there is only one variable parameter, which makes it easier to run a batch of simulations manually)

tpike3 commented 2 years ago

Both the model and agent collectors store the data in a dictionary format, with the key being that iteration's parameters as a tuple.

So if you had a model with a variable population [10, 50, 100] but a standard grid size [10, 10], then the keys for the outputs would be {(10, 10, 10): <results>, (50, 10, 10): <results>, (100, 10, 10): <results>}. It would be the same for the agents, just with agent-specific results.
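
A minimal sketch of reading those keyed results (this assumes the get_collector_model / get_collector_agents accessors of the BatchRunner from that release; the example key is illustrative):

model_results = batch_run.get_collector_model()  # {param_tuple: DataFrame}
for params, df in model_results.items():
    print(params)     # e.g. (10, 10, 10) -> (population, width, height)
    print(df.head())  # step-by-step model_vars for that run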

Does this help?

adrien-perello commented 2 years ago

Thanks @tpike3,

Here is the modified batch_run following your comment (which speeds up the process a tiny bit):

batch_run = BatchRunner(
    BoltzmannWealthModel,
    variable_parameters={"N": range(10, 500, 10)},
    fixed_parameters={"width": 10, "height": 10},
    iterations=5,
    max_steps=100,
    model_reporters={
        "datacollectors": lambda model: model.datacollector.model_vars
    },
)

But I have been wondering: is it possible to unpack the key/value pairs from model.datacollector in model_reporters, instead of having to unnest them later? To be clear, instead of having a single datacollectors column in my batch_run, I am trying to have a column for each variable contained in model.datacollector, but I can't find a way to tell batch_run.model_reporters what columns to expect before the model is instantiated (BoltzmannWealthModel does not have a class attribute datacollector). Any suggestion?
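
For illustration, the kind of thing I am after would look like this (hypothetical; VAR_NAMES would have to be known up front, and the default-argument trick pins each name inside its lambda):

VAR_NAMES = ["Gini", "Dummy"]  # assumed names registered with the DataCollector
model_reporters = {
    name: (lambda m, n=name: m.datacollector.model_vars[n])
    for name in VAR_NAMES
}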

tpike3 commented 2 years ago

@adrien-perello Thanks for the example!

As to your question, @Corvince rewrote a much shorter and clearer version of the batchrunner.

His datacollector overview is here; I think that will fit the bill for what you are asking.
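
For context, a sketch of the rewritten runner's interface as it was later released (the batch_run function; data_collection_period controls per-step collection, and details may have differed in the pull request itself):

from mesa.batchrunner import batch_run

results = batch_run(
    BoltzmannWealthModel,
    parameters={"N": range(10, 500, 10), "width": 10, "height": 10},
    iterations=5,
    max_steps=100,
    data_collection_period=1,  # collect model/agent data every step
)
# results is a list of dicts: one record per (run, step)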

@jackiekazil That pull request is beautiful and ready to go whenever you get a chance to doublecheck and merge

adrien-perello commented 2 years ago

One caveat though: the run / iteration number of each experiment is messed up when using BatchRunnerMP.

I think this issue has been mentioned in #881 and #1003

def generate_df(batch_run):
    ...
    # convert 'Run' number (= unique id) to iteration number
    # (for use as a coordinate of the future xarray)
    run_data["Run"] = run_data["Run"] % batch_run.iterations  # this line doesn't work with BatchRunnerMP!
    ...
    return unnest_columns(run_data, variables)

And thanks for redirecting me towards the code of @Corvince. I had not seen that. Is there any plan to merge the NewBatchRunner branch into main sometime soon?