Dataset parsing and graph creation does not scale with experimental complexity

nkremerh commented 7 months ago

We are beginning to use the software to investigate research questions that require more complex experimentation. The current pipeline makes it easy to compare the impact of different decision models on six key metrics over time.

However, what if we wanted to gather data on a combination of [list of decision models] + [list of metabolisms] + [list of trade factors], etc? The pipeline cannot accommodate that without herculean changes to plots/plot.py. Further, what if we wanted to present data not across time but across different variables (such as tradeFactor or selfishnessFactor)? Again, the plotting script would require radical revisions.
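To make the scale of the problem concrete, here is a minimal sketch of how quickly the experiment list grows once parameters are combined. The metabolism and trade factor values, and the "metabolism" key name, are made up for illustration and are not taken from the project.

```python
from itertools import product

# Decision model names as used elsewhere in this thread.
decision_models = ["benthamHalfLookaheadBinary", "egoisticHalfLookaheadBinary"]
# Hypothetical values, chosen only to show the combinatorics.
metabolisms = [1, 2]
trade_factors = [0.5, 1.0]

# One experiment per element of the Cartesian product of the three lists.
experiments = [
    {"agentDecisionModel": model, "metabolism": metabolism, "tradeFactor": trade_factor}
    for model, metabolism, trade_factor in product(decision_models, metabolisms, trade_factors)
]

print(len(experiments))
```

Two values per parameter already yield eight setups, and each added parameter multiplies that count again.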

Combine new data collection combinations and new ways to present them, and we end up with a spaghetti code mess. What's the preferable option? Provide a querying interface for the dataset and a middleware layer which can take the output of queries and plot them as graphs. The goal would be to use strictly base installation Python modules and software which is regularly available on most Linux distributions. We want to minimize our dependency footprint.

Python ships with the sqlite3 module, and SQLite supports JSON, so we would get a query language without adding any dependencies. This is a broader conversation, but our plotting script is rapidly approaching the point where it is too complex to expect new users to contribute to it.
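As a rough sketch of what the querying interface could look like with only the standard library: the table layout, column names, and data rows below are all hypothetical, and the helper is just one possible shape for the middleware, not a proposal for the final design.

```python
import sqlite3

# Hypothetical rows: (model, tradeFactor, seed, timestep, population).
rows = [
    ("bentham", 0.5, 1, 1000, 240),
    ("bentham", 0.5, 2, 1000, 260),
    ("bentham", 1.0, 1, 1000, 300),
    ("egoist", 0.5, 1, 1000, 200),
    ("egoist", 1.0, 1, 1000, 220),
    ("egoist", 1.0, 2, 1000, 180),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runs (model TEXT, tradeFactor REAL, seed INTEGER, "
    "timestep INTEGER, population INTEGER)"
)
conn.executemany("INSERT INTO runs VALUES (?, ?, ?, ?, ?)", rows)

# Column names cannot be bound as SQL parameters, so check them against a whitelist.
METRICS = {"population"}
X_AXES = {"tradeFactor", "timestep"}

def mean_by(conn, metric, x_axis, timestep):
    """Average `metric` across seeds, grouped by model and by the chosen x-axis variable."""
    if metric not in METRICS or x_axis not in X_AXES:
        raise ValueError("unknown metric or x-axis column")
    query = (
        f"SELECT model, {x_axis}, AVG({metric}) FROM runs "
        f"WHERE timestep = ? GROUP BY model, {x_axis} ORDER BY model, {x_axis}"
    )
    return conn.execute(query, (timestep,)).fetchall()

# Each tuple is (model, x value, mean metric), ready to hand to a plotting layer.
series = mean_by(conn, "population", "tradeFactor", 1000)
```

The point is that swapping the x-axis from time to tradeFactor becomes a change to a function argument rather than a rewrite of the plotting script.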

colinhanrahan commented 7 months ago

I haven't looked into sqlite3 yet, but I ran into the same lack of customization for data collection while working on my multiple decision models branch.

I'd like to propose a new structure for data collection where config.json splits into 2 files:

config.json
- for normal, single simulation use
- stores "sugarscapeOptions" only, does not store "dataCollectionOptions"
data_collection_config.json
- built for data collection
- stores "dataCollectionOptions" at top of file
- "dataCollectionOptions" now stores a new "pathToDefaultConfig" variable which, by default, is set to "config.json"
- below that, stores individual JSON objects representing simulations. You only include the settings you want to change; the rest will automatically be set to the settings from "pathToDefaultConfig"
- there will be 3 JSON objects below by default, one for each simulation of the current default setup for data collection we have now. But new JSON objects can be added underneath to add more simulations for data collection.

Here's a short example:

{
    "__README__": "Default values for data collection. Details can be found in the README.",
    "dataCollectionOptions": {
        "bashAlias": "bash",
        "jobUpdateFrequency": 5,
        "numParallelSimJobs": 1,
        "numSeeds": 100,
        "pathToDefaultConfig": "config.json",
        "plots": ["population", "meanttl", "starvationCombat", "wealth", "wealthNormalized", "meanAgeAtDeath"],
        "plotTimesteps": 1000,
        "pythonAlias": "python"
    },
    "simulation1": {
        "agentDecisionModel": "benthamHalfLookaheadBinary"
    },
    "simulation2": {
        "agentDecisionModel": "benthamHalfLookaheadBinary",
        "agentMovementMode": "radial",
        "agentVisionMode": "radial"
    },
    "simulation3": {
        "agentDecisionModel": "egoisticHalfLookaheadBinary"
    },
    "simulation4": {
        "agentDecisionModel": "egoisticHalfLookaheadBinary",
        "agentMovementMode": "radial",
        "agentVisionMode": "radial"
    }
}
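The "only include the settings you want to change" behavior could be sketched roughly like this. The function name is hypothetical, and the default option values are placeholders rather than the project's real defaults. In practice both dictionaries would come from json.load on the two files.

```python
# Top-level keys in data_collection_config.json that are not simulations.
RESERVED_KEYS = {"__README__", "dataCollectionOptions"}

def expand_simulations(collection_config, default_config):
    """Return one complete options dict per simulation: defaults plus that simulation's overrides."""
    defaults = default_config["sugarscapeOptions"]
    return {
        name: {**defaults, **overrides}  # later keys win, so overrides replace defaults
        for name, overrides in collection_config.items()
        if name not in RESERVED_KEYS
    }

# Placeholder stand-in for config.json (values are illustrative only).
default_config = {
    "sugarscapeOptions": {
        "agentDecisionModel": "none",
        "agentMovementMode": "cardinal",
        "agentVisionMode": "cardinal",
    }
}

# Trimmed stand-in for data_collection_config.json.
collection_config = {
    "dataCollectionOptions": {"numSeeds": 100, "pathToDefaultConfig": "config.json"},
    "simulation1": {"agentDecisionModel": "benthamHalfLookaheadBinary"},
    "simulation2": {
        "agentDecisionModel": "benthamHalfLookaheadBinary",
        "agentMovementMode": "radial",
        "agentVisionMode": "radial",
    },
}

simulations = expand_simulations(collection_config, default_config)
```

Adding another simulation to data collection then means adding one more JSON object, with no change to the collection code.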

By the way, while working on the multiple decision models branch, I condensed the plotting script a lot and removed the hardcoding for the three default data collection simulations. It should be on my multiple-decision-models-patch branch if you want to take a look at it, but it's mixed in with the other changes I've made to accommodate multiple decision models in the same simulation.

nkremerh commented 7 months ago

There's something here. I'd prefer to keep the configuration file in one place (so users always know where to go to change options).

Let me think on adding individual JSON objects for individual experimental setups before you go too much further with this.

nkremerh commented 3 months ago

Largely resolved by #95 as it is now easy to add a single line of code to produce a brand new graph.

nkremerh / sugarscape

Dataset parsing and graph creation does not scale with experimental complexity #46