nkremerh closed this issue 3 months ago.
I haven't looked into `sqlite3` yet, but I kept running into a lack of customization options for data collection while working on my multiple decision models branch.
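I'd like to propose a new structure for data collection in which `config.json` is split into two files:

- `config.json`
  - Stores only `"sugarscapeOptions"`; it no longer stores `"dataCollectionOptions"`.
- `data_collection_config.json`
  - `"dataCollectionOptions"` sits at the top of the file.
  - `"dataCollectionOptions"` now stores a new `"pathToDefaultConfig"` variable, which is set to `"config.json"` by default.
  - Simulation objects (`"simulation1"`, `"simulation2"`, and so on) set only the options that differ from the defaults in the file at `"pathToDefaultConfig"`.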
Here's a short example:
{
  "__README__": "Default values for data collection. Details can be found in the README.",
  "dataCollectionOptions": {
    "bashAlias": "bash",
    "jobUpdateFrequency": 5,
    "numParallelSimJobs": 1,
    "numSeeds": 100,
    "pathToDefaultConfig": "config.json",
    "plots": ["population", "meanttl", "starvationCombat", "wealth", "wealthNormalized", "meanAgeAtDeath"],
    "plotTimesteps": 1000,
    "pythonAlias": "python"
  },
  "simulation1": {
    "agentDecisionModel": "benthamHalfLookaheadBinary"
  },
  "simulation2": {
    "agentDecisionModel": "benthamHalfLookaheadBinary",
    "agentMovementMode": "radial",
    "agentVisionMode": "radial"
  },
  "simulation3": {
    "agentDecisionModel": "egoisticHalfLookaheadBinary"
  },
  "simulation4": {
    "agentDecisionModel": "egoisticHalfLookaheadBinary",
    "agentMovementMode": "radial",
    "agentVisionMode": "radial"
  }
}
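To make the proposed split concrete, here is a minimal sketch of how a data collection script might resolve each `"simulationN"` object against the defaults. It assumes the file named by `"pathToDefaultConfig"` has a top-level `"sugarscapeOptions"` object and that simulation options override those defaults key by key; `deep_merge` and `load_simulations` are hypothetical helpers, not existing functions in the repository.

```python
import copy
import json


def deep_merge(base, overrides):
    """Return a copy of base with overrides applied; nested dicts merge key by key."""
    merged = copy.deepcopy(base)  # never mutate the shared defaults
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def load_simulations(data_collection_config):
    """Yield (name, options) for every "simulationN" object in the config.

    Assumption: the file at pathToDefaultConfig holds a top-level
    "sugarscapeOptions" object, and each simulation overrides it.
    """
    path = data_collection_config["dataCollectionOptions"]["pathToDefaultConfig"]
    with open(path) as f:
        defaults = json.load(f)["sugarscapeOptions"]
    for name, overrides in data_collection_config.items():
        # Skip "__README__" and "dataCollectionOptions"; only simulations are expanded.
        if name.startswith("simulation"):
            yield name, deep_merge(defaults, overrides)
```

A deep merge lets a simulation override one nested option without restating its siblings; if every option stays flat, `dict(defaults, **overrides)` would behave the same.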
By the way, while working on the multiple decision models branch, I condensed the plotting script considerably and removed the hardcoding of the three default data collection simulations. It's on my `multiple-decision-models-patch` branch if you want to take a look, but it's mixed in with the other changes I made to accommodate multiple decision models in the same simulation.
There's something here. I'd prefer to keep the configuration in one file, so users always know where to go to change options.
Let me think about adding individual JSON objects for individual experimental setups before you go much further with this.
Largely resolved by #95, since a brand new graph can now be produced by adding a single line of code.
We are beginning to use the software to investigate research questions that require more complex experimentation. The current pipeline makes it easy to investigate how different decision models affect six key metrics over time.
However, what if we wanted to gather data on a combination of [list of decision models] + [list of metabolisms] + [list of trade factors], etc.? The pipeline cannot accommodate that without herculean changes to `plots/plot.py`.
Further, what if we wanted to present data not across time but across other variables (such as `tradeFactor` or `selfishnessFactor`)? Again, the plotting script would require radical revisions.

Combine new data collection combinations with new ways to present them, and we end up with a spaghetti code mess. What's the preferable option? Provide a querying interface for the dataset and a middleware layer that can take the output of queries and plot it as graphs. The goal would be to use only modules from the base Python installation and software that is regularly available on most Linux distributions. We want to minimize our dependency footprint.
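A minimal sketch of the middleware idea, assuming queries return `(x, series, y)` rows where `x` may be a timestep or a variable such as `tradeFactor`: the hypothetical `rows_to_columns` pivots those rows into whitespace-separated columns with a `#` comment header, a plain-text layout that gnuplot and similar tools can read. It assumes series labels contain no whitespace.

```python
from collections import defaultdict


def rows_to_columns(rows):
    """Pivot (x, series, y) rows into whitespace-separated columns.

    The first line is a '#' comment naming the series; each following line is
    one x value followed by one y value per series. Missing (x, series) pairs
    are written as NaN so every line has the same number of columns.
    Assumes series labels contain no whitespace.
    """
    table = defaultdict(dict)  # x -> {series: y}
    series_names = []          # series in first-seen order
    for x, series, y in rows:
        if series not in series_names:
            series_names.append(series)
        table[x][series] = y
    lines = ["# x " + " ".join(series_names)]
    for x in sorted(table):
        values = [str(table[x].get(name, "NaN")) for name in series_names]
        lines.append(" ".join([str(x)] + values))
    return "\n".join(lines)


# Made-up rows, e.g. mean population per tradeFactor for two decision models.
rows = [(0.0, "bentham", 52), (0.0, "egoistic", 47), (0.5, "bentham", 61), (0.5, "egoistic", 44)]
print(rows_to_columns(rows))
# # x bentham egoistic
# 0.0 52 47
# 0.5 61 44
```

Because the x axis is just whatever the query returns, plotting across `tradeFactor` instead of time would need a different query, not a different plotting script.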
Python ships with the `sqlite3` module in its standard library, and SQLite supports JSON. It also gives us a query language (SQL). This is a broader conversation, but our plotting script is rapidly approaching the point where it becomes too complex to reasonably expect new users to contribute to it in the future.
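To illustrate the `sqlite3` route, here is a sketch that stores each run's options as JSON text and queries a metric grouped by one of those options using SQLite's `json_extract`. The schema, table names, helper name, and numbers are all made up for illustration. `json_extract` is one of SQLite's JSON functions, built in by default since SQLite 3.38.0; older builds need the JSON1 extension compiled in.

```python
import json
import sqlite3

# Hypothetical schema: one row per run (its options stored as JSON text) and
# one row per run and timestep for a metric. Names are illustrative only.
SCHEMA = """
CREATE TABLE runs (run_id INTEGER PRIMARY KEY, options TEXT NOT NULL);
CREATE TABLE metrics (
    run_id INTEGER REFERENCES runs(run_id),
    timestep INTEGER,
    population INTEGER
);
"""


def mean_population_by(conn, option, timestep):
    """Average population at one timestep, grouped by an option stored in the JSON."""
    query = """
        SELECT json_extract(runs.options, ?) AS option_value, AVG(metrics.population)
        FROM runs JOIN metrics ON metrics.run_id = runs.run_id
        WHERE metrics.timestep = ?
        GROUP BY option_value
        ORDER BY option_value
    """
    return conn.execute(query, ("$." + option, timestep)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Made-up numbers purely to exercise the query.
for run_id, trade_factor, population in [(1, 0.0, 40), (2, 0.0, 60), (3, 0.5, 80)]:
    conn.execute("INSERT INTO runs VALUES (?, ?)",
                 (run_id, json.dumps({"tradeFactor": trade_factor})))
    conn.execute("INSERT INTO metrics VALUES (?, ?, ?)", (run_id, 1000, population))
print(mean_population_by(conn, "tradeFactor", 1000))  # [(0.0, 50.0), (0.5, 80.0)]
```

Keeping the options as JSON means a new experimental variable needs no schema change; the query simply names a different JSON path.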