Adds real metrics for diseases to the log

amuller26 commented 1 month ago

Adds metrics for all diseases logged individually. Disease stats are currently stored in a separate log file, which can be either a CSV file or JSON file.

amuller26 commented 1 month ago

I ran into an issue in sugarscape.py: if you change the number of starting diseases per agent, each disease is added multiple times. The number of times it's added isn't correlated with the number of starting diseases per agent, either.

colinhanrahan commented 1 month ago

The intended behavior is that startingDiseases is the master pool of diseases and startingDiseasesPerAgent is how many diseases each agent will be endowed with at the beginning of the simulation. So if there are 25 diseases and 10 diseases per agent, each agent should catch a random 10 of the 25 diseases (they will not catch diseases they are immune to). Can you send me a config where you're seeing incorrect behavior?

amuller26 commented 1 month ago

@colinhanrahan Everything in config.json is the same except startingDiseasesPerAgent is anything but [0, 0].

colinhanrahan commented 1 month ago

Everything I'm seeing with "startingDiseasesPerAgent" != [0, 0] matches up with pg. 147 in the book. Can you elaborate a little more on what's happening?

amuller26 commented 1 month ago

@colinhanrahan sugarscape.py, lines 178-201:

    diseases = []
    for i in range(numDiseases):
        diseaseID = self.generateDiseaseID()
        diseaseConfiguration = diseaseEndowments[i]
        newDisease = disease.Disease(diseaseID, diseaseConfiguration)
        diseases.append(newDisease)
    startingDiseases = self.configuration["startingDiseasesPerAgent"]
    minStartingDiseases = startingDiseases[0]
    maxStartingDiseases = startingDiseases[1]
    currStartingDiseases = minStartingDiseases
    for agent in self.agents:
        random.shuffle(diseases)
        for newDisease in diseases:
            if len(agent.diseases) >= currStartingDiseases and startingDiseases != [0, 0]:
                currStartingDiseases += 1
                break
            hammingDistance = agent.findNearestHammingDistanceInDisease(newDisease)["distance"]
            if hammingDistance == 0:
                continue
            agent.catchDisease(newDisease)
            self.diseases.append(newDisease)
            if startingDiseases == [0, 0]:
                diseases.remove(newDisease)
                break

The disease loop is in the agent loop, so if an agent gets assigned the same disease as another agent, a duplicate is appended to self.diseases

colinhanrahan commented 1 month ago

Oh, I see. You can either put that in a conditional and only add the disease if it's not already in self.diseases or set self.diseases = diseases after the for i in range(numDiseases) loop finishes. The disadvantage with the second approach is that there's a small chance a disease in the self.diseases list will be completely absent from the population.

For more built-in stability, we could use a set since diseases are unique and unordered.
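A minimal sketch of the set idea, not the repo's code: `Disease` and `endowDiseases` are stand-ins invented for illustration, immunity is ignored, and it assumes disease objects are hashable (the default for a Python class that does not define `__eq__`).

```python
import random

class Disease:
    """Stand-in for disease.Disease; only carries an ID."""
    def __init__(self, ID):
        self.ID = ID

def endowDiseases(agentCount, diseases, perAgent, rng):
    """Give each agent `perAgent` random diseases and track which ones are in use."""
    activeDiseases = set()  # a set silently ignores repeat additions of the same object
    agentDiseases = []
    for _ in range(agentCount):
        caught = rng.sample(diseases, perAgent)
        agentDiseases.append(caught)
        activeDiseases.update(caught)
    return agentDiseases, activeDiseases

rng = random.Random(0)
pool = [Disease(i) for i in range(25)]
agentDiseases, activeDiseases = endowDiseases(100, pool, 10, rng)
```

With a list instead of a set, the equivalent guard would be an `if newDisease not in self.diseases` check before appending.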

amuller26 commented 1 month ago

I'm not sure if this is an error, but when configuration['startingDiseasesPerAgent'] != [0,0], it isn't guaranteed that all diseases will get used. Let's say configuration["startingDiseases"] = 50, self.diseases might only use 45. Is that normal or to be expected?

colinhanrahan commented 1 month ago

It's not an error with the current implementation, but we could change the implementation if necessary. We endow each agent with startingDiseasesPerAgent random diseases that they are not immune to from the master list of diseases, so there are two causes for unused diseases:

- all agents are immune to the disease, so they do not contract it
- some or all agents are not immune to the disease, but the disease is never selected by chance
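To get a feel for the second cause, here is a rough sketch that ignores immunity entirely. It assumes each agent independently draws `perAgent` distinct diseases uniformly from the pool, which simplifies the real endowment loop; the function names are made up for illustration.

```python
import random

def expectedUnused(numDiseases, numAgents, perAgent):
    """Expected number of diseases that no agent draws, ignoring immunity."""
    # A given disease is missed by one agent with probability 1 - perAgent / numDiseases.
    return numDiseases * (1 - perAgent / numDiseases) ** numAgents

def simulateUnused(numDiseases, numAgents, perAgent, rng):
    """Count the diseases left unused after one random endowment."""
    used = set()
    for _ in range(numAgents):
        used.update(rng.sample(range(numDiseases), perAgent))
    return numDiseases - len(used)

unused = simulateUnused(50, 20, 2, random.Random(1))
```

The expected count shrinks quickly as the number of agents or diseases per agent grows, so a handful of unused diseases out of 50 is plausible by chance alone.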

amuller26 commented 1 month ago

I consolidated the diseases' metrics so far into the logfile. The naming scheme so far is runtimeStats["disease{disease.ID}{metric}"]. The only metric so far is the R-Value; I'm waiting on Dr. R's response for which other metrics to track. With the current naming scheme, the JSON format lists them as disease0, disease1, disease2, disease3, etc. The CSV format, however, lists them as disease0, disease1, disease10, disease11, etc. Is there a better way to name the disease metrics?

colinhanrahan commented 1 month ago

Yeah, it looks like CSV variables are sorted alphabetically (agentAgingDeaths,agentCombatDeaths,agentDiseaseDeaths) while JSON variables are intentionally ordered ("timestep": 0, "population": 250, "meanMetabolism": 2.49). There are two approaches you could take:

1. Add the disease stats after the runtime stats are sorted. Disease stats are added on lines 64-65 of __init__:
```
diseaseStats = {f"disease{disease.ID}RValue": 0 for disease in self.diseases}
self.runtimeStats.update(diseaseStats)
```
And runtime stats are sorted alphabetically in startLog and endLog. You could just add the disease stats after those sorts instead of in __init__.
2. When naming disease variables, pad the disease IDs with 0s on the left to maintain alphabetical order. So if there were 11-100 diseases, you would add one extra 0 to diseases 0-9: disease00, disease01, disease02....

3. Maybe the runtime stats don't need to be sorted at all; you could ask NKH when he becomes available again. They should be in a consistent order regardless, so I don't understand the comment # Ensure consistent ordering for CSV format.

Edit: the above comment might have been because dictionaries didn't officially maintain insertion order until Python 3.7. If that is the case, we should be good to remove the sorting.
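A small sketch of the zero-padding approach, assuming disease IDs are consecutive integers starting at 0; `diseaseStatName` is a hypothetical helper, not something in the repo.

```python
def diseaseStatName(diseaseID, totalDiseases, metric):
    """Build a stat key whose alphabetical order matches numeric ID order."""
    width = len(str(max(totalDiseases - 1, 0)))  # digits needed for the largest ID
    return f"disease{diseaseID:0{width}d}{metric}"

padded = [diseaseStatName(i, 12, "RValue") for i in range(12)]
unpadded = [f"disease{i}RValue" for i in range(12)]
```

Sorting `padded` leaves it unchanged, while sorting `unpadded` moves the two-digit IDs ahead of disease1. If the sorting is removed instead, a plain dict already preserves insertion order on Python 3.7 and later.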

amuller26 commented 1 month ago

I started implementing my feature to start diseases at different set timesteps. However, depending on diseaseStartTimeframe, it either won't initialize the rest of the diseases (if diseaseStartTimeframe = [0, 1], for example) or, if it starts at any timestep past 0, it will only initialize half of the diseases it's supposed to (diseaseStartTimeframe = [1, 1], for example). I am not sure where I am going wrong. It's reading the infectAgents() function and looks like it's doing everything right, but maybe I've been staring at it too long. Some suggestions and advice would be great. I have everything logged to a separate file so I can look at the numbers and make sure it's correct. Once I get this sorted I will add all of the data to the main log.

amuller26 commented 1 month ago

I'm also not sure where to put the infectAgents() function because the infected agents don't change colors until the day after the disease is initialized.

colinhanrahan commented 1 month ago

How is the start timestep per disease intended to work with startingDiseasesPerAgent? If "diseaseStartTimeframe": [0, 1], for example, does every agent get infected with a value in the range startingDiseasesPerAgent on timestep 0 (selected from the pool of diseases with that starting timestep, so half the total pool), and then the same on timestep 1? Or are the settings mutually exclusive?

I'm also seeing some zombie agents that don't move but don't die — I'll investigate this, but these are always caused by agents being removed from sugarscape.agents before calling their own death function.

amuller26 commented 1 month ago

"diseaseStartTimeframe": [0, 1] works similarly to the pollutionTimeframe. The disease is assigned a timestep to be initialized and infect however many agents. Agents will still be infected with however many diseases depending on startingDiseasesPerAgent but they might not all be at the same timestep. If "startingDiseasesPerAgent": [0, 5] and "diseaseStartTimeframe": [0, 5], then an agent could be infected by up to 5 diseases within timesteps 0-5.

The disease is assigned a randomized timestep and then entered in a 2D array called diseasesCount where infectAgents() will use only diseasesCount[timestep].
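Roughly, the bucketing looks like the sketch below. This is a simplified illustration rather than the code in the branch: `scheduleDiseases` and `diseasesStartingAt` are hypothetical names, and diseases are represented by bare IDs.

```python
import random

def scheduleDiseases(diseaseIDs, startTimeframe, rng):
    """Bucket each disease under a random start timestep in [first, last]."""
    first, last = startTimeframe
    diseasesCount = [[] for _ in range(last + 1)]  # outer index is the timestep
    for diseaseID in diseaseIDs:
        diseasesCount[rng.randint(first, last)].append(diseaseID)
    return diseasesCount

def diseasesStartingAt(diseasesCount, timestep):
    """Return the diseases to introduce at `timestep`; nothing once the timeframe has passed."""
    if timestep < len(diseasesCount):
        return diseasesCount[timestep]
    return []

schedule = scheduleDiseases(list(range(10)), [1, 3], random.Random(0))
```

A quick sanity check while debugging is that every disease lands in exactly one bucket and that the bucket sizes add up to the total number of diseases.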

colinhanrahan commented 1 month ago

self.infectAgents should be moved to the top of self.doTimestep, probably after removing and replacing agents — stepping forward on the GUI is not part of the runSimulation loop and calls sugarscape.doTimestep directly.

Edit: Move it to after self.timestep is incremented so that the timestep is correct. This should fix the GUI drawing issue. I recommend renaming diseasesCount something like newDiseasesPerTimestep for clarity.

amuller26 commented 1 month ago

I moved it to after self.doTimestep and something's still wrong. It looks like the diseases were initialized on the correct timestep but the GUI didn't recognize them and the stats are logged the timestep after. It also looks like the timestep after is day 2 of the disease in the simulation.

colinhanrahan commented 1 month ago

Move it into doTimestep here:

        self.timestep += 1
        self.infectAgents()
        if self.end == True or (len(self.agents) == 0 and self.keepAlive == False):
            self.toggleEnd()

That seems to work pretty well on my side.

colinhanrahan commented 1 month ago

I saw that you added another comment through my notifications, but I can't see it right now. If you deleted it please disregard.

The diseases are introduced at the beginning of the timestep. If you're checking the diseases per agent at the end of the timestep (or after any agents have completed their timestep), it's likely that the diseases will have spread ("duplicated").

If "startingDiseasesPerAgent" != [0, 0], then new diseases can infect multiple agents at the same time. Otherwise, they should only infect one agent when they are introduced into the population.

Does either of those fix your issue? If not, can you give me some more detail?

amuller26 commented 3 weeks ago

There are some agents who do not move, and I don't know if it's because they can't move or they're zombies.

colinhanrahan commented 3 weeks ago

Zombie agents will have negative sugar and metabolism and their age won't increase over time. Sick agents with 0 functional range will have 0 vision and/or 0 movement.
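Those rules can be turned into a quick diagnostic. The sketch below is only an illustration: `AgentSnapshot` and its field names are assumptions, not the repo's attribute names, and it compares two snapshots of the same agent taken at different timesteps.

```python
from dataclasses import dataclass

@dataclass
class AgentSnapshot:
    """Hypothetical record of one agent's state at a given timestep."""
    sugar: float
    metabolism: float
    age: int
    vision: int
    movement: int

def classifyStationaryAgent(before, after):
    """Guess why an agent is not moving, using the rules described above."""
    if after.sugar < 0 and after.metabolism < 0 and after.age == before.age:
        return "zombie"
    if after.vision == 0 or after.movement == 0:
        return "immobile"
    return "other"

zombie = classifyStationaryAgent(AgentSnapshot(-3, -1, 40, 2, 2), AgentSnapshot(-3, -1, 40, 2, 2))
sick = classifyStationaryAgent(AgentSnapshot(12, 2, 40, 0, 1), AgentSnapshot(10, 2, 41, 0, 1))
```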

colinhanrahan commented 3 weeks ago

Add diseaselog.csv and diseaselog.json to the make clean command:

LOGS = log.csv log.json diseaselog.csv diseaselog.json

amuller26 commented 2 weeks ago

I modified agent behavior so that healthy agents avoid any cell with sick agents in the vicinity, and sick agents avoid only areas occupied by their own tribe. @colinhanrahan Can you please check that it is implemented correctly?

colinhanrahan commented 2 weeks ago

Running disease_basic.json (after fixing the breaks on my two active PRs), agents refuse to move next to any agent with a disease and end up in a gridlock where many can't move at all. This seems like a natural consequence of the logic you introduced rather than a bug. Thoughts?

Edit: for cells directly adjacent to the agent's current cell, both checkInfectedArea and checkTribeArea are triggered by the self agent, so they'll never move 1 cell when they're sick. You should exclude self from these checks.
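A minimal sketch of excluding self from such a check. `Agent` and `hasOtherSickAgent` are stand-ins for illustration and do not mirror the actual signatures of checkInfectedArea or checkTribeArea.

```python
class Agent:
    """Stand-in for the repo's agent class; only tracks sickness."""
    def __init__(self, sick=False):
        self.sick = sick

def hasOtherSickAgent(agent, nearbyAgents):
    """True if any nearby agent other than `agent` itself is sick."""
    return any(other is not agent and other.sick for other in nearbyAgents)

sickAgent = Agent(sick=True)
healthyAgent = Agent()
neighborhood = [sickAgent, healthyAgent]
```

Here `hasOtherSickAgent(sickAgent, neighborhood)` is False because the only sick agent in range is the one doing the check, while the healthy agent still sees the area as infected.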

nkremerh commented 2 weeks ago

A thought for the self-quarantining and the exiling behavior:

These should be configurable parameters for the agent (such as self.quarantineFactor) which impact the scores for potential cells. A considerate sick agent (i.e. one with a quarantineFactor of 1) will impose strong score penalties to cells next to others. Likewise, a healthy agent with a high quarantineFactor will impose strong score penalties to cells next to sick agents.

That way, there's less likely to be gridlock. Agents may not find too many cells with good scores, but they're still likely to move since those cells next to sick/healthy individuals aren't completely removed from consideration. They're simply less appealing.
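One way this could look, as a sketch only: the linear penalty, the `penaltyPerNeighbor` weight, and the `scoreCell` signature are all assumptions made for illustration, not an existing part of the agent's scoring code.

```python
def scoreCell(baseScore, sickNeighbors, totalNeighbors, agentIsSick,
              quarantineFactor, penaltyPerNeighbor=1.0):
    """Lower a cell's score instead of removing it from consideration."""
    # Sick agents are penalized for every neighbor; healthy agents only for sick ones.
    avoided = totalNeighbors if agentIsSick else sickNeighbors
    return baseScore - quarantineFactor * penaltyPerNeighbor * avoided

# name -> (baseScore, sickNeighbors, totalNeighbors); every cell has a sick neighbor
cells = {"a": (5.0, 1, 1), "b": (3.0, 1, 2), "c": (4.0, 2, 2)}
scores = {
    name: scoreCell(base, sick, total, agentIsSick=False, quarantineFactor=1.0)
    for name, (base, sick, total) in cells.items()
}
bestCell = max(scores, key=scores.get)
```

Even though every candidate cell borders a sick agent, the healthy agent still has a best option to move to, which is the point of penalizing rather than excluding.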

nkremerh / sugarscape

Adds real metrics for diseases to the log #115