tapios / risk-networks

Code for risk networks: a blend of compartmental models, graphs, data assimilation and semi-supervised learning
Other
2 stars 2 forks source link

Implement ContactNetwork and refactor accordingly #136

Closed dburov190 closed 4 years ago

dburov190 commented 4 years ago

I'm finally done with extracting & implementing ContactNetwork, and refactoring other parts of the code accordingly (so that a nx.Graph object is not passed around).

This is a huge PR, and I apologize for that, but it's an atomic thing, there was no reasonable way to split this into several PRs. I tried to minimize edits as much as possible, but there's still a lot to review.

Therefore, I'd probably suggest partial review by everyone to reduce the effort. If you could all please have a look at your individual parts and see if everything is OK, it would be great. (@odunbar could you maybe have a look at everything? at least briefly? thank you!)

In the current state, this breaks almost all the examples, but I'll update those as soon as possible. I have one working example right now, examples/papers/nyc_contact_network_da_interventions.py which is based on examples/papers/nyc_da_interventions.py, so by comparing two files side-by-side you can track the needed changes (but again, I'll update the rest as soon as possible).

I'm sorry for such a huge change at once, but on the upside, once we're done merging this, I think the code will become much more transparent. There are still some rough edges (hehe, pun intended) but overall I feel like the hardest part is done.

PS Actually, if you could all list 1-2 files that you need urgently, that would be great. I'd prioritize those then.

odunbar commented 4 years ago

Also note there are still the classes in epidemic_data_storage.py and user_base.py s which will interact with the contact network and are not on the PR. A little more so once the exogenous rates are merged

dburov190 commented 4 years ago

Also note there are still the classes in epidemic_data_storage.py and user_base.py s which will interact with the contact network and are not on the PR. A little more so once the exogenous rates are merged

@odunbar I've actually checked, and I think the way storage is implemented it should work without any changes (or without almost any)! (: the user_base.py does need a bit of work, but it's minimal. So this part is relatively easy.

Thanks for other comments! will address after the meeting. Yeah, you're right, I'll put a side-by-side comparison here with how it used to be and how it should be done now, it'll help get familiar with the new stuff.

Re: deferring, I disagree because then this will never get merged (: because we have such a high pace of development with absolutely no tests, the code rots very quickly. What this essentially means, with a big structural change like this one, every new feature will have to be implemented twice: once in the old framework, and once in the new. And that becomes a race, basically: whether I can keep up with everyone else and translate all new features into this framework.

I've seen many times how monster PRs would just sit and rot for days and even weeks before finally becoming so inconsistent with the ongoing work that they essentially would have to be either refactored again or just thrown away altogether. So I'd propose to merge this as soon as we can (without harming the workflow, of course), for example, on Monday, when everyone gets a chance to review and I translate several urgent examples. Otherwise it'll just get stuck here forever.

Again, I'm sorry for such a huge piece, but there wasn't really a way to split this into several chunks ):

dburov190 commented 4 years ago

@odunbar Ollie, re: epidemic_data_storage & user_base: I think the former would work just fine with the new class (we simply need to adjust examples with the data storage).

The latter is a bit less straightforward. I'll incorporate it into the framework tomorrow

odunbar commented 4 years ago

I'll just note here, that the change from functools.singledispatch to functools.singledispatchmethod broke the code for me from master.

@dburov190 Does it provide necessary feaures to the ContactNetwork which weren't available with functools.singledispatch or was just incompatible with your current python version?

My recommendation is we keep the original functools.singledispatch method in accordance with the package requirements of python=3.6.10. (and which has been shown to work). Right now you can make an issue and in a future PR this can be changed along with adjustment to the risknet.yml file. It feels like this is an additional change that is unrelated to ContactNetwork

odunbar commented 4 years ago

If we don't need the TransitionRates then we should remove them from ContactNetwork. If we do need them set then I think we should make their purpose more explicit.

This is a source of confusion - as it confused me ;) in the following:

The contact network is be used for both kinetic and master equations, BUT the TransitionRates object stored in ContactNetwork are not. Master equations however still has it's own set of TransitionRates but it must ignore the ones set on the network. This of course was present before, however it was not as explicit.

A quickfix would be to rename the methods/ and stored TransitionRates in the contact network to something more explicit.

A better fix would be to move the TransitionRates and diagrams into the kinetic model. This would give more consistency for the kinetic and risk simulators frameworks. It would also give more modularity to the classes, as the diagrams and rates are used only in the kinetic model and nowhere else, so why are they stored in the ContactNetwork object for all time? But we can definitely discuss this.

dburov190 commented 4 years ago

Well, yes... I agree, this is confusing and ideally should be separated. But as you correctly said, this keeps getting bigger and bigger, so maybe we can leave this for later? 😅 There is essentially another piece of refactoring needed for that. I actually marked diagram generation methods in the code as not belonging to ContactNetwork class, but just didn't want to deal with it at this stage.

rates are used only in the kinetic model and nowhere else

While this is true for transition rates, you'd still have the same problem with edge weights (which are set on the network), since master equation ensemble needs them, and it gets them from the network:

master_eqn_ensemble.set_mean_contact_duration(network.get_edge_weights())

(which are set by epidemic simulator)

So, essentially, network acts as a mediator between contact simulator and master equations. And this workaround, again, would not exist in the first place if we didn't have EpidemicSimulator. But alas.

This also has to do with EoN and the way their simulator is written (i.e. when you pass nx.Graph object to it, it has to have all the correct node and edge attributes). To accommodate for that, we could implement something similar to what I wrote for user base builder: just pass an argument to a method of ContactNetwork that sets the correct weights etc. on a copy of self.graph and pass the created object back. But again, do we really wanna do it within this PR? (:

Another solution for transition rates would be to actually read them in the master equation ensemble, I mean those that are shared (as you pointed out somewhere earlier, to avoid model error). And explicitly pass (to the ensemble) distributions for those that should be inferred (and let the ensemble object do the drawing). The benefit of doing it this way is that you would automatically avoid inconsistencies like the one I had: where you forget to set the true values (or the "true" values are actually not true) for non-inferred rates. But again, this is too much to deal with for now...

dburov190 commented 4 years ago

@odunbar Ollie, why did you rename all the constants to lower-case? this is against the guidelines, please change them back PS this is what I was referring to: https://stackoverflow.com/questions/2682745/how-do-i-create-a-constant-in-python (well, and PEP-8)

dburov190 commented 4 years ago

This PR fixes #126

odunbar commented 4 years ago

In the absence of Contact simulator, (e.g for a static network) we still create a contact network object, therefore I have added a default for contact_duration or WJI. As the master equations still calls the ContactNetwork class getter for these before they are set. In effect we explicitly set edge_attribute weights to 1.0, where before it was done implicitly.

Are there other cases where we should set default attributes on the network?