mrc-ide / covid-sim

This is the COVID-19 CovidSim microsimulation model developed by the MRC Centre for Global Infectious Disease Analysis hosted at Imperial College, London.
GNU General Public License v3.0

have any of the bugs significantly impacted the model results? #274

Closed ghost closed 4 years ago

ghost commented 4 years ago

Bugs in academic code aren't really surprising - the important thing is: has fixing the bugs drastically altered the outcomes, or have they had negligible effects?

robscovell-ts commented 4 years ago

@LetterRip I think it is too soon to tell. One very public bug has been fixed: the issue raised by the Edinburgh University peer group around non-determinism. That does not seem to have affected the outcomes, judging by comments from the ICL team here, because it affects only initialisation of the model and not the actual running of it. There are other non-deterministic effects of parallelism, due to threads completing at different times on different runs, but the resulting states are statistically equivalent.
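For readers unfamiliar with this class of non-determinism, here is a minimal stdlib-only Python sketch (not covid-sim code, just an illustration) of why thread-completion order matters: floating-point addition is not associative, so partial results combined in a different order can differ in the last few bits while the runs remain statistically equivalent.

```python
import random

# Minimal sketch (not covid-sim code): floating-point addition is not
# associative, so combining per-thread partial results in a different
# order can change the low bits of a total, even though the two "runs"
# are statistically equivalent.
random.seed(0)
values = [random.uniform(0.0, 1.0) for _ in range(100_000)]

total_a = sum(values)            # one thread-completion order
total_b = sum(reversed(values))  # another thread-completion order

# The difference is tiny (and often non-zero), while both totals sit
# where statistics says they should: near 50,000 for 100,000 uniforms.
print(abs(total_a - total_b))
```

This is the distinction being drawn above: bitwise-identical output is not the same thing as statistically equivalent output.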

There is one contributor here who is working on a serious architectural refactoring of a fork of the code: https://github.com/Feynstein/covid-sim . The refactoring may very well bring to light bugs that have a substantive effect on the outcome. It remains to be seen.

It will take time and further scrutiny of the code to answer your question. I am still working to understand the model and code before I dive in.

dlaydon commented 4 years ago

Hi @LetterRip. 

The non-determinism in the creation of the network wasn't really a bug. We knew it was there and as the resulting network of one run was statistically equivalent to that of another, this wasn't a priority to amend, especially given the intense demands of modelling the pandemic. Further, because network creation is performed once per geography, and all model runs we performed use this same network, it has no bearing on the rank-order of the most effective interventions that are predicted by the model.

To answer your question, we are not aware of any bugs that have had any effect on our conclusions. If you find any such issues, then please either raise them here (although please be specific if you can), or feel free to amend the code yourself and open a PR.

Hope that's helpful. 

zebmason commented 4 years ago

@dlaydon Is #268 a bug? If so it is so subtle all the regression tests pass.

dlaydon commented 4 years ago

Hi @zebmason, https://github.com/mrc-ide/covid-sim/pull/268/ looks fine. However this function is only called when DoAirports is set to 1, which it never has been in any of the Covid-19 runs.

robscovell-ts commented 4 years ago

@dlaydon what effect does excluding airports have? Is DoAirports set to 0 because the function is incomplete/failing, or because airports are thought to have no significant effect? Has work been done to test the statistical significance of including airports? Are they modelled as sites of social interaction, or potential sources/sinks for host numbers, or both? Has policy guidance on airports been issued based on your modelling?

weshinsley commented 4 years ago

The question is more: what data would we need in order to credibly switch that behaviour on? The Nature 2006 paper (Strategies for mitigating an influenza pandemic) investigated the effects of reducing air travel to contain a theoretical novel flu outbreak. From the abstract, such travel restrictions were found unlikely to delay spread by more than 2-3 weeks unless they were more than 99% effective. The implementation details will be in the supplementary material, so that's worth a read.

For Covid, we haven't got the necessary data to model this right now; air travel data is quite hard to come by. In any case, I think it was pretty obvious from the early stages that covid is highly infectious. It didn't take a model to suggest that cases got from China to Italy and France by air travel, and the need for flight restrictions was very clear. So reductions started first with the hot-spot regions, and later more widely, since the UK is itself a hot spot, with the 2nd-highest number of deaths in the world at present. One can always argue with hindsight that these interventions could have happened earlier.

Lastly, I think it's good to point out that implying any guidance whatsoever was "based on our modelling" puts it rather strongly. Our model is one of over 40 scientific voices forming a body of advice in the UK, and decisions are made on the broad consensus of those voices, never on one model.

robscovell-ts commented 4 years ago

That last point is really important, because the impression you get from news reporting is that the Imperial model was/is dominant. It confirms yet again my own experience: whenever anything I've known about, or been involved in, has been reported by media outlets, major misrepresentations of the truth (whether intentional or not) have crept in.

zebmason commented 4 years ago

Don't know whether you saw The Daily Telegraph at the weekend. Apparently all the source code is translated Fortran...

weshinsley commented 4 years ago

I can't see it because it's behind the paywall (of course), but no, (sigh of despair), the source code is not translated Fortran. The Daily Telegraph should shut up.

Having screen-captured it out of a browser (press F5 and print screen quickly), it appears to be just another speculative rehash of the lockdown sceptics article really, with no extra facts, or extra anything. The Fortran myth is respun. The determinism myth comes up at the end, and they talk about one big file being necessarily spaghetti, whereas they think proper engineering would give us some 500 files. Anyone fancy reviewing this code if it were split over 500 different files?

hughk commented 4 years ago

For Covid, we haven't got the necessary data to model this right now; air travel data is quite hard to come by. In any case, I think it was pretty obvious from the early stages that covid is highly infectious.

Air travel data is collected by the ONS in the UK sampling embarking and disembarking passengers, who came from where and the purpose of the visit, or whether they are transiting. As for raw air travel data, most flights carry ADS-B transponders which identify the origin and destination of the flight with other information such as plane type and operator. There is a database collecting this available at: https://www.adsbexchange.com/ free for non-commercial use.

weshinsley commented 4 years ago

ONS is really "travel trends" - they publish annual flows, so you'd have to do some work on parameterising how travel rates vary seasonally, and find some way of taking internal flights into account. The WTO (World Tourism Organisation) is similar (in fact it probably uses the UK data from ONS), but global (not free, but inexpensive), and we've used that in other work, with the understanding that it is a crude measure and you have to approximate something for seasonality.

From ADS-B you can't get the number of passengers, which is the critical bit of information. You can work out the capacity of the plane and assume it runs a certain percentage full, but that's not easy to guess across a large number of flights, and I doubt it would be easy to defend at this time.
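To make the point concrete, here is a back-of-envelope sketch with entirely hypothetical numbers (not real data): the assumed load factor swings the passenger estimate by nearly a factor of two, which is exactly the hard-to-defend guess described above.

```python
# Hypothetical numbers, not real data: estimating passengers from flight
# counts requires assuming a load factor, and that assumption dominates
# the uncertainty.
flights_per_day = 120     # assumed flights on a route, e.g. counted via ADS-B
seats_per_flight = 180    # assumed typical narrow-body capacity

for load_factor in (0.5, 0.7, 0.9):  # plausible range for "how full"
    passengers = flights_per_day * seats_per_flight * load_factor
    print(f"load factor {load_factor:.0%}: ~{passengers:,.0f} passengers/day")
```

The same flight count supports anywhere from ~10,800 to ~19,440 passengers per day, so the flight data alone doesn't pin down the quantity the model actually needs.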

The ideal data you need to build the matrix is really the kind of data IATA keep, which is extremely commercially sensitive. (And hilariously costly).

zebmason commented 4 years ago

@weshinsley Must admit I was so confused by the article when compared to reality I cloned the 1st April repo and looked for translated Fortran which I could only find in the random number generator code. So I wrote a pointed rebuttal on LinkedIn on the basis that one of the authors is a level 2 connection.

To be honest I've seen much less maintainable code at blue chips.

robscovell-ts commented 4 years ago

It was the Telegraph article that brought me here in the first place. I was concerned about the non-determinism, but that is a non-issue. As a former Fortran programmer, I'm used to people thinking that if code is 'old' or in an 'obsolete language', it has somehow gone mouldy or rusty like an old factory. If it worked 20 or 30 years ago, it will still work now. I used the NAG libraries in my Fortran code at Pfizer in the late 90s. It was 'old' then, but the maths it encoded was as valid then as it is now. I also support some legacy Delphi apps now, which are still fit for purpose because the business rules of the agricultural companies that use them haven't changed. I know a guy who still supports Acorn software and systems he built for light industry in Sussex in the late 1980s and early 90s. It's been ported to newer hardware but it still works.

ghost commented 4 years ago

So, if the code is air-tight as you say - let's see the original version and have it reviewed by the open-source community, as is done for the JDK, Linux, 300+ ASF projects, and so on. Why are we having a pointless discussion of the merits of a heavily massaged version of the code that wasn't used for Report-9? This seems like a waste of time.

A couple of the comments on early points: @weshinsley

Anyone fancy reviewing this code if it were split over 500 different files?

I guess what you're saying is 'let's throw away separate compilation, Dijkstra's work, and decades of best practice, since no one would review a project split into 500 files'? I suggest you bring this novel idea to the linux-kernel dev list - I am sure you will learn a lot ;). I dare you, actually!

"lockdown sceptics"

Here we go! What's next I wonder? Some vague Germany-related epithets, perhaps?

@zebmason

Apparently all the source code is translated Fortran

Apparently that's not what it says. The point is that some of the code could've been written in Fortran, which isn't a crime in itself. However, if certain compiler optimizations are used without much attention, that could easily lead to a certain degree of unexpected non-determinism. But I'd venture it's ok, as we are dealing with a stochastic problem here, right?

To sum it up: all this back-and-forth is really useless - no one but a select group of people has seen the original code or build files (were there any, actually? Otherwise, you wouldn't need 15 KLOC, I presume). Enjoy polishing it further!

insidedctm commented 4 years ago

Report 9 was 2 months ago; this codebase has moved on substantially since then. What is the obsession with Report 9?

doodlebro commented 4 years ago

What is the obsession with Report 9?

What is so hard about releasing the code that was used to generate it...?

You cannot solve a problem without showing the work you did to get there. It's quite ironic that academics aren't understanding this.

insidedctm commented 4 years ago

As has been stated in responses to other issues, those results are reproducible from this code base.

bbolker commented 4 years ago

To second @insidedctm's comment: one of the developers has specifically addressed all of these questions here:

The code here is essentially the same functionally as that used for Report 9, and can be used to reproduce the results. The refactoring Microsoft and Github helped us with restructured and improved the layout of the code, with some documentation, to make it somewhat easier to scrutinise, but was written with regression tests against a reference result set to ensure changes in code structure did not change behaviour. We do not think it would be particularly helpful to release a second codebase which is functionally the same, but in one undocumented C file. Our refactor aims to make it easier (with some effort we acknowledge) for the open-source community to comment on the current live code, which is in use today. ... We do not think there is much benefit in trawling through our internal commit histories. Again, we would rather people focussed on the live code. If you wish to look through the undeleted branches in the repo, and use the method @alecmocatta points out (here), you are welcome to.

I'm wondering if this information should be pinned along with the "yes, the code base is (now) deterministic" thread (see also #179).
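The regression-testing approach described in the quote - restructure the code, but verify the numbers against a reference result set - can be sketched in a few lines. The function and file names below are hypothetical illustrations, not the project's actual test harness.

```python
import csv
import math

def outputs_match(reference_csv, new_csv, rel_tol=1e-9):
    """Compare two model-output CSVs cell by cell (names hypothetical).

    This is the basic shape of a regression test guarding a refactor:
    the code may be restructured, but the numbers in the reference
    result set must not move.
    """
    with open(reference_csv) as f:
        ref_rows = list(csv.reader(f))
    with open(new_csv) as f:
        new_rows = list(csv.reader(f))
    if len(ref_rows) != len(new_rows):
        return False
    for ref_row, new_row in zip(ref_rows, new_rows):
        if len(ref_row) != len(new_row):
            return False
        for a, b in zip(ref_row, new_row):
            try:
                if not math.isclose(float(a), float(b), rel_tol=rel_tol):
                    return False
            except ValueError:  # non-numeric cells (headers) must match exactly
                if a != b:
                    return False
    return True
```

Run after every structural change, a check like this is what lets a refactor claim "functionally the same" with some force behind it.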

robscovell-ts commented 4 years ago

@bbolker I think that makes sense. The narrative out there about this being buggy is still strong.

On the topic of using Fortran code, which people make much fuss about, you don't complain about using 'out-dated' maths from 2500 years ago when calculating the area of a circle or the hypotenuse of a right-angled triangle!

davividal commented 4 years ago

As has been stated in responses to other issues those results are reproducible from this code base.

So we must trust because someone said that we must trust?

I saw that you made a PR: https://github.com/mrc-ide/covid-sim/pull/285

In case you didn't understand what happened: thanks to the repository history, you were able to track down WHEN a bug was introduced and fixed it. Not only that, but you were able to understand the CONTEXT behind the const int, which made it easier for you to change the type to double, instead of changing the number to 200000.

But... Is that number really correct? Or was it the result of some other incorrect refactoring?

NoFishLikeIan commented 4 years ago

So we must trust because someone said that we must trust?

With this code base, have you tried to reproduce the results of report 9? If so do you get (stochastically) equivalent results?

If so, I would trust that the reproducibility claim is correct.
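As a rough illustration of what "stochastically equivalent" could mean in practice, here is a stdlib-only sketch of the two-sample Kolmogorov-Smirnov statistic, one common way to compare the output distributions of two sets of runs. This is an illustrative check, not the project's methodology.

```python
import random

def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap between
    the empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    xs, ys = sorted(xs), sorted(ys)
    grid = sorted(set(xs + ys))
    d = 0.0
    i = j = 0
    for v in grid:
        while i < len(xs) and xs[i] <= v:
            i += 1
        while j < len(ys) and ys[j] <= v:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d

random.seed(1)
run_a = [random.gauss(0, 1) for _ in range(2000)]    # one set of runs
run_b = [random.gauss(0, 1) for _ in range(2000)]    # second set, same model
shifted = [random.gauss(1, 1) for _ in range(2000)]  # genuinely different model

print(ks_statistic(run_a, run_b))    # small: consistent with equivalence
print(ks_statistic(run_a, shifted))  # large: clearly not equivalent
```

Two reproduction attempts that pass a test like this on the headline outputs would be good evidence for the "stochastically equivalent" claim, without requiring bitwise-identical numbers.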

zebmason commented 4 years ago

@weshinsley @dlaydon @doodlebro @robscovell-ts Nice supporting letter in the DT today by Emeritus Professor Les Hatton.

dlaydon commented 4 years ago

Thanks @zebmason, much appreciated.

sublimator commented 4 years ago

Bugs were in the assumptions

robscovell-ts commented 4 years ago

Thanks @zebmason

robscovell-ts commented 4 years ago

Professor Hatton's observations are spot on. Over my time in the industry I've worked with several systems that inexperienced or naive younger developers would call 'legacy' at best or 'spaghetti code' at worst. A mature software system evolves. Software is supposed to be changed -- it always goes beyond the original architectural decisions and structures. The skill of a senior developer responsible for these systems is akin to a game of reverse-jenga. You have to figure out how to fit the block in the right slot without bringing the tower down.

sublimator commented 4 years ago

@robscovell-ts

"You younguns! You just don't know what it's like to have deadlines and other priorities!"

I'm not sure I can agree with that.

Software engineering has developed languages and techniques to deal with abstractions and maintainability. A hallmark of good software is the durability of the abstractions in the face of changing requirements.

I'd of course marvel at some senior engineer being able to make sense of some giant mound of assembly that did something useful. But I'd be more impressed with the skill to build test harnesses enabling radical changes to structure, keeping things in good order - i.e. maintenance of the code, rather than of an understanding in a single person's head.

Understanding how things can get to a certain place, yes (we've all been there), but shouldn't there at least be aspirations for something better?

robscovell-ts commented 4 years ago

@sublimator of course your comments are valid, and of course those abstractions do exist. Nobody sets out to create a chaotic mess! Unfortunately, we rarely get the luxury of working on something new, or new enough that its architecture is still tight, or new enough that up to date test harnesses exist, or previous developers have been test-driven developers. More often than not we inherit code-bases that have had several maintainers of different ability levels, and we have to make sense of it, and make the best of it. They may have been under resource constraints that were less than ideal. Dealing with an inherited code-base is definitely a learned skill.

Aspirations for something better? I've been in this game since 1988. Much of the web back end now is built on NodeJS using a bunch of npm modules of varying quality. That's if it's not using PHP! I'm not a progressive. Human nature is static. Human behaviour is cyclical. People just make the same mistakes in different languages. It is what it is and you deal with it!

OO was supposed to be 'something better'. The trend now is towards functional programming and away from OO because of concurrency issues and philosophical worries about the nature of state. (I recommend Rich Hickey on that topic BTW.)

sublimator commented 4 years ago

[image]

sublimator commented 4 years ago

@robscovell-ts Understood

weshinsley commented 4 years ago

So, I think this discussion has run its course now.

See the independent report here, in which the Codecheck team repeated the Report-9 results published in this repo and were happy to certify that the code reproduced Report-9 closely enough, with the differences described in the article.