@weshinsley Yes, hoping that you can ask your colleagues who have access to the original code to provide a hash, so that in the future the public can verify the integrity of any published code. Thanks.
@weshinsley Most of what is happening here...
@weshinsley I think you could make the repo private and ask people to submit applications to an email address to work on it. You could filter for the constructive people, even those who disagree with you and want the real code but are not being dicks about it. Someone told me this idea, so I'm not alone in thinking it. They would have to commit their real names and emails so that you know that, if they go too far, they can be held accountable.
I'm using my real name.
Closing the repo? Why? Because some people on the interwebs asked for all the software to be released instead of half of it? Or maybe because there are some mean people posting... MEMES?! Oh, wait, that's you @Feynstein .
As I already wrote: with the complete repository there is a higher chance of quality work:
For argument's sake, let's suppose that one engineer made a mistake during the refactoring phase. Due to this whole pandemic, the accumulated stress kicked in and he made an understandable mistake. Now I will refactor this refactored code. And since I'm such a smart and cool guy, I'm going to add a lot of shiny unit tests to my refactored version. Also, since I'm such a smart guy, all tests are passing.
Does that mean that the code is working? NO! Why? Because although I refactored the code, I didn't fix the error introduced in the first refactor.
@Feynstein (and any other who is against the full release of the repository) please explain to me how the scenario I described can be avoided WITHOUT full disclosure.
Please remember that I am not as smart as you guys, so I don't have the same Physics + Math + whatever background you have. On a good day I can solve a few Swokowski exercises using tables, but that's it.
@davividal I just sent you an email and also I can't find your fork of the project to see what you did... (edit) We talked in private about this, all is well
@davividal I just sent you an email and also I can't find your fork of the project to see what you did... Care to comment?
For the record: answered in private.
By the way, you can drop the repo etiquette now; you guys are out for blood. You don't actually care about the research...
This is a model that has led to significant policy changes affecting millions of people; not everyone will be interested in the research, but we are interested in accountability.
Similarly, in security there's the idea of not hiding your code, of avoiding 'security through obscurity': when you release code, even if you as researchers are not as proficient at writing it, it allows those from outside to vet it and find issues.
The idea that the original code probably doesn't exist anymore is ridiculous: the code was sent to Microsoft and it was version controlled. When this code was released, the commit history was squashed, so the history clearly existed up to that point. When an organization like a university is publicly funded and is shaping policy, I think the need for accountability should be pretty clear. As far as I'm aware, we can't see the state of the code as of the date it was run to make the 500k prediction.
The fact that changes were squashed undermines confidence in your work. I see you work in Physics; when I did my Physics degree, lab books were to be numbered and no pages were ever to be taken out. You were also supposed to cross things out with one line so they were still legible. What this whole thing does is undermine confidence in the team; there is no reason to hide the commit history.
Be inclusive of Researchers and Academics who are not professional programmers
It's perfectly understandable that people specialize in different areas – epidemiologists specialize in their field and it's perfectly understandable that they don't have nearly the same amount of experience and proficiency in writing code when compared to computer scientists or developers. That's totally intended – the division of labour is how society progresses.
There is absolutely no need to assign blame or shame to the epidemiologists for open-sourcing their code – or any of such "academic code". They are not professional programmers and so the community should not expect their code to be perfectly flawless and readable. Even professional programmers make many mistakes!
The open-source community should instead actively encourage and welcome researchers in all kinds of fields to publish their code. A program isn't typically written in one go; it's an iterative process. When researchers open-source their "academic code", it allows the open-source community to get involved and help improve the "academic code" into production-quality code. In fact, if the open-source community wants researchers to write their code better, help the researchers write better code by submitting issues, filing bugs, proposing pull requests and suggestions, and the like. If researchers feel welcomed, then more researchers are likely to open-source more of their work – and this is a positive feedback loop which benefits all stakeholders.
Work with the domain experts and not against the domain experts!
So, what about publishing the original source code?
If the original C code wasn't under source-control, then so be it. It's perfectly understandable if the researchers were in a hurry to implement a working prototype ASAP.
Even IF the original code had no comments and wasn't very readable, it is still THE original working specification of the model itself. Comments are good and all, but they quickly become out of date. The only thing that does accurately specify a program's behavior is precisely its source code. The original C code could be very useful to see why the researchers wrote some statements in the current way, and how the implementation works.
Even IF the original code had no version history, it is still very valuable – it could be used as a reference to see how the current code differs from the original implementation. Refactoring is great and all, but even with a regression test suite, it is practically impossible to guarantee that behavior is preserved.
In fact, with the regression-test-suite-driven refactoring approach, the regression test suite really treats the original C code and this current code base as two abstract machines or black boxes. For the limited amount of input data provided by the regression test suite, the outputs may be statistically equivalent. However, there are no guarantees that the same behaviors are preserved for inputs that are not accounted for by the regression test suite. This is also why the original C code is very valuable.
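To make that black-box point concrete, here is a minimal, hypothetical sketch (not taken from the actual covid-sim test suite) of what a statistical-equivalence regression check looks like: it compares only summary statistics of two runs, so it can pass even when no individual value matches, and it says nothing about inputs it never exercises.

```cpp
// Hypothetical sketch of a statistical-equivalence regression check.
// Not from the covid-sim test suite; purely illustrative.
#include <cmath>
#include <cstdio>
#include <numeric>
#include <vector>

// True if the mean outputs of two implementations differ by at most `tol`.
bool statistically_equivalent(const std::vector<double>& a,
                              const std::vector<double>& b,
                              double tol)
{
    const double mean_a = std::accumulate(a.begin(), a.end(), 0.0) / a.size();
    const double mean_b = std::accumulate(b.begin(), b.end(), 0.0) / b.size();
    return std::fabs(mean_a - mean_b) <= tol;
}

int main()
{
    // Outputs of the "original" and "refactored" runs for one test input.
    const std::vector<double> original   = {100.0, 105.0, 98.0};
    const std::vector<double> refactored = {101.0, 104.0, 99.0};

    // Passes even though no individual value matches, and it tells us nothing
    // about inputs the regression suite never covers.
    std::printf("equivalent: %s\n",
                statistically_equivalent(original, refactored, 2.0) ? "yes" : "no");
    return 0;
}
```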
Understandably, given the intense time pressure, the original C code can and should be released "archived". It only serves as historical "living documentation" and "living specification" of what the original model's implementation was. It should not be supported, since that would distract the researchers. It needs no comments or commit history, if it is faithful to what the original implementation was.
Completely agree. At work we sometimes try implementing newer machine learning papers and notice that, even in a very computer-centered field like ML, code standards aren't always ideal. The issue is the idea of hiding this.
Also I'm pretty sure it was under version control, or at least some part of it judging by the squash.
Please read this other comment: https://github.com/mrc-ide/covid-sim/issues/274#issuecomment-632676779
That thing is still going? I thought people understood this is not really the place to ask for that, and that they'd better send an official FOI request. I'm not involved in this anymore; I'm sick and can't work on it. Please stop quoting me.
A FOIA request for the original code was filed and rejected by Imperial. https://www.whatdotheyknow.com/request/software_code_used_for_the_covid
@bitcartel Nice job! (if it was you that made it), this is very detailed. I'm sure people here will appreciate the commitment behind it. I personally really do. Keep it up!
Edit: I didn't see the end, it appeared when I reloaded the page... Can you make any further appeals? I can understand that they might not want to disclose publicly information that will be published later. There should be some kind of deal you could make with them? Keep it up, still!
@Feynstein Nope, nothing to do with me.
Looks like the person responsible has just posted an update to the FOIA request... notifying Imperial that they've filed a complaint with the Information Commissioner's Office. https://twitter.com/goatchurch/status/1264966832265531392
@bitcartel I think this issue can be closed. Imperial has provided the original code in response to a FOIA request (IMPFOI-20-293), which is not the one discussed above. I would recommend requesting the code directly from Imperial if it hasn't already been made widely available. It is best to get it from the original source, but if you have any problems I will upload the copy I have been provided with (114MB zip file exc. licensed Landscan file).
Update: By "original source" I mean Imperial as the source from which to obtain these files. As with anything else, it is best to get information from the original source to avoid doubt as to the authenticity.
This entire project started out with the original code from Imperial College London, and is now vastly improved compared to the original Imperial code. Both the community and some of the developers from Imperial are members of this Git project. I see it as counterproductive to go back to the old original code.
For anyone interested https://www.whatdotheyknow.com/request/full_version_controlled_source_c
I received the full code which can be found here.
https://drive.google.com/file/d/1cwTDgvUA-qs3ovOOz8JafQZEI8vZ0CiE/view?usp=sharing
Perhaps Imperial could make this publicly, and clearly, available if they have not done so already. Having it provided only after FOIA requests have gone to internal review seems time consuming for their staff and doesn't give a good impression of transparency for the organisation.
@DJ-19 the current code repository is sufficient to reproduce the published work.
@insidedctm some seem more interested in analysing the original code for themselves.
The original code is the same as the first commits to this project's master branch, if anyone wants to inspect the code.
@connywesth just to avoid confusion, is this what you would expect to see when opening the original code in Visual Studio?
The original code was not very well written from a developer's point of view (the people who developed the system over the years at Imperial College have mainly been pandemic researchers and scientists, not developers).
This open GitHub project has been working for a couple of months to refactor the code substantially, and the code is now much better than when the project started. The scientists were overloaded with their regular work, so they asked the internet community for some help.
This is also one of the important tasks the representatives of Imperial College London asked for help with in the first place. I'm actually a C# developer, not an expert in C++, but it has been very insightful to watch the code as it has evolved during this "project".
Bugs, structure and naming have all been improved substantially in the latest version compared to the original code.
If you want to examine the code, I believe it's better to look at the latest version. Otherwise you will dig into an archeological excavation of outdated code. It will probably be a total waste of your time.
But I don't know the purpose of your excavation.
I think the current version of the code is a better starting point for analysis and future development. If you really want to improve the quality of the algorithms, functionality and codebase, this is where to begin, I think.
The coders/contributors have made a big contribution to the research and science.
@connywesth unfortunately your previous statement that the first posts in the master branch are the original code is misleading for anyone who wishes to obtain the "original source code" as discussed in this issue. Your thoughts are appreciated however this issue relates to the predecessor of the codebase you are discussing.
OK, I'm sorry for that. I might have missed something in the discussion....
So what are the covid-sim truthers going to do with 18k lines of code that have no prior version history, and which you can't run because at least one proprietary file cannot be made public?
When will Imperial certify that the source code obtained by FOIA can reproduce report 9?
Here is the README from the FOIA code (thanks @sarjann) mentioning the copyrighted input file.
Original report 9 code
======================
This code will not work without an input population file which is commercially licensed.
We will release that underlying population file to individuals who provide written proof
that they have a license for Landscan 2018 - see https://landscan.ornl.gov/landscan-data-availability.
The code will only exactly reproduce report 9 results if run on a 32-core Windows machine
with the network file provided.
...
Then follow the instructions at https://github.com/mrc-ide/covid-sim/tree/master/report9
If you view the instructions on the linked report9 page, you will see the following:
Results generated as indicated below should exactly match those in the *T8_NR10* output
files in the GB_suppress and GB_mitigation folders
(see CODECHECK certificate 2020-010 https://doi.org/10.5281/zenodo.3865491).
...
Results will be close to but not identical to those in the original report since:
(a) the results produced here average over 10 stochastic realisations;
(b) the population dataset has changed to Worldpop (open-source) rather than Landscan (closed);
(c) the multi-threaded algorithm to create the simulation's household-to-place network has been
modified to be single threaded and thus deterministic. Covidsim is now deterministic across platforms
(linux, Mac and Windows) and across compilers (gcc, clang, intel and msvc) for a specified number of
threads and fixed random number seeds.
If you read the CODECHECK certificate, you will see the following:
I was able to reproduce the results (Tables 3, 4, 5 and A1) from Report 9...
Simulations were repeated using the public CovidSim implementation,
first released in April 2020 onto GitHub, rather than the private code
used to generate the findings in Report 9.
...
These variations between the original reported values and the reproductions are due to several factors:
1. The CovidSim code base is now deterministic.
2. Slightly different population input files have been used.
3. These results are the average of NR=10 runs, rather than just one simulation as used in Report 9
So Imperial has gone and certified the modified Github code, but not the original FOIA code. Since Imperial has the copyrighted input data file, it should be trivial for them to certify the FOIA code too.
Notice that the above README from the FOIA code says "The code will only exactly reproduce report 9 results if run on a 32-core Windows machine". So will the results change if run on a 16-core machine or 64-core machine? It shouldn't, if the code is thread-safe and deterministic in its execution.
However, from the text on the report9 page and CODECHECK certificate shown above, this is not the case -- the original code is not deterministic in execution. The implication is that there are bugs/problems with the original multi-threading code.
So what if Imperial had, as described above, run the simulation one time for Report 9 but on a different computer system? How different would the results have been? How might those different results have impacted the government's policy? The simplest way to remove any doubt is for Imperial to certify the FOIA code with the copyrighted data file.
[...]
Notice that the above README from the FOIA code says "The code will only exactly reproduce report 9 results if run on a 32-core Windows machine". So will the results change if run on a 16-core machine or 64-core machine? It shouldn't, if the code is thread-safe and deterministic in its execution.
The full history would be able to clarify this easily.
However, from the text on the report9 page and CODECHECK certificate shown above, this is not the case -- the original code is not deterministic in execution. The implication is that there are bugs/problems with the original multi-threading code.
NO! That can't be! They did not commit any bugs while writing this code!
So what if Imperial had, as described above, run the simulation one time for Report 9 but on a different computer system? How different would the results have been? How might those different results have impacted the government's policy? The simplest way to remove any doubt is for Imperial to certify the FOIA code with the copyrighted data file.
Stop that, you non-believer! COVID-19 is there and it will kill 9326478623978456293867459423764579254 people by this weekend according to all the simulations that were done but that you can't verify! Trust all the simulations done until now. They are the revealed truth from the gods of universities to us, mere mortals!
I'm gonna report your comment as blasphemy.
@bitcartel
Why is it not sufficient that the code in this repository matches the results from Report 9, to dispel doubts that the results of the report arose from a bug in the code?
From CODECHECK:
Note also that my independent run matches results by the Imperial team as of 2020-05-28.
@NoFishLikeIan
For the scientific method, to verify results of the original experiment.
Btw, how do we know if the FOIA code is the original code? The answer is we don't know, unless it can be demonstrated that the FOIA code can reproduce the results of Report 9. Since Imperial has the copyrighted Landscan population file, it should be easy for them to commission a CODECHECK for the original code.
@bitcartel
There is no experiment in need of reproduction; here we are talking about a computer simulation with stochastic modelling. There is ample literature on the epistemology of computer simulation and none of the relevant authors thinks that one needs to be able to deterministically reproduce a simulation in order to validate it as scientific (e.g. Gilbert and Troitzsch, 2005).
The code given here is sufficient to reproduce the simulation that was used to inform report 9 and the paper supplies sufficient material to interpret and reconstruct the model. I understand why you might believe this is not enough but let's not claim that this does not hold to the standard of the "scientific method" (you don't need to go to Pisa and climb the tower to prove that G is constant).
@NoFishLikeIan
The "code given here" in this repository is not the original code, it is heavily modified code
Let's keep things simple.
Can the original code reproduce the results of Report 9 -- yes or no?
@bitcartel
If the objective is to uphold the scientific method, then the answer is simple:
Can the original code reproduce the results of Report 9 -- yes or no?
It doesn't matter, because Report 9 can be reproduced (up to "statistical equivalence", which is what matters in science) with the code in this repository, hence the result satisfies replicability.
The question stands: if you can reproduce Report 9 with, work on, and make statistical inference on this code (which again, is less important than the paper), why do you need the original code?
P.s. There is no need to discuss this further, if you are not satisfied you should keep asking for the original code. My point was that a huge amount of work has been done by the team to publish, clean, and make this code reproducible. And I believe that work is sufficient to understand (and, in case, criticise) the paper and the simulation. If you disagree, go for it and good luck!
Notice that the above README from the FOIA code says "The code will only exactly reproduce report 9 results if run on a 32-core Windows machine". So will the results change if run on a 16-core machine or 64-core machine? It shouldn't, if the code is thread-safe and deterministic in its execution.
However, from the text on the report9 page and CODECHECK certificate shown above, this is not the case -- the original code is not deterministic in execution. The implication is that there are bugs/problems with the original multi-threading code.
As we've said many times, we use a private random-number-generator per thread, which is the naive way of ensuring thread-safety when multiple threads draw random numbers at the same time. To identically reproduce any set of results we, or you, or anyone else has run with the model, you will need to use the same stream of random numbers as they did, otherwise, you will get a statistically equivalent result, but not an identical one. If you use the same number of RNGs, the same seeds for those RNGs, and the same other initialisation, you will get identical results - and if you repeat a specific run without changing any of those, you will see determinism. This is totally expected and standard behaviour in these sorts of models.
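To illustrate the pattern being described (this is a stand-alone sketch under my own assumptions, not the covid-sim code), here is the naive "one private RNG per thread" idiom with OpenMP: results are identical across repeated runs of the same build as long as the thread count and the seeds are unchanged, and merely statistically equivalent when the thread count changes.

```cpp
// Stand-alone illustration of the "one private RNG per thread" pattern;
// not the covid-sim implementation. Compile with e.g. g++ -fopenmp.
#include <cstdio>
#include <random>
#include <vector>
#include <omp.h>

int main()
{
    const int nthreads = 4;            // changing this changes how work maps onto streams
    const unsigned base_seed = 12345;  // stands in for the user-supplied seeds
    const int N = 400000;              // fixed amount of work, split across threads

    omp_set_num_threads(nthreads);

    // One generator per thread, seeded deterministically from the base seed.
    std::vector<std::mt19937> rng;
    for (int t = 0; t < nthreads; ++t)
        rng.emplace_back(base_seed + 1000u * static_cast<unsigned>(t));

    double total = 0.0;
    #pragma omp parallel reduction(+:total)
    {
        const int t = omp_get_thread_num();
        std::uniform_real_distribution<double> u(0.0, 1.0);
        #pragma omp for schedule(static)
        for (int i = 0; i < N; ++i)
            total += u(rng[t]);        // each thread draws only from its own stream
    }

    // Re-running the same build with the same nthreads and base_seed reproduces
    // this value exactly; a different thread count re-partitions the loop across
    // different streams, giving a statistically equivalent but not identical total.
    std::printf("total = %f\n", total);
    return 0;
}
```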
- The CovidSim code base is now deterministic.
I think Codecheck's comment above relates to first-time initialisation. Prior to Report 9, the code created a unique synthetic population each time you ran with /S on the command line - which is relatively slow. The idea was that you'd do this once with /S, and re-use that population for the thousands of subsequent runs with /L (faster). This was simplified (one line was commented out) so that /S was made deterministic (and still slower). It meant that Codecheck started with a different (statistically equivalent) initialisation for their runs, but could use /S every time if they wished, with deterministic results.
@weshinsley
Operating system threads are an abstraction over the underlying physical resources; e.g. a 4-core Windows laptop can run thousands of threads.
So I would appreciate it if you could clarify the following:
"The code will only exactly reproduce report 9 results if run on a 32-core Windows machine"
Why does the FOIA code need to run on a processor with a specific core count? If run on a different processor, are the results the same? Thanks.
@bitcartel
I think that when I looked at the code this spring I noticed that each thread has a different random seed. I didn't see what the rules for seed generation were, but my guess is that it's OS-dependent. Hence, if you try to reproduce results with different thread configurations, they are bound to be different. Unless you can somehow create a thread pool that uses the exact same rules for seed generation and thread assignment.
By the way, I think that the proper analysis would be to use multiple runs and statistical analysis to get a good guess at what the most likely results are, and see if they fit with the report. As to whether they used multiple runs to generate their report, and whether they should have, that is another question.
The code by default runs using all the cores the operating system reports, with one thread per core. You could override this with the '/c' command-line option if you had a 64-core machine and you wanted to force it to use 32 cores, in which case it would match. I don't think we allow the converse (doubling up and running with more threads than you have cores), because the performance will be very poor. (See EDIT below)
I suppose it would have been more accurate but verbose to say, "with a Windows machine that reports 32 cores to the operating system, unless you have a larger machine and override the number of cores with /c:32 as a command-line parameter". We did not imagine many would have a >32-core machine to hand.
EDIT: additionally you could set the environment variable OMP_NUM_THREADS to 32, and then run with /c:32, to circumvent the checks around CovidSim.cpp:245 and let OpenMP use that many threads, even on a single core machine. Eventually, and if you have enough memory on such a machine, you would produce identical results. So actually, there are some more creative ways of demonstrating the deterministic behaviour than we initially implied...
The user provides two random seeds, and then see Rand.cpp:setall for how the different seeds for different threads are calculated - not OS-dependent at all. That way, the user could, for example, get a batch of seeds from random.org, do ensemble noise or likelihood calculations over the results, while still having determinism on the individual runs.
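For readers who don't want to dig into the source, the shape of the idea is roughly the following. This is a hypothetical illustration only, with made-up constants; see Rand.cpp:setall in the repository for the real calculation. The point is that the per-thread seeds are a pure function of the two user-supplied seeds and the thread index, so nothing about them depends on the operating system.

```cpp
// Hypothetical illustration of deriving per-thread seed pairs from two
// user-supplied seeds; the real logic lives in Rand.cpp:setall.
#include <cstdint>
#include <cstdio>
#include <vector>

struct ThreadSeeds { std::uint32_t s1, s2; };

std::vector<ThreadSeeds> derive_thread_seeds(std::uint32_t user_seed1,
                                             std::uint32_t user_seed2,
                                             int nthreads)
{
    std::vector<ThreadSeeds> seeds;
    for (int t = 0; t < nthreads; ++t)
    {
        // A pure function of the user seeds and the thread index: the same
        // inputs give the same per-thread streams on any machine or OS.
        const auto k = static_cast<std::uint32_t>(t);
        seeds.push_back({user_seed1 + 17u * k, user_seed2 + 31u * k});
    }
    return seeds;
}

int main()
{
    for (const ThreadSeeds& s : derive_thread_seeds(12345u, 67890u, 4))
        std::printf("%u %u\n", static_cast<unsigned>(s.s1), static_cast<unsigned>(s.s2));
    return 0;
}
```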
To add to this: yes, you could force a 32-thread job to run on a 4-core machine and get the same results. Slowly.
For throughput, running with 8 threads is optimal for UK scale runs.
Another academic group has independently exactly replicated the report 9 results using the original code and input files as part of the Royal Society RAMP initiative. They are preparing a paper on their analysis which should be out in the next month or two.
For those who believe that discovering a fatal flaw in this code might bring the the scientific support for lockdown tumbling down, I'm sorry to break it to you that other (notably LSHTM) academic groups informing SAGE in March used completely different models to reach nearly identical conclusions to our Report 9 in March. The relevant documents are online in the SAGE archive. The key conclusion that severe social distancing measures were required to prevent health systems being overwhelmed hinged only on estimates of R0/doubling time, hospitalisation rates and IFR (mortality risk). Given those estimates, any epidemic model would give basically the same conclusions we reached.
Best,
Neil
"For those who believe that discovering a fatal flaw in this code might bring the the (sic) scientific support for lockdown tumbling down"
Attempts to repeat the original results should be encouraged as that's good science. What it can never do is provide any evidence that it was the best or only scientific support that could have been used.
@DJ-19 Good science would in fact involve reproduction, analysis and publication. A technical note would do in that case. If you only reproduce the results without any peer-reviewed methodology, that's useless, bad science. I think any credible scientist would agree with me that a peer-reviewed rebuttal is very desirable.
Of course, and that process takes quite some time.
@Feynstein thank you for your thoughts. The pros and cons of the peer review system are well known; however, in the case of SAGE I was informed by my MP that it was "...entirely appropriate that these experts are allowed to exercise their professional expertise without any undue influence from external forces". The alternative opinion or criticism of peers can be considered an "undue influence" from "external forces".
@weshinsley Thanks for describing the thread settings.
Btw, can you confirm the FOIA code was the actual code run for Report 9? As @davividal mentioned, why not publish the full history of the FOIA code? If an old version control system was used, such as RCS or SVN which Github doesn't support, just zip/tarball it up.
@DJ-19 If your MP thinks that the peer review process is an undue influence you should tell them that they are wrong. It's a way to remove (mostly personal) influence in research. My point was that any person that tries to reproduce the results seriously should publish it. There's no point in undermining research if you can't back it up. Especially if your methodology fails to meet field-specific criteria.
@Feynstein I don't think you understood what my MP was saying about how SAGE was operating but that's okay.
That's probably the case. I did not update my knowledge of the situation for a while. No need to explain, I just wanted to convey a global message I guess.
@bitcartel - I didn't work on that aspect of report 9, but as Neil and the FOI response indicated, that is the code used to produce the results for that report, and you can do it yourself if you have the right licensing. We'll see more on that when the RAMP analysis is published.
There is no prior history for that code; the bulk of it was written by a single author before GitHub existed, at a time when version control was essentially backup. Formal testing and scientific rigour were every bit as important then as they are now, of course; today's tools just make [some of] it more automatic.
Both over the years and today, I have worked with plenty of scientists who work with code and do not use source control. Universities, research councils, grants, journals, etc. do not mandate people's day-to-day working practice. It's getting less common for researchers not to use source control, but I suspect it will always be something others will have to contend with in the future. Often, the versions of the code that accompany specific papers are as much version history as you get.
Issue https://github.com/mrc-ide/covid-sim/issues/144 has been prematurely closed as the original C source code has not been published.
Please re-open and unlock the issue so the community can provide feedback and discuss the comments made so far.
If a formal decision has been made to not release the original code, please confirm and document this in the comments of https://github.com/mrc-ide/covid-sim/issues/144, rather than abruptly closing the issue.