mrc-ide / covid-sim

This is the COVID-19 CovidSim microsimulation model developed by the MRC Centre for Global Infectious Disease Analysis hosted at Imperial College, London.
GNU General Public License v3.0

Re-open and unlock issue #144 (Publish original source code) #179

Closed bitcartel closed 4 years ago

bitcartel commented 4 years ago

Issue https://github.com/mrc-ide/covid-sim/issues/144 has been prematurely closed as the original C source code has not been published.

Please re-open and unlock the issue so the community can provide feedback and discuss the comments made so far.

If a formal decision has been made to not release the original code, please confirm and document this in the comments of https://github.com/mrc-ide/covid-sim/issues/144, rather than abruptly closing the issue.

Feynstein commented 4 years ago

You guys keep giving s*** to people that are probably low-wage grad students and post-doc. They decided to release the latest version they had on github because they knew this was getting important and I salute them for it. The original code probably doesn't exist anymore anyway. Under normal circumstances who needs an epidemiology algorithm that is production ready… No one in his right mind thought it would someday be urgently needed. No one of the computer engineering-y guys here would have ever wanted to do a cleanup of this mess anyway. And that’s the reality of things. People are focused on doing web-based server-client ruby fast delivery html5 software engineering bull**

It gets me really pissed to see all the comments that say use this or that production code technique. This kind of code never leaves the 8th basement of the university where it's kept by some post-doc dude that has too much to do in the actual lab to take the time to make all those improvements. And then suddenly the world needs it. I think some people here need to take a good hard look at themselves and ask why this is happening. Welp, because theoretical fields like this are under-funded, that’s why.

And by under-funded I mean 15-20k per year in research grants to grad students… those who get them. I’ve been there, I know how it is. I'm an actual scientific software developer that learned from scratch with a B.Sc. in physics and a M.Sc. in electrical engineering. Now, after a few years, I can do production-ready assisted defect recognition in x-ray inspection. When I first started I was working on code for radiation dose in radiation therapy without any real coding experience, this kind of code gets eventually cleaned up and reviewed before actually going into a clinical setting. And I made no real money while doing it even though it might be used someday to save one of your a*s. And while studying and having to do 40h a week of stupidly hard research work and still wondering if you’ll be able to pay rent the next month. People that make that kind of money don’t give two **s about the code. While all those undergrad computer science guys get full salary paid internships. And if they get their Ph.D. there’s no job for them in the market anyway… so they’ll end up being in the academia limbo forever.

I am personally used to using Geant4 for radiation physics and it’s almost the same thing! It has been going for yyeeeaaarrsss, and was eventually cleaned up because someday some company decided to use it. https://geant4.web.cern.ch/. This thing was being used for nuclear power plants before many of you could even walk, by the way.

I'm sorry for this, but it had to be said sometime.

bitcartel commented 4 years ago

The original code probably doesn't exist anymore anyway.

The original code does exist. Multiple external developers including Microsoft and Github were recently granted access to it. See issue https://github.com/mrc-ide/covid-sim/issues/144 for more info.

Feynstein commented 4 years ago

Do you know what happened? I am used to this kind of thing. People at Microsoft and GitHub took the code and tried to do some refactoring, then they sent it here so that the open source community can look at it. But what happened is that no one at either of those companies is an epidemiologist... So no one really understood what was going on, making an object-oriented pass at the original C code nearly impossible because they don't want to take on the liability for it. So yeah, better let the original lab post it on GitHub and get all the flak from the community, that's what happened. By the way, this is probably already documented in scientific literature.... I'll see if I can find the original paper... Omg you guys... If I have to make a pass at this in order for everyone to stop arguing I will.

No one with only a computer science background understands what's going on with the code I write. The reason is simple: software engineers simply don't have the mathematical and physics background to do it. I could write the most documented code in the world; if you can't understand the meaning of the Klein-Nishina differential cross section for photo-electric Compton effect, you won't have any clue, and I mean not one bit of understanding, of my data containers and architecture choices... even senior architects and developers, because they're not nuclear physicists. That's what's happening here. And I have already had, on multiple occasions, to explain to those computer science guys that integrate what I do why it might not always be 100% repeatable. Do you know anything about chaos theory or non-linear algebra? When you have so many variables in the code that even the slightest bit of rounding error in a double can drastically change the outcome of the simulation? I'll leave it here for you guys while I look in academic papers for the original article. Take a good look at it. It's why meteorological predictions are bad. https://en.wikipedia.org/wiki/Chaos_theory and especially this one: https://en.wikipedia.org/wiki/Butterfly_effect

It seems to me that 20GB of ram should probably lead to some kind of chaotic system.
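The rounding-error point can be illustrated with a toy example. This is only a sketch in Python, assuming nothing about CovidSim itself: the logistic map stands in for "a chaotic system", and every name below is made up for illustration. It shows that floating-point addition depends on the order of operations, and that a perturbation of about that size can grow until two runs no longer resemble each other.

```python
# Illustrative toy only: nothing here is taken from CovidSim.

def logistic_orbit(x0, r=4.0, steps=200):
    """Iterate the logistic map x -> r*x*(1-x), a standard chaotic system."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit

# 1) Floating-point addition is not associative, so the order in which
#    partial sums are combined (for example across threads) changes the
#    last bits of the result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
rounding_gap = abs(left_to_right - right_to_left)

# 2) In a chaotic system a perturbation of similar magnitude grows.
base = logistic_orbit(0.2)
nudged = logistic_orbit(0.2 + 1e-15)
early_gap = abs(base[1] - nudged[1])   # still tiny after one step
max_gap = max(abs(a - b) for a, b in zip(base, nudged))  # no longer tiny
```

Whether the real model behaves this way is a separate question; the sketch only shows why bit-level differences are not automatically harmless.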

By the way you can drop the repo etiquette now, you guys are in for blood. You don't actually care about the research...

Feynstein commented 4 years ago

Ok I got it... I suggest you guys look at the methods on page 213, that explains it all, you can basically reconstruct the original code from it. ferguson2005.pdf I got the paper from my university library's connection to Nature... so as soon as this issue is closed I will delete this comment.

I'm pretty sure the original code to this looks very similar to the one in the repo. I looked at it and it is pretty much a single file. It is clear to me that people from both Microsoft and GitHub didn't want to be involved in this; the liability because of all that happened is too much for them. They saw it and they had the exact same reaction as you. "I won't touch this even with a hockey stick".

Feynstein commented 4 years ago

So yeah, conclusion: This is legit... this is the version that microsoft and github didn't want to be involved with... why? :

  1. If you look at CovidSim.cpp it's formatted like a one pager C main program. No comments whatsoever... this is clearly the work of someone translating from Fortran to C. I know many other physicists whose summer internships were to translate such codes for use in fluid dynamics labs, for example.
  2. If you look at Kernels.cpp this is clearly the work of someone who knows what they're doing, good comments and the intricate use of openmp is the tip off.
  3. You see why the guys at github and ms would have focused on performance issues in order to get results quicker considering the ongoing pandemic. So that it seems that only "critical" parts of the code (like kernels) are modified by people with visible experience.

These conclusions are based on my experience in both the academic programming setup and the continuous delivery - agile style - scientific software production setup. I won't dox myself, but if anyone wants to get credentials just private message me.

What I would suggest though and I might file an issue on this is the self written (hardcoded) random seeding... this is baaaaaaddddddd, like really bad. You better quickly switch to an experienced and proven random number generator. Like CLHEP: https://gitlab.cern.ch/CLHEP/CLHEP
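To make the suggestion concrete, here is a minimal sketch of what "an explicitly seeded, well-tested generator" can look like. It is written in Python rather than C++ and uses the standard library's `random.Random` (a Mersenne Twister) as a stand-in for CLHEP; the function names `make_rng` and `draw_waiting_times` are hypothetical and do not come from CovidSim.

```python
import random

def make_rng(seed):
    """Return an independent generator whose seed is passed in explicitly.

    Passing the seed as a parameter (instead of hardcoding it) lets a run
    be recorded and repeated exactly.
    """
    return random.Random(seed)

def draw_waiting_times(rng, n, rate):
    """Draw n exponentially distributed waiting times from the given rng."""
    return [rng.expovariate(rate) for _ in range(n)]

# Two generators built from the same seed produce the same stream.
run_a = draw_waiting_times(make_rng(12345), n=5, rate=0.5)
run_b = draw_waiting_times(make_rng(12345), n=5, rate=0.5)
run_c = draw_waiting_times(make_rng(54321), n=5, rate=0.5)
```

The design point is only that the seed is an input that gets logged with the results, so anyone can replay a run.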

BenLubar commented 4 years ago

You guys keep giving s*** to people that are probably low-wage grad students and post-doc.

Why are we trusting the lives of billions of people to a group of people you don't seem to think we can trust to publish some already-existing source code on GitHub?

Feynstein commented 4 years ago

That's a question that's out of scope here; you have to ask your own government for that answer. It's not their fault if this rusty and ugly piece of code was suddenly placed into the spotlight. It's not that it's a bad scientific algorithm, it's that it's not up to the standard used in industry. The decision to use it is not related in any way to the people that wrote it. And by the way, they're the only ones that have such impressive modeling, and it really stands out from others that use SEIR-type models. It's pretty much the best thing that's been made in this field... I read the paper carefully and there is a lot more attention paid to details than I expected, really.

I'm not trying to shield the group from constructive criticism, far from it in fact. I want to give context as to why they're not releasing the original code. It's the exact same code but with the improvements coming from GitHub and ms. If you look carefully at #144, their last answer especially, you will see that it is pointless to release the original code since there's probably not much relevant difference and they want people to know what was scientifically used to take decisions. They also want to make it easier for ordinary folk to understand. The original code was made specifically for an Asian flu epidemic, making it unusable for this current pandemic.

And finally... Look at the damn paper I uploaded... If you are a scientific software developer you can use it in order to generate the original code. It's called a scientific paper and it's the academic equivalent of the C original. If you can't come up with your own version using the methodology described in the article, you're not in a position to ask for the original code. It would be completely pointless for you to have it because it's an unoptimized, probably very intricate piece of software that you could not understand. And I do not mean it in a bad way, I know you're not stupid, you just don't have the required background for it to make sense.

ianna commented 4 years ago

What I would suggest though and I might file an issue on this is the self written (hardcoded) random seeding... this is baaaaaaddddddd, like really bad. You better quickly switch to an experienced and proven random number generator. Like CLHEP: http://cmd.inp.nsk.su/old/cmd2/manuals/cernlib/CLHEP/RefGuide/random.html

@Feynstein - more up to date link: https://gitlab.cern.ch/CLHEP/CLHEP

Feynstein commented 4 years ago

There's no conspiracy behind it, it's just ordinary scientific folks that want to be as rigorous as they can so that their work is not misquoted because the original code is clearly not made for the actual pandemic. And the fact that they involved GitHub and ms to work on it is the best example of the scientific rigorous work you expect from them, because they knew it was not up to standard. It all makes sense in the end.

BenLubar commented 4 years ago

@Feynstein I'm finding it very hard to take you seriously given that you appear to simultaneously be arguing that we shouldn't be allowed to see the original source code because we "wouldn't understand it" and also that the people maintaining the project aren't smart enough to understand it either.

Do you actually understand anything about software development, or are you a troll trying to spread confusion?

Your walls of text with no substance aren't helping your case.

Feynstein commented 4 years ago

Ah come on, work with me on this one. I'm sorry I might be confusing, French is my first language. What I'm trying to do, as I said, is give a bit of context about this. I'm not saying that they should release or not release. I'm trying to play devil's advocate in order to try and understand their decisions. I'm very sorry for the confusion. In my opinion, for real, it would be easier for them to release the code in order to stop this conspiracy nonsense. But on the other hand, I understand the scientific rigor behind the decision... Can I have a way to communicate with you outside of GitHub without risking being doxxed? I really don't want to troll. I'm sorry.

Feynstein commented 4 years ago

I do work in a company that "tries" to use continuous delivery (you know what I'm talking about) and I know I know less about the best development practices than the other computer science guys in the company I work for. That's why I have an integrator/architect attached to me to help me integrate my work in the bigger architecture so it can be used in production.

BenLubar commented 4 years ago

Among other things, I'm very surprised that you'd spell your own name wrong, Mister Bélanger.

If you want to prove you really are who you say you are, simply post a link to your GitHub profile from any of the other places you frequently post on the internet.

You appear to have much better English on other websites as well.

Feynstein commented 4 years ago

I just sent you an email... are you happy now that you revealed my name? That's perfect... it's always like that with your kind of people... You try to really express what you think in order to bring a better view of the problem and you get laughed at. You get doxxed/hacked and your personal info ends up on some obscure 4chan thread... You know what? I don't care anymore... do what you want with that... I tried as much as I can to express a scientist's view of the situation. Just try to leave enough money in my bank account so that I can pay my rent this month.

Feynstein commented 4 years ago

And i'll finish with a quote from another physicist in the comment section of this page: https://lockdownsceptics.org/code-review-of-fergusons-model/

"Time and money. The author themselves stated that insurers have managers and professional software engineers to ensure model software is properly tested and understandable, which academic efforts don’t. Academics would love to be able to employ a professional software engineer to work with them and make sure their code is up to scratch. Occasionally someone does manage to scrounge together the funds to do so, but most groups simply do not have the money to hire a professional software engineer. Academia is a constant game of trying to spread the resources you have as far as they will go.

By all means encourage government to increase science funding and require that a large coding project employs a professional software developer, but if you just gave the money that academic epidemiologists have used to do their work to the insurance industry and asked them to do the same job, but producing better code, they would laugh at you."

Feynstein commented 4 years ago

@BenLubar That's what I thought... When you find out who I really am you don't have anything else to say. I'm 30 years old and I'm only finishing a master's degree because of all the hard stuff I lived. Even though I'm lead scientist in my company I still live in a 1 bedroom apartment and probably make half as much as you. People like me and them are pushing the boundaries of science daily and hardly get any recognition from anyone. In fact right now these scientists, even though they created a very good model, get **** from this community for trying to do the right thing.

ghost commented 4 years ago

Love these personal dramas. But back to ticket, original source, allows us software engineers to create models (code evolution over time, very important for validity), and double plus good - original data and inputs so we might be able to help creating unit and property tests and, heaven help us; even though we aren't scientists we might be able to do maths and stuff (like understanding floating point, and variability in inputs and data) .

davividal commented 4 years ago

@Feynstein

In fact right now these scientists, even though they created a very good model, get **** from this community for trying to do the right thing.

"Talk is shit, show me the code".

The "right thing" to do is to publish the entire repository history, not the squashed version. Anything short of that is BS.

davividal commented 4 years ago

Love these personal dramas. But back to ticket, original source, allows us software engineers to create models (code evolution over time, very important for validity), and double plus good - original data and inputs so we might be able to help creating unit and property tests and, heaven help us; even though we aren't scientists we might be able to do maths and stuff (like understanding floating point, and variability in inputs and data) .

I would also add that this would allow the community to validate all the refactoring that was done. No one is immune to errors, so was the original code's functionality preserved across all the refactoring? It is really hard to state that without unit tests.

Feynstein commented 4 years ago

@tau-tao I've said everything that had to be said. If I were you I would start working on this code right now to figure out why there's no repeatability and stop waiting for the first C version, which might never come... You know you don't need the original source in order to trace it properly and figure it out. What you said is BS. I wonder if you really understand what's going on in this code... What you can do to check validity is look at the papers and re-run it in order to replicate the data they got. That's a good idea. I suggest you look at the first one in 2005 that I uploaded in my comment earlier, it seems to be the earliest paper from the professor. That's what I would do, after doing an object oriented pass in CovidSim.cpp

@davividal You also should start working on this version right now while you wait for the original. Don't act like you're better than all of these folks doing their best. Why don't you start writing the damn unit tests yourself?!? ... Or maybe even functional tests? But you need to understand what it does eh? Hum too bad...

Ah man, you guys are cancer... This community can be so toxic at times... Why don't you start implementing the random number generator switch to a more recognized one, like I suggested earlier? Can you actually do that? Can you really find where and why the random numbers are used so that you don't break stuff? I'm done... really, i'm f* done with you.

On another note I'll try to work on it this weekend, in order to show that I really want to help them.

ghost commented 4 years ago

OK, looked into it a bit, pretty sure I am not cancer (given that is generally a genetic dysfunction, very close family have had it :-(, not sure how I would cause that) - not sure what that means really. Just think that original source,input, data might help. smiley face :-).

davividal commented 4 years ago

@Feynstein: about software testing: it is not about understanding the actual code, but knowing what the code is supposed to do. Then writing a test taking that in consideration, running the test against the current code and acting upon the test result.
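As a rough illustration of testing against "what the code is supposed to do", the sketch below checks properties that any compartmental epidemic model should satisfy, without needing to understand how a particular implementation computes them. It uses a toy deterministic SIR update written for this example; `sir_step`, `simulate` and `check_invariants` are hypothetical names and have nothing to do with the functions in CovidSim.

```python
def sir_step(s, i, r, beta=0.3, gamma=0.1):
    """Advance a toy SIR model by one day (illustrative, not CovidSim)."""
    n = s + i + r
    new_infections = beta * s * i / n
    new_recoveries = gamma * i
    return s - new_infections, i + new_infections - new_recoveries, r + new_recoveries

def simulate(days, s=990.0, i=10.0, r=0.0):
    """Return the list of (S, I, R) states for day 0 .. days."""
    history = [(s, i, r)]
    for _ in range(days):
        s, i, r = sir_step(s, i, r)
        history.append((s, i, r))
    return history

def check_invariants(history, tol=1e-6):
    """Properties the model must satisfy regardless of implementation."""
    total = sum(history[0])
    for s, i, r in history:
        assert min(s, i, r) >= 0.0                # no negative compartments
        assert abs((s + i + r) - total) <= tol    # population is conserved
    for (s_prev, _, _), (s_next, _, _) in zip(history, history[1:]):
        assert s_next <= s_prev                   # susceptibles never increase
    return True
```

A test like `check_invariants(simulate(100))` can be run before and after a refactoring; if it passes before and fails after, the refactoring changed behaviour.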

Feynstein commented 4 years ago

I know that, but didn't they provide outputs? If you consider that first output as your regression baseline, you can work from there easily. The non-reproducibility issue is probably due to their multi-threaded seed generation, which means it's not the same seed on every thread. I think it would be easy to fix that by using a mutex for their seed. I'm sorry about the cancer stuff, people get me started sometimes with non-constructive stuff.
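For context, a mutex around a shared seed serialises access but does not by itself fix the order in which threads draw numbers. A different, commonly used approach is to give each worker its own generator, seeded deterministically from one base seed and the worker's index. The sketch below shows that idea in Python; it is an assumption-laden illustration, not a description of how CovidSim seeds its threads, and `worker_seed`, `worker_task` and `run_parallel` are invented names.

```python
import hashlib
import random
from concurrent.futures import ThreadPoolExecutor

def worker_seed(base_seed, worker_id):
    """Derive a per-worker seed from the base seed and the worker index."""
    digest = hashlib.sha256(f"{base_seed}:{worker_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def worker_task(base_seed, worker_id, draws=1000):
    """Each worker owns a private generator, so scheduling cannot reorder draws."""
    rng = random.Random(worker_seed(base_seed, worker_id))
    return sum(rng.random() for _ in range(draws))

def run_parallel(base_seed, n_workers=4):
    """Run all workers and return their results indexed by worker id."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(worker_task, base_seed, w) for w in range(n_workers)]
        return [f.result() for f in futures]
```

With this layout `run_parallel(42)` returns the same list on every run, whichever thread happens to finish first, because results are collected by worker index rather than by completion order.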

beewenib commented 4 years ago

Interesting debate here. I think what needs to be answered is "why is this code here"? Like you say, unless you're an epidemiologist, there's no use for it. So, why is it here?

So, why is it here? I get the feeling this isn't a software issue. Even if it was, there's no budget to do anything.

insidedctm commented 4 years ago
  • Is it for others to copy and immediately use? Doubtful. The worlds top epidemiologists (Johan Giesecke, Knut Wittkowski, Dr John Ioannidis) all point out that the assumptions (and therefore inputs) into this model were completed under extreme pressure and not peer reviewed. This has nothing to do with the model.

So it's entirely possible to examine what the model would output with different assumptions. That seems entirely a useful thing to do.

  • Is the code here so that they can get feedback and improve the algorithm? Doubtful. Epidemiologists already have good models. Although there are a lot of C++ experts here, how many are going to give feedback on a good architecture that's needed for performing epidemiological computations? This model is a tool, nothing more, nothing less. This tool will not predict the future. This tool does not replace the experience and knowledge of a seasoned scientist. Seasoned scientists already have models.

I think you completely under-estimate how useful it is to have baseline code to work from. The last thing you want to do as a researcher is to have an interesting new idea and then have to build everything from scratch.

  • Is it a PR move? Likely. If so, it's not a very good one. The best PR move would be for Imperial College to admit some mistakes (like all scientists do) and have an open debate with other scientists that disagree with their methodologies so that the scientific community can go back to being open to discussions instead of arguing. This is how we improve and learn from each other. This is the scientific method.

Seems unnecessarily argumentative.

beewenib commented 4 years ago

Seems unnecessarily argumentative.

Even the mere suggestion of debate sounds argumentative to you!?

insidedctm commented 4 years ago

Seems unnecessarily argumentative.

Even the mere suggestion of debate sounds argumentative to you!?

If you want to debate the reasons why the code was released this isn't the appropriate place

beewenib commented 4 years ago

Seems unnecessarily argumentative.

Even the mere suggestion of debate sounds argumentative to you!?

If you want to debate the reasons why the code was released this isn't the appropriate place

I agree, that makes sense, and I'll stop. Being a scientist, you need to reflect on your initial reaction to an assumption that someone is arguing VERY seriously. You just broke the scientific method. I do acknowledge that you're in the spotlight right now, but that's a part of the profession. If the mere suggestion of this is disturbing, you need to fix this yourself.

bitcartel commented 4 years ago

I believe it is important for public trust in the scientific method that the original code be made available.

British scientists, researchers and engineers should have the same level of access to the original code as granted to American corporations (Microsoft, Github) and independent developers (John Carmack).

@weshinsley Please consider re-opening https://github.com/mrc-ide/covid-sim/issues/144 so that this ticket can be closed. Thanks.

insidedctm commented 4 years ago

I agree, that makes sense, and I'll stop. Being a scientist, you need to reflect on your initial reaction to an assumption that someone is arguing VERY seriously. You just broke the scientific method. I do acknowledge that you're in the spotlight right now, but that's a part of the profession. If the mere suggestion of this is disturbing, you need to fix this yourself.

There's a difference between debating an issue and being argumentative. I'm not sure where you got your list of "The worlds top epidemiologists" but they sound suspiciously like the subset of epidemiologists who agree with your predecided position. I'm not taking lectures from you on the scientific method. Now that is argumentative, so I'll stop.

insidedctm commented 4 years ago

I do acknowledge that you're in the spotlight right now, but that's a part of the profession. If the mere suggestion of this is disturbing, you need to fix this yourself.

I'm nothing to do with Imperial and not in anyone's spotlight.

weshinsley commented 4 years ago

I have explained amply why this would be so unhelpful at this time, and our reservations have been confirmed vividly by the behaviour of various commenters, their misuses of the current repo, digging up our pre-release fixed issues and publishing them as current alarming concerns, etc, etc, and their refusal to engage in anything actually relevant to our science.

Clone or fork the live repo for yourself. Build the code. We provided you all the project files to make that really easy. You will be able to understand the code if you read it, read the papers, and apply some effort. Plug in some parameters from report 9 and see what results you get. Actually try it. We do not have any more time for these debates, or to teach basic epidemiology here. If you come across a genuine code issue, rather than demand or abuse, file an issue.

This is not a formal or final decision, but please understand the toxic environment currently being created will prevent publishing anything further at this time, beyond the ongoing work on the live code.

Feynstein commented 4 years ago

@bitcartel It really seems like this particular contributor to the repo cannot give you the answers you want. I have an idea... Here what we do when we're angry at some public figure is we write to them. May it be our deputies or whatever. Try to work with me on this. You probably won't get your answers from those people that are looking at the code right now. Except from one. And you know who it is. Why don't you try to use your perfectly sharpened internet skills to go after their boss and ask him what you want, and why you want it? Because it seems clear to me that the decision won't come from these guys. I suggest even creating a formal petition online... Here in Canada we can set up a petition at the parliament or the national assembly. That's how you do it irl. You can probably even call your deputy and ask... if it works like here. And it's because of your insistence and because of their lack of power in all this that the best they could do was close their discussions. What do you think about that?

Feynstein commented 4 years ago

So yeah, if you agree, we can close this issue and see what else we can do in order to get you what you want. What could be the best way in order to reset the toxic environment that has been spreading because of all that misunderstanding?

beewenib commented 4 years ago

Since you asked, here is a (small - there are many) subset of the epidemiologists that vehemently disagree with Neil Ferguson's non-peer-reviewed input into your model. Following the scientific method myself, I've listened to both sides, as well as debates between them. I suggest you have an open mind and do the same so that you can learn something. Their predictions of actual R0 and IFR months ago are becoming more and more factual every day.

doodlebro commented 4 years ago

but please understand the toxic environment currently being created

Do you realize that the toxic environment comes from a complete lack of upfront transparency by the project runners...?

https://github.com/mrc-ide/covid-sim/commit/bd87d475563cd54978325bf73ce45e80a7c8de65

Fix the issue and that "environment" goes away.

weshinsley commented 4 years ago

No. the toxic environment was there long before, and would have been similar whichever version we released, and no, of course it won't go away if we publish the legacy code. Just more confusion would likely result from people with whatever motives, who don't understand epidemiology, making their judgments on what the old code implies, and finding it much harder to run for themselves.

Given our very limited (by which I mean zero) capacity for dealing with all of this, as we are researchers busy full-time with a pandemic, we went for one codebase that was easier to review, that you could use to produce the same results, and that is in use today, providing ongoing insights.

doodlebro commented 4 years ago

we went for one codebase that was easier to review, that you could use to produce the same results, and that is in use today, providing ongoing insights.

And that is still entirely possible without squashing the commit history at the start.

People are not asking for multiple codebases, they are asking for you to stop being opaque about what existed prior to this lovely commit: https://github.com/mrc-ide/covid-sim/commit/bd87d475563cd54978325bf73ce45e80a7c8de65

bjcband commented 4 years ago

@weshinsley you are making fucking excuses. You could easily publish the legacy code, and the number of people wanting it outweighs the "confusion" you say it will bring. This makes the situation more suspicious. Make another excuse, for sure, but it will not be forgotten.

weshinsley commented 4 years ago

Technically I do have merge rights, but only use them exceptionally when requested. I am not a code owner, nor do I carry authority for the code, nor do I have access to code outside of this repo, or any of the things you are asking for. I voluntarily help with issues and where I can help those with genuine interest to use the code in this repo.

Please abide by the code of conduct.

Feynstein commented 4 years ago

@kkisama I'm talking to you as a "Friend" you know that by doing this you exposed yourself to abuse reporting, right? I don't know if you know the implications of this, but unless you use a VPN, you might get a little hello from the github people for creating such an account. And your little minecraft js code endeavour might be short lived.

Feynstein commented 4 years ago

@weshinsley I suggest you close #199 as it is very inappropriate and we move all these "original" code issues to this one. And keep it open long enough so that when a decision is made it will be addressed here. That way, if everyone respects the rules... we might be able to keep issues under control.

davividal commented 4 years ago

Seems unnecessarily argumentative.

Even the mere suggestion of debate sounds argumentative to you!?

If you want to debate the reasons why the code was released this isn't the appropriate place

Since the code wasn't fully published, I think it is fairly reasonable to wonder about the reasons as to why it was released.

Feynstein commented 4 years ago

@davividal so that people can work on it to help? Like most of the open source stuff that gets dumped here?

insidedctm commented 4 years ago

Since the code wasn't fully published, I think it is fairly reasonable to wonder about the reasons as to why it was released.

Sure, it’s entirely reasonable to wonder about the reasons, but don’t do it in the code issues; it just pollutes them for those that are interested in understanding the code, helping to improve it and also branching it to create new models. Such political musings and ratings can take place on platforms like Twitter, Reddit or Quora.

bitcartel commented 4 years ago

To ensure that when the original code is published, it matches what was actually provided to external developers (e.g. Microsoft, Github, John Carmack)...

@weshinsley could you please help provide us with a SHA256 message digest of the original code (the single file containing 15k lines of C) and any other input files? Thanks.
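For anyone unfamiliar with the request, a SHA-256 digest is a short fingerprint of a file's bytes: if a later-published file hashes to the same value, it is byte-for-byte the file that was fingerprinted. A minimal sketch of computing one with Python's standard `hashlib` module follows; the function name `sha256_of_file` and the example path in the comment are placeholders, not real files from this repository.

```python
import hashlib

def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`.

    The file is read in chunks so large inputs need not fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical usage (the path is a placeholder):
# print(sha256_of_file("original_model.c"))
```

Publishing only the digest reveals nothing about the file's contents, which is why it can be shared before the code itself.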

omalaspinas commented 4 years ago

Maybe it's not the place or time for this debate but I read the thread and I have major issues with how this particular transparency thing is discussed here.

  1. This code is not the original model, so it is "useless" for reproducing the results presented two months ago, which is problematic in itself (trust in academia and government is key for such things). How can you ask people to blindly believe you on such important matters? And when they finally discover that the results were maybe wrong, what happens?
  2. The original paper of 2005 does not in any case allow for a reproduction of any results. It is so vague that there are about as many possible implementations of the model as there are stars in the galaxy. If you think you can reproduce any of the results in this paper, @Feynstein, be my guest. Patronizing people on such important matters is not the way to go.
  3. The code published here contained, and apparently still contains, a certain number of errors, which will certainly be corrected. I have no doubt about it, and we see here the importance of open sourcing academic code.

I'm not here to debate about the poor grad students that got involved in this, because they evidently do not deserve the criticism. Nor do the professors, who were grad students at some point and may still be using some of the code they wrote at that time. But dismissing the larger problems here (or denying they exist) because "hey that's how academia works" is not a valid point.

This is a LOT larger than the PhD/Post-docs that contributed here. This is about what academia has become and the pressure people are under to quickly publish possibly poorly validated results in order to get grants and permanent positions. And, on the other hand, about how reviewers who get tens of papers per year to review cannot do their work properly, because of deadlines, because they are not paid in any way for peer reviewing, because no grant money is coming from it, etc.

It is important that academic institutions as a whole (and the states that fund them) are held accountable for how things developed (we did not get here by accident). It is not acceptable at all to see complete works be so far from reproducible, and this is especially true when they are used for policy making. Releasing the code and making research completely reproducible when possible should have been mandatory for some time (actually it should have been like this forever but... some may argue that it was technically not possible, was it though?). We are moving towards this kind of behavior, but only for the last 5 years or so. This case will certainly not help in a lot of other fields (unfortunately climate change deniers will have a strong case with this one....), and denying the basic fact that there is a problem with how academia works, and that it needs to restore its image, will certainly not help.

weshinsley commented 4 years ago

@bitcartel You already know I do not have access to that.

@omalaspinas Indeed, our issue tracker is not the best place for this sort of discussion. If, as you say, you have no doubt there are certainly errors in the code, then why not join the open source effort and submit bugs about the issues you have observed, including any reproducibility problems you come across, before you vaguely state all of this.

omalaspinas commented 4 years ago

@weshinsley Still, 90% of this issue is about exactly what I wrote. And having a different opinion than "well that's academia what were you thinking" is important IMO.

I also would like to mention I offered my help to the main author by e-mail two months ago. I'm still waiting for an answer and therefore started my own project on the topic (with my limited abilities and those of other open source people). Although two months ago I would have contributed with great pleasure and motivation.

pdehaye commented 4 years ago

I also would like to mention I offered my help to the main author by e-mail two months ago. I'm still waiting for an answer and therefore started my own project on the topic (with my limited abilities and those of other open source people). Although two months ago I would have contributed with great pleasure and motivation.

Without having read the whole thread, I want to echo what @omalaspinas just wrote.

Long background: I am a professional mathematician who worked on the OpenDreamKit project, a large open source project meant to build a Digital Research Environment for Advanced Mathematics (open D.R.E.A.M. toolkit). This project was the first consortium in Europe to consider in its Data Management Plan code as data to be preserved under the DMP. See our reports here. We hired and co-trained with a lot of Research Software Engineers. See #209 for why this was a good idea, and here for a longer discussion from the group leader, and a post on how OpenDREAMkit came to address the systemic problem underlying this whole GitHub repo. In the final project review, with the quote

with a special thought to all those that entrusted years of their career to temporary positions in this project

The collaboration built engagement material for the scientific community, such as here. See also the wrap up by the group leader, with a pleading call

Public bodies ought to fund basic software development

and great foresight in "Securing the future of our people/Future of our temporary [software engineering] personnel", with a list of highly qualified software engineers who have taken the professional risk of helping academics for a few years for a temporary EU project, hoping that their expertise would be valued by academia and leveraged in securing further grants (i.e. going beyond just lamenting that there is a systemic problem).

With this background laid out, I want to compound on @omalaspinas' criticism. I know him and have personally sought his and his students' expertise in the past, since we both live in Geneva. He was one of the first people I reached out to when COVID happened, and we have been trying for weeks to get to research software actually used by digital epidemiologists, and reproduce their results.

My personal motivation is in assessing deep mathematical flaws in the modeling of the epidemic, and particularly in the mindless repurposing of digital epidemiology models to assess the usefulness of digital contact tracing. Are digital contact tracing apps "just" the online version of manual contact tracing apps? Well, this "just" carries a huge amount of heuristics, and mathematical theorems (not models) such as here show that this sort of intervention is likely to amplify bias in complex ways, which might in turn affect the legality of the deployment of such apps. It would be good for epidemiologists to provide the material needed to enable independent assessments of whether such deployments are justified or not.

(Note: this comment would not be complete without mentioning William Stein. William not only identified early that this issue was systemic in mathematics and formulated a plan to fix it, but was also willing to gamble everything he had built professionally, twice, to see his vision realized: once in launching the Sage mathematical software [link needed to a long thread explaining the origin story of Sage], and once in switching his attention to CoCalc, which grew out of essentially a hosted version of Sage.)