mrc-ide / covid-sim

This is the COVID-19 CovidSim microsimulation model developed by the MRC Centre for Global Infectious Disease Analysis hosted at Imperial College, London.
GNU General Public License v3.0
1.23k stars 256 forks source link

Abstract fread/write_big into f_big #398

Closed zlatanvasovic closed 4 years ago

weshinsley commented 4 years ago

Well... I see the point - reducing line numbers and sharing the code...

But conceptually, I'm not quite sure whether we want to merge reading and writing like this. Taking a step back, I can imagine there being ways of making some file reading faster that might not apply to file writing. (See Issue #223 which didn't quite make it into workable code, but shows the kind of read-specific sort of work I'm thinking of).

If we go with it, then I'd prefer the parameter to f_big to be a const containing READ | WRITE (rather than an arbitrary integer 0 or 1). But I'd like more thoughts from others about whether this is the right sort of refactor on this code, or whether a different sort of rewrite is really needed, which would be (slightly) harder to unpick if we merged them. I suspect there's a better more modern answer for both the reads and writes currently being done.

zlatanvasovic commented 4 years ago

@weshinsley Even if some change is to be implemented specifically in read or write, most of the code still can be easily abstracted. It can be done much more idiomatically, but C++ lacks many metaprogramming features, such as code evaluation from a string. About READ | WRITE, any form of parameter you prefer will work. I always allow edits by maintainers, so you can edit it before merging.

DavidVernest commented 4 years ago

We are merging code into a live system - having the maintainers modify the logic is not good practice. It is the developer's responsibility to deliver a cohesive solution before asking for the merge.

zlatanvasovic commented 4 years ago

@mstevens-uk I've already implemented the cohesive solution which has all the required logic, as you could have noticed. It's also far more cohesive than the previous solution.

However, what I cannot do is try to change the preference of maintainers for specific variable names or forms. For me having a variable with possible values of 0 and 1 or "READ" and "WRITE" is essentially the same. And as @weshinsley has said, it's just their own preference, not a must.

matt-gretton-dann commented 4 years ago

The code as it stands needs reworking to cope with errors from fwrite() & fread() - including those that may not be fatal and just indicate a retry. As it is I think if the disk gets full this code will go into an infinite loop when writing to disk.

My question is why can't we just use plain fread/fwrite what do these functions gain us? (Is it simply some Operating Systems perform better if you make the elements as large as possible compared to the number you are outputting?).

So I don't like this PR. But the functions do need reworking - if only to have better error handling.

(And on the 0 or 1 or enum question: I'd prefer a lambda which did the read or write).

weshinsley commented 4 years ago

Yep, so our original code looks like it is forcing a chunk size of 2Gb for reads/writes, so I too wonder if this overides some defaults (either OS/compiler/some combination) for the chunk size that fread/fwrite use. I can imagine Neil testing this over infiniband with some much larger files than we are used to - full US, or perhaps all of Africa, which we have run at times.

So this PR is just factoring out the code that's common to both the fread_big and fwrite_big functions, since they only differ in two places - f[read] or f[write]. But, if/when error handling is added, will that motivate us back to wanting separate read/write functions again? Presumably there are different potential errors for fread vs fwrite...?

matt-gretton-dann commented 4 years ago

My ideal would be to replace the f*_big functions with xf* functions which wrap the f* functions handling errors and smoothing over platform issues (similar to the x*alloc I've suggested elsewhere: #406).

On UNIX platforms the errors in particular we should be handling are EAGAIN & EINTR which are indicators that the call got interrupted - not that things went wrong. On Windows whilst the above are valid codes I don't think they are used - instead we should consider an invalid parameter handler.