nsheff closed this 3 months ago
A counterargument: if it defaults to True, there is the potential for accidental data loss due to overwriting. I had assumed False was the default as a safety precaution.
The same argument goes in the other direction. Without overwriting, you're losing the new result and retaining the old one. So your safety precaution doesn't make sense unless you prioritize old results over new ones.
In the past, I recorded them all.
This is the core issue raised here: https://github.com/databio/pypiper/issues/209
I flipped force_overwrite to default to True, and I will do the same in PyPiper. However, pipestat still needs a way to keep a history of results:
In the longer term, pipestat should offer the option to include a history of results, and these should be stored somehow in the file (and database). This may not actually be too hard to implement; just add a 'history' function, and when something is overwritten, move the old value into the history, stored as an array rather than a single value. Then, pipestat could offer a clear history function to remove old stuff, if desired, but otherwise, repeated reports of the same result will simply add to the history.
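To make the history idea concrete, here is a minimal in-memory sketch. It is not pipestat's API: the `ResultStore` class and its `retrieve`, `history`, and `clear_history` methods are hypothetical names made up for illustration, and a real backend would persist the history to the file or database rather than a dict.

```python
from typing import Any, Dict, List, Optional


class ResultStore:
    """Toy stand-in for a results backend (hypothetical, not pipestat)."""

    def __init__(self) -> None:
        self._current: Dict[str, Any] = {}
        self._history: Dict[str, List[Any]] = {}

    def report(self, key: str, value: Any, force_overwrite: bool = True) -> bool:
        """Store value under key. Return True if the value was written."""
        if key in self._current:
            if not force_overwrite:
                # Keep the old value; the new one is dropped.
                return False
            # Move the value being replaced into the history array.
            self._history.setdefault(key, []).append(self._current[key])
        self._current[key] = value
        return True

    def retrieve(self, key: str) -> Any:
        """Return the most recently written value for key."""
        return self._current[key]

    def history(self, key: str) -> List[Any]:
        """Return previously overwritten values, oldest first."""
        return list(self._history.get(key, []))

    def clear_history(self, key: Optional[str] = None) -> None:
        """Drop old values for one key, or for every key if key is None."""
        if key is None:
            self._history.clear()
        else:
            self._history.pop(key, None)


store = ResultStore()
store.report("n_reads", 100)
store.report("n_reads", 120)
store.report("n_reads", 150)
latest = store.retrieve("n_reads")
previous = store.history("n_reads")
store.clear_history("n_reads")
after_clear = store.history("n_reads")
```

With this shape, repeated reports of the same result just grow the history list, and nothing is lost unless `clear_history` is called explicitly.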
Right now, report has a force_overwrite parameter, which defaults to False.
https://github.com/pepkit/pipestat/blob/3d150f7615c927b809a2f745be5aa8f3876203f9/pipestat/backends/abstract.py#L150-L157
For me, it seems more natural that if I report() something, I would want it to replace whatever I had there most of the time. So, I think I would prefer if it defaults to True (and maybe is renamed). Open for discussion...
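As a rough illustration of what the default changes, the sketch below uses a plain dict and a hypothetical `report_result` helper (not the pipestat function linked above) to show which value survives a second report under each setting.

```python
from typing import Any, Dict


def report_result(results: Dict[str, Any], key: str, value: Any,
                  force_overwrite: bool) -> bool:
    """Write value under key; skip the write if key exists and overwriting is off."""
    if key in results and not force_overwrite:
        return False
    results[key] = value
    return True


# force_overwrite=False: the second report is skipped, the old value stays.
keep_old: Dict[str, Any] = {}
report_result(keep_old, "alignment_rate", 0.91, force_overwrite=False)
second_written = report_result(keep_old, "alignment_rate", 0.95, force_overwrite=False)

# force_overwrite=True: the second report replaces the old value.
keep_new: Dict[str, Any] = {}
report_result(keep_new, "alignment_rate", 0.91, force_overwrite=True)
report_result(keep_new, "alignment_rate", 0.95, force_overwrite=True)
```

Either way one of the two values is discarded; the default only decides which one.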