pepkit / pipestat

Pipeline results reporting package
https://pep.databio.org/pipestat/
BSD 2-Clause "Simplified" License

Should force_overwrite default to true? #161

Closed nsheff closed 3 months ago

nsheff commented 5 months ago

Right now, report has a force_overwrite parameter, which defaults to False.

https://github.com/pepkit/pipestat/blob/3d150f7615c927b809a2f745be5aa8f3876203f9/pipestat/backends/abstract.py#L150-L157

To me, it seems more natural that if I report() something, I would want it to replace whatever I had there most of the time. So I think I would prefer that it default to True (and maybe be renamed).

Open for discussion...

donaldcampbelljr commented 5 months ago

A counter argument: if it defaults to True, there is the potential for accidental data loss due to overwriting. I had assumed False was the default as a safety precaution.

nsheff commented 5 months ago

The same argument goes in the other direction: without overwriting, you lose the new result and retain the old one. So the safety precaution doesn't make sense unless you prioritize old results over new ones.
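To make the trade-off concrete, here is a minimal sketch using a toy in-memory dict. This is not pipestat's actual backend or signature; the `report` function, the `store` dict, and the `aligned_reads` key are all hypothetical and only illustrate what each default discards.

```python
# Toy in-memory store; NOT pipestat's real backend or API.
def report(store, record_id, values, force_overwrite=False):
    """Write values into store[record_id]; return the keys actually written."""
    record = store.setdefault(record_id, {})
    written = []
    for key, value in values.items():
        if key in record and not force_overwrite:
            # Old value is kept; the newly reported value is dropped.
            continue
        record[key] = value
        written.append(key)
    return written


store = {}
report(store, "sample1", {"aligned_reads": 100})

# With force_overwrite=False, the second report is silently lost.
skipped = report(store, "sample1", {"aligned_reads": 250})
kept_old = store["sample1"]["aligned_reads"]

# With force_overwrite=True, the first result is lost instead.
replaced = report(store, "sample1", {"aligned_reads": 250}, force_overwrite=True)
kept_new = store["sample1"]["aligned_reads"]
```

Either way one of the two values is gone, which is why the choice of default is really a choice about which result to prioritize.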

In the past, I recorded them all.

This is the core issue raised here: https://github.com/databio/pypiper/issues/209

donaldcampbelljr commented 3 months ago

I flipped force_overwrite to default to True. I will do the same in PyPiper. However, we still need pipestat to support a history of results:

In the longer term, pipestat should offer the option to include a history of results, and these should be stored somehow in the file (and database). This may not actually be too hard to implement; just add a 'history' function, and when something is overwritten, just move the old values into the history in a way that is an array, rather than a single value. Then, pipestat could offer a clear history function to remove old stuff, if desired, but otherwise, repeated reports of the same result will simply add to the history.
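A rough sketch of that history idea, assuming a plain dict as the store. The names `report_with_history`, `clear_history`, and the `values`/`history` layout are hypothetical and not part of pipestat; a real implementation would need to persist this in the file and database backends.

```python
# Hypothetical layout: each record keeps current values plus a per-key
# list of previously reported values. NOT pipestat's real data model.
def report_with_history(store, record_id, values):
    """Overwrite current values, moving any replaced value into the history."""
    record = store.setdefault(record_id, {"values": {}, "history": {}})
    for key, new_value in values.items():
        if key in record["values"]:
            # Move the old value into an array instead of discarding it.
            record["history"].setdefault(key, []).append(record["values"][key])
        record["values"][key] = new_value
    return record


def clear_history(store, record_id):
    """Drop all old values for a record, keeping only the current ones."""
    if record_id in store:
        store[record_id]["history"].clear()


store = {}
for reads in (100, 250, 300):
    report_with_history(store, "sample1", {"aligned_reads": reads})

current = store["sample1"]["values"]["aligned_reads"]
past = list(store["sample1"]["history"]["aligned_reads"])

clear_history(store, "sample1")
after_clear = dict(store["sample1"]["history"])
```

With this shape, overwriting stays the default behavior, but repeated reports of the same result accumulate in the history until it is explicitly cleared.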