Open d70-t opened 3 years ago
Sorry, wrong comment;)
@d70-t You are right, the use cases are different.
Note that there might be a subtle tension between this use case and #20: If
dataset = get_dataset("<persistent identifier>")
is guaranteed to return exactly the same result whenever it is called, it can not start to issue a warning at any point in time.
A way out might be to have two strictly separated parts of the result of this function call. One part (the actual dataset) could then be guaranteed to never change while the other part (i.e. auxiliary info) could contain something like a changelog or news or announcements etc... which explicitly is allowed to change over time.
That way, if a user decides to only use the dataset part, the user can always reproduce the results. And by looking at the auxiliary part, the user can still be warned about new datasets. Separating those two aspects seems to be key in the design as it is important to know as a user which parts of the response may change and which parts may never change (i.e. can be trusted).
It is possible to think of those two parts as two return values, i.e.:
dataset, auxinfo = get_dataset("<persistent identifier>")
or it is possible to use the syntax with only one return value, but having some kind of warning system as a secondary return channel (e.g. pythons warning facilities)
As user of datasets from the HALO-DB, I want to be informed about new versions of the dataset while using it. In particular, if data is used as described in #20:
and the Dataset identified by the given persistent identifier has been updated to a newer version, I want to be informed about this by a warning which is issued upon the execution of this code.
Note that the code still should run successfully, such that old analyses can be reproduced.
This is probably related to #25, but slightly different. While #25 would inform subscribing users more quickly about a change, this use case would inform the user right when the user is working (again) on the dataset. Additionally, this use case would inform users which did not initially subscribe to that dataset but received a copy of the analysis program.