Closed matthewcornell closed 3 years ago
Tagging @nickreich and @elray1 with these questions:
utils.forecast._is_pred_eles_subset_prev_versions()
that does the check? Is t here some other constraint/check we should do instead?My takes at answers:
NULL
values. (And in practice, retractions used to be handled by just omitting rows from csv files, which is different from the new intent here of "omitted rows don't change". so maybe this rule is just to help prevent confusion.)Thanks, Evan. I guess what I'm trying to get it is what is it about truth files (i.e., oracle forecasts) that's intrinsically/semantically different from non-truth forecasts such that relaxing the restrictions makes sense for the former but not the latter. For example, would truth never have retractions?
I think mostly it's just a practical thing. We don't expect real forecasts to be repeatedly updated with large numbers of duplicated rows. but maybe I'm missing a bigger picture thing. I guess from some perspective you can't retract the truth.
i could imagine it might be reasonable to lift this restriction for Zoltar, and make the strict rule a validation that's done on the hub repository? for the csv files in the hub repo, i think we definitely don't want this kind of behavior where a partial file means "leave previous values not represented in this file as they were". but maybe the validation logic for hub files and Zoltar uploads doesn't need to be the same.
@elray1 are you saying that no duplicate detection should be done on the server side? Saving wasted space was a big motivator, right? And trusting the client - hmm.
No, I think we should do the duplicate detection and removal on the server side.
I was saying that it might make sense to not have a different behavior for models and truth data in this respect, so that for both, we would allow "partial updates" that only contain the diff. I don't have a strong opinion about this, but basically wouldn't necessarily rule it out.
Change truth uploads (and maybe be an option for non-truth forecast uploads?) to support uploading only "diffs," i.e., only the retracted/updated/new prediction elements. This involves relaxing this forecast version rule:
This issue is based on @nickreich 's comment: