elray1 closed this issue 1 year ago
DECISION: Option 4 🎉
I've been working on implementing some of the decisions we made within the schema, in particular with respect to `type_id` for the `mean` and `median` output types. I've taken the opportunity to include more detail in the description, as `required` and `optional` are a little bit of an awkward concept in the context of mean and median type id. See what you think!
I've created two branches:
- The first specifies `required` and `optional` in the standard way we have been specifying them in other properties.
- In the second, because only one of `required` or `optional` should be set and the other should be set to `null`, I've encoded this restriction in a `oneOf` statement instead.

Let me know which implementation you prefer.
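As a rough illustration of the restriction the `oneOf` version is meant to enforce, here is a minimal Python sketch. It is not the schema itself: the function name `is_valid_point_type_id` and the dictionary layout are assumptions made for illustration, and the check simply requires that exactly one of `required`/`optional` is `["NA"]` while the other is `null` (`None` in Python).

```python
# Hypothetical helper, not part of the schema or any hub tooling.
NA = ["NA"]


def is_valid_point_type_id(type_id: dict) -> bool:
    """Return True if exactly one of required/optional is ["NA"] and the other is None."""
    pair = (type_id.get("required"), type_id.get("optional"))
    # Only these two combinations are unambiguous for mean/median.
    return pair in ((NA, None), (None, NA))


print(is_valid_point_type_id({"required": ["NA"], "optional": None}))    # mean is required
print(is_valid_point_type_id({"required": ["NA"], "optional": ["NA"]}))  # ambiguous
```

Under this check, setting both fields to `["NA"]` or both to `null` is rejected at validation time rather than resolved by convention.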
I don't have very strong feelings about this -- but maybe a weak preference for the first option because it's more consistent with the other specifications, and doesn't use the `oneOf` construction, which I guess maybe fewer people would be familiar with? But I would be happy to accept your recommendation for what you prefer, if you like the other one.
I have the same response as Evan.
Thanks both! So the reason I prefer the `oneOf` specification is that it can check during schema validation that the combined values of `required` and `optional` are valid, so we don't end up in an ambiguous situation where, for example, both have been set to `['NA']` or `null`.
Having said that, if both are `['NA']`, `required` could take precedence. It's just not very clean.
Now that I think of it though, is there a situation where, in a given set of task_ids or rounds, one of the output types needs to be specified because it has been included in another round but should not be submitted, in which case `null` & `null` should be set for `required` and `optional`?
Thanks -- that makes sense.
Is this situation of possibly-repeated values across the `optional` and `required` fields a more general thing that we need to address in either case? e.g., what if someone specifies `"US"` as both an `optional` location and a `required` location? We might want to either check for that as part of validating the `tasks.json` file, or document that in that case, the field will be `required` in practice.
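A check for repeated values could look roughly like the sketch below. The function name `overlapping_values` and the dictionary layout are hypothetical; this is not an existing validation routine, just one way the `tasks.json` check might be expressed.

```python
# Hypothetical helper illustrating the proposed tasks.json check.
def overlapping_values(field: dict) -> list:
    """Return values listed under both `required` and `optional` for one field."""
    required = field.get("required") or []  # treat null as "no values"
    optional = field.get("optional") or []
    return [value for value in required if value in optional]


location = {"required": ["US"], "optional": ["US", "01"]}
duplicates = overlapping_values(location)
if duplicates:
    print(f"Listed as both required and optional: {duplicates}")
```

A validator could either raise an error when the returned list is non-empty, or only warn and treat the repeated values as required.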
For your last point -- in that case, I think that each round (and each task group within that round) only needs to include the output types that are required for that round (or task group). The output type column will still be included, we're just specifying which values of output types are required or optional within that column.
Currently, the required and optional values of output type ids can in effect also specify whether the corresponding output types as a whole are required: namely, a particular output type is required if it has at least one required `type_id`, and is optional otherwise. This may be confusing. Is there another way? See also the related discussion under issue #9.

Current proposed system
To explain the situation, we consider a series of examples of hubs with varying modeling task specifications.
Example 1:
For a hub with this specification, a valid submission must include at least the following rows, obtained via a kind of `expand_grid` action across the different combinations of required values for the task id variables and required `type_id`s within each output type. Note that in this process, you could imagine first concatenating the `output_type`s with the options for `type_id` values within each `output_type`, so that they are treated as a "unit" when the `expand_grid` happens. Then split them back into two columns. This is necessary to track the nesting of `type_id` values within the specific output types.

Example 2
Example 2 is the same as example 1, but it has only one required quantile level:
For a hub with this specification, a valid submission must include at least the following rows:
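The `expand_grid` procedure described under Example 1 can be sketched in Python as follows. The specification here is hypothetical (it is not the one from the original examples) and the function name `minimal_required_rows` is invented for illustration. The sketch assumes every task id variable has at least one required value. It pairs each output type with its required `type_id` values first, so each pair is treated as a unit in the cross product.

```python
from itertools import product

# Hypothetical specification: one required location, two required horizons,
# a required mean, and a single required quantile level.
task_ids = {
    "location": {"required": ["US"], "optional": ["01"]},
    "horizon": {"required": [1, 2], "optional": None},
}
output_types = {
    "mean": {"type_id": {"required": ["NA"], "optional": None}},
    "quantile": {"type_id": {"required": [0.5], "optional": [0.25, 0.75]}},
}


def minimal_required_rows(task_ids: dict, output_types: dict) -> list:
    """Cross required task id values with (output_type, type_id) units."""
    names = list(task_ids)
    required_task_values = [task_ids[n].get("required") or [] for n in names]
    # Keep output_type and type_id together so the nesting is preserved.
    units = [
        (output_type, type_id)
        for output_type, spec in output_types.items()
        for type_id in (spec["type_id"].get("required") or [])
    ]
    rows = []
    for combo in product(*required_task_values, units):
        *task_values, (output_type, type_id) = combo
        row = dict(zip(names, task_values))
        row["output_type"] = output_type  # split the unit back into two columns
        row["type_id"] = type_id
        rows.append(row)
    return rows


rows = minimal_required_rows(task_ids, output_types)
for row in rows:
    print(row)
```

An output type with no required `type_id` values contributes no units, so it adds no rows to the minimal submission.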
Example 3
Example 3 is similar to examples 1 and 2, but now all of the quantile levels are specified as optional.
For a hub with this specification, a valid submission must include at least the following rows:
Example 4
Our final example is similar to example 1, but swaps the specification of `["NA"]` and `null` values in the `required` and `optional` fields for the `mean` output type:

For a hub with this specification, a valid submission must include at least the following rows:
Summary and question for discussion
Summary: Under the current system, the rows that a submission must minimally contain are obtained by applying an `expand_grid` type of action to the task id variables and combinations of output types and type ids. This means that if there are no required values under the type_ids for a particular output type, a minimal submission does not need to include any rows with that output type. Effectively, this means that that output type is optional. Saying this again in different words: in this set up, a particular output type is required only if there is at least one value specified as `required` in the type_ids under that output type. This is illustrated in examples 3 and 4 above.

Every time this has come up, this use of required/optional values of a type_id to implicitly set the status of an output type has been non-intuitive. How can we resolve this? Three ideas:
- `output` column so that it has the `required` and `optional` properties similar to the other columns. We would then perhaps check that the names of any additional properties currently under `"output_types"` match the values that were specified as `required` or `optional` for the `output` column. We would need to think through and document how this interacts with the "implicit requirement" for output types that comes out of the current procedure as illustrated above.
- `output_type` and `type_id` (and any restrictions on `value`) as being `required` or `optional`.
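For reference, the implicit rule in the current system (an output type is required only if at least one of its `type_id` values is required) can be written down as a short Python sketch. The function name `implicit_output_type_status` and the example specification are hypothetical and only illustrate the rule as summarized above.

```python
# Hypothetical helper: derive each output type's implicit status
# from the required/optional values of its type_id.
def implicit_output_type_status(output_types: dict) -> dict:
    return {
        name: "required" if (spec["type_id"].get("required") or []) else "optional"
        for name, spec in output_types.items()
    }


# Hypothetical specification: mean has only an optional type_id,
# quantile has one required level.
example_output_types = {
    "mean": {"type_id": {"required": None, "optional": ["NA"]}},
    "quantile": {"type_id": {"required": [0.5], "optional": [0.25, 0.75]}},
}

status = implicit_output_type_status(example_output_types)
print(status)
```

Any of the proposed changes would replace this derived status with something stated explicitly in the specification.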