Closed elray1 closed 1 year ago
So while I'm not super excited by the idea of implementing such a change, I'm sympathetic to the motivation.
Personally I find output_value_metadata a bit too long and a little vague.
From the start I thought something like output type attribute (output_type_attr) might be a good option.
It's interesting also that the suggestion involves value and not output_type. Indeed I noticed that, in Italian, output_type_id had been translated as value_id (id_valore), which also made me consider whether output type ID is indeed a property of the output type or whether it should be considered a property/attribute of the value it relates to.
In this case we could have value_attr.
I'm also sympathetic to the criticism of the output_type_id name, and I'm open to changing it.
I don't love the proposed names involving value for two reasons:
output_type = "sample": sample index; output_type = "quantile": quantile probability level; output_type = "pmf" or "cdf": bin label or target variable value. These things are framed as specifications of a detail about or refinements to the output_type.
value within the row, I think they are doing so in essentially the same way that values of the task id variables are. In some sense, the value is the model's "solution" to the prediction task specified by all of the other columns. Names like value_attr, value_id, and output_value_metadata all feel very generic and like they could equally well describe any of the columns. I think we want something here that gets more specifically at how we're using this column in particular.
Makes sense. How do you feel about output_type_attr?
output_type_attr is ok by me
Same as Anna: I'm not super excited by the idea of implementing this change, but I kind of understand the motivation.
Personally, I have no issue with output_type_id, and I am ok with output_type_attr.
I don't think there is a "perfect" column name, and whatever name we choose might still be confusing for some users. A maybe unrealistic example, just to illustrate that last point: attr in R can be understood as a way of tagging additional information, like metadata, which is not really the case here.
Continuing to brainstorm somewhat unsatisfying options: output_type_level?
We decided to keep output_type_id, acknowledging that it is not perfect.
Dylan noted that this is a potentially confusing name for this column: someone who is familiar with relational databases and is coming to the hubverse for the first time would interpret it as the unique identifier for the output type (e.g. "quantile" = 1, "sample" = 2, etc). He suggests perhaps something like "output_value_metadata".
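
For anyone landing on this thread later, here is a rough sketch of the two readings being contrasted. It is only an illustration: the rows and numbers are made up, the "low" bin label is hypothetical, and the names ID_MEANING, RELATIONAL_READING and describe are invented for this example rather than taken from any hubverse package. The per-output-type meanings are the ones listed earlier in the discussion.

```python
# Hypothetical model-output rows, for illustration only (values are made up).
rows = [
    {"output_type": "quantile", "output_type_id": 0.25, "value": 10.0},
    {"output_type": "quantile", "output_type_id": 0.75, "value": 20.0},
    {"output_type": "sample", "output_type_id": 1, "value": 14.0},
    {"output_type": "pmf", "output_type_id": "low", "value": 0.6},
]

# How the column is used in practice, per the discussion in this thread:
# its meaning depends on the output_type of the row.
ID_MEANING = {
    "sample": "sample index",
    "quantile": "quantile probability level",
    "pmf": "bin label or target variable value",
    "cdf": "bin label or target variable value",
}

# The reading a relational-database user might expect instead:
# one identifier per output type (the example given in the issue).
RELATIONAL_READING = {"quantile": 1, "sample": 2}


def describe(row):
    """Spell out what output_type_id refers to for a single row."""
    meaning = ID_MEANING[row["output_type"]]
    return f'{row["output_type"]}: {meaning} = {row["output_type_id"]}'


descriptions = [describe(r) for r in rows]
```

The two quantile rows share an output_type but carry different output_type_id values, which is why the column does not behave like a key for the output type.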