Closed domoritz closed 9 years ago
Oh, now I get what you mean. So when we use `value:`, we eliminate `name:` for that field, right?
If yes, I think that's an interesting idea. A couple of interesting decisions to make:
`value:` then?
`d.` in the value?
Yes, we would get rid of `name`. The issue with `d.` would not be a problem in Vega 2, but it would be in Vega 1, which is a bit annoying. I was thinking only about the ideal API and not so much the implementation at this point.
As an intermediate step, we could say that we only support simple function calls and maybe ratios and write a simple regex parser for it. However, I'm not really sure about all the implications of my suggestion yet.
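A minimal sketch of what such a regex parser could look like, assuming `value` strings are limited to a bare field, a single function call on one field, or a ratio of two fields. The function name `parseValue` and the shape of the returned objects are hypothetical and not part of any Vega-Lite API.

```javascript
// Hypothetical mini-parser: accepts only `field`, `fn(field)`, or `a / b`.
const CALL_RE = /^\s*([A-Za-z_]\w*)\s*\(\s*([A-Za-z_]\w*)\s*\)\s*$/;
const RATIO_RE = /^\s*([A-Za-z_]\w*)\s*\/\s*([A-Za-z_]\w*)\s*$/;
const FIELD_RE = /^\s*([A-Za-z_]\w*)\s*$/;

function parseValue(value) {
  let m;
  if ((m = CALL_RE.exec(value))) {
    // e.g. "abs(foo)"
    return { type: "call", fn: m[1], field: m[2] };
  }
  if ((m = RATIO_RE.exec(value))) {
    // e.g. "a / b"
    return { type: "ratio", numerator: m[1], denominator: m[2] };
  }
  if ((m = FIELD_RE.exec(value))) {
    // e.g. "foo"
    return { type: "field", field: m[1] };
  }
  // Anything else (nested calls, `+`, literals) is rejected outright.
  throw new Error("Unsupported value expression: " + value);
}
```

Rejecting everything outside these three shapes keeps the behavior predictable, at the cost of not supporting nested calls or arbitrary arithmetic.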
It's worth noting that `T` (temporal/`Date`) and `Q` (quant/`number`) should support different sets of operations.
For T, most of the time, you don't want to support complex calculations (even `+` doesn't make much sense). You just want to abstract the time value. And we need to know the function name to predict cardinality, etc. (This might change a lot with Vega 2.)
For Q, it's more free form and most of the time, we don't care much about cardinality (unless you cast it to be O).
But yes, from the user perspective, the distinction might make things more complicated.
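To make the T/Q distinction concrete, here is a rough sketch assuming a fixed lookup table of time abstractions with known upper bounds on cardinality, and a free-form list of numeric functions with no bound. The table contents, the function list, and the name `checkFunction` are illustrative assumptions, not taken from Vega-Lite or datalib.

```javascript
// Hypothetical upper bounds on distinct values per time abstraction.
// Here "day" is assumed to mean day of week and "date" day of month.
const TIME_UNIT_CARDINALITY = {
  month: 12,
  day: 7,
  date: 31,
  hours: 24,
  minutes: 60,
  seconds: 60,
};

// Illustrative sets of functions that make sense for each type.
const ALLOWED = {
  T: Object.keys(TIME_UNIT_CARDINALITY),
  Q: ["abs", "log", "sqrt"],
};

function checkFunction(type, fn) {
  if (!ALLOWED[type] || !ALLOWED[type].includes(fn)) {
    throw new Error("Function " + fn + " is not supported for type " + type);
  }
  // Only T functions yield a predictable cardinality; for Q it is unknown.
  const cardinality = type === "T" ? TIME_UNIT_CARDINALITY[fn] : null;
  return { fn, cardinality };
}
```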
Another thought.
Is it kinda weird that `aggregation` is not included in the expression `expr`?
For example,
{'aggregation':'min', value:'abs(foo)'}
I also see how this would make it harder for polestar because you cannot have a dropdown for all available functions.
> Is it kinda weird that `aggregation` is not included in the expression `expr`?
I thought about this but tbh, what sql does is super confusing. I think separating aggregations and scalar functions makes sense. Also, if you want to reason about behavior, it's good to have them separate.
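A small sketch of why the separation is easy to reason about: the scalar function is a per-row mapping and the aggregation is a reduction over the mapped values. It assumes the `value` string has already been split into a scalar name and a field; the tables `SCALARS` and `AGGREGATES` and the function `evaluate` are hypothetical.

```javascript
// Per-row mappings (scalar functions).
const SCALARS = { abs: Math.abs, sqrt: Math.sqrt };

// Reductions over a whole column (aggregations).
const AGGREGATES = {
  min: (xs) => Math.min(...xs),
  max: (xs) => Math.max(...xs),
  sum: (xs) => xs.reduce((a, b) => a + b, 0),
};

// Stage 1 maps each row through the scalar; stage 2 reduces the result.
function evaluate(rows, def) {
  const mapped = rows.map((row) => SCALARS[def.scalar](row[def.field]));
  return AGGREGATES[def.aggregation](mapped);
}

// Corresponds to {'aggregation':'min', value:'abs(foo)'} after parsing.
const sampleRows = [{ foo: -3 }, { foo: 2 }, { foo: -1 }];
const result = evaluate(sampleRows, { aggregation: "min", scalar: "abs", field: "foo" });
// result is 1: abs gives [3, 2, 1], and min of that is 1
```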
> I also see how this would make it harder for polestar because you cannot have a dropdown for all available functions.
Starting from the UI, I was thinking about adding `derive` to `.data`, which is more like Tableau's model, where you can add custom fields to the `data` and name them. This keeps the `encoding` clean, as the `encoding` only refers to names. (And maybe common derive functions such as time abstraction.)
By doing this, we somewhat decouple `data` manipulation from encoding (except the final aggregation/group-by).
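As a rough illustration of the derive model described above, the sketch below attaches named derived fields to the data and lets the encoding refer to them by name only. It assumes derived fields are given as plain JavaScript functions; the `derive` shape, the helper `applyDerive`, and the sample fields are all hypothetical.

```javascript
// Add each named derived field to a copy of every row.
function applyDerive(rows, derive) {
  return rows.map((row) => {
    const out = { ...row };
    for (const d of derive) {
      out[d.name] = d.fn(out);
    }
    return out;
  });
}

const data = {
  values: [{ date: "2015-03-14", price: 10, qty: 3 }],
  derive: [
    { name: "revenue", fn: (d) => d.price * d.qty },
    // Time abstraction: month number (1-12) from an ISO date string.
    { name: "month", fn: (d) => new Date(d.date).getUTCMonth() + 1 },
  ],
};

// The encoding stays clean: it only refers to field names.
const encoding = {
  x: { name: "month", type: "O" },
  y: { name: "revenue", type: "Q" },
};

const derivedRows = applyDerive(data.values, data.derive);
// derivedRows[0].revenue is 30 and derivedRows[0].month is 3
```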
I think of scalar functions as simple mappings that don't have huge effects. I see how that is not necessarily true because of cardinality estimation but I still don't think this justifies the model of derived fields.
The model of derived field is not justified by cardinality. It's more for supporting UI and its common data manager, but maybe inferior for developers.
That said, if we choose to use `value`, we just need to do the same thing in vlui to manage the naming, but it makes writing Vega-Lite by hand easier.
> That said, if we choose to use `value`, we just need to do the same thing in vlui to manage the naming, but it makes writing Vega-Lite by hand easier.
What do you mean by "manage the naming"?
I mean for applications like Polestar and Voyager. When users create new derived variables, they need to name them in order to refer to them.
However, if we implement the `value` model in Vega-Lite, this step is skipped because we don't need names in the spec; we just create those derived fields on the fly and use anonymous names.
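One way to picture "anonymous names on the fly" is a lowering step that turns each `value` expression into a generated field name and a derive entry. This is only a sketch under that assumption; `lowerValueModel` and the `_derived_N` naming scheme are made up for illustration.

```javascript
// Rewrite encodings that use `value` into ones that use a generated `name`,
// collecting the expressions into a separate derive list.
function lowerValueModel(encoding) {
  const derive = [];
  const lowered = {};
  const seen = new Map(); // expression string -> generated name

  for (const [channel, def] of Object.entries(encoding)) {
    const { value, ...rest } = def;
    if (value === undefined) {
      // Already refers to a field by name; keep as-is.
      lowered[channel] = def;
      continue;
    }
    if (!seen.has(value)) {
      const name = "_derived_" + seen.size;
      seen.set(value, name);
      derive.push({ name, expr: value });
    }
    // Identical expressions share one generated name.
    lowered[channel] = { ...rest, name: seen.get(value) };
  }
  return { derive, encoding: lowered };
}
```

With this step the user never sees or picks a name, which is the difference from the derive model, where a tool like Polestar has to ask for one.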
@domoritz I discussed this with @jheer, and we plan to go with the `derive` model, as Vega-Lite's goal is more about supporting programmatic generation. That said, the `derive` model isn't particularly painful for handwritten code anyway.
For `fn`, we will name it `timeUnit` to be consistent with `datalib`.
Let’s add a property to `data`:
formula: [
{field: <field_name>, expr: <Vega expression>}
]
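For illustration, a spec using the proposed `formula` property might look like the sketch below, together with a small check that every encoding refers to a field that exists either in the raw data or in a formula. The expression string (including the `datum.` prefix), the use of `name` in the encoding, and the helper `unresolvedFields` are assumptions for this example; the expression is never evaluated here.

```javascript
const spec = {
  data: {
    values: [{ a: 1, b: 2 }],
    // Proposed shape: a list of {field, expr} pairs.
    formula: [{ field: "ratio", expr: "datum.a / datum.b" }],
  },
  encoding: {
    x: { name: "ratio", type: "Q" },
  },
};

// Return encoding field names that neither the raw rows nor a formula define.
function unresolvedFields(spec) {
  const known = new Set();
  for (const row of spec.data.values) {
    Object.keys(row).forEach((key) => known.add(key));
  }
  for (const f of spec.data.formula || []) {
    known.add(f.field);
  }
  return Object.values(spec.encoding)
    .map((def) => def.name)
    .filter((name) => !known.has(name));
}
```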
I don't know why you prefer the derive model. Can you elaborate?
@domoritz For the record, in #631 you state that
> One of the goals of vl is that it can be generated automatically and that it can be supported by other languages (e.g. python). If we have expressions in fields in the encDef, this would be much harder. If we keep it in the data, we can say that we only support formula transforms in js. In other languages, you would do the transformations beforehand.
So I guess you now agree with the decision to go with "derive" model first and we can consider if we want to add the expression model later.
The “derive” model is implemented as `data.formula` in #631. The only issue is that stats need to be provided, but we plan to eliminate that in #648. Therefore, this issue can be closed.
The syntax for expressions is odd at the moment. There are multiple options to refactor currently:
- rename `fn` to `function` or `as`
- rename to `map`, `accessor` and allow more functions
- or change how fields are mapped to encodings entirely. This also allows other expressions such as ratios and such.
See discussion in #447