Open domoritz opened 5 years ago
I'm not sure if this is possible given our current system design. Our formatting logic is applied only when an external data source is loaded; it actually happens outside of dataflow execution. Our derived data source logic expects ingested tuples, not raw data, so the formatting operations wouldn't even apply without additional guidance.
I suppose an alternative might be to add a new top-level abstraction (or data set type) that captures a raw data input, and then allows data definitions to refer to that much like a URL or values source and apply formatting.
I don't have a clear sense of how to proceed at this point. Other thoughts are welcome!
I see two options. One is to add a new top-level abstraction for data input, as you suggested. Another would be a transform to extract tuples. Something like:
{
"name": "table",
"values": [
{"foo": {"bar": [
{"category": "A", "amount": 28},
{"category": "B", "amount": 55},
{"category": "C", "amount": 43},
{"category": "D", "amount": 91},
{"category": "E", "amount": 81},
{"category": "F", "amount": 53},
{"category": "G", "amount": 19},
{"category": "H", "amount": 87}
]}}
],
"transform": [
{"type": "property", "field": "foo.bar"}
]
}
But I haven't thought deeply about the details and implications of this transform.
@jheer What do you think about my proposal? This has come up again with a user of Vega-Lite.
One option that is close to but not exactly what you propose above is to use the flatten transform:
{
"name": "table",
"values": [
{"foo": {"bar": [
{"category": "A", "amount": 28},
{"category": "B", "amount": 55},
{"category": "C", "amount": 43},
{"category": "D", "amount": 91},
{"category": "E", "amount": 81},
{"category": "F", "amount": 53},
{"category": "G", "amount": 19},
{"category": "H", "amount": 87}
]}}
],
"transform": [
{"type": "flatten", "fields": ["foo.bar"], "as": ["bar"]}
]
}
You would then need to use field names like bar.category and bar.amount, or follow the flatten with a project operation (though that requires knowing the desired field names):
{"type": "project", "fields": ["bar.category", "bar.amount"], "as": ["category", "amount"]}
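Putting the two steps together, the data definition might look like the sketch below. It assumes the array forms of `fields` and `as` that Vega's flatten transform documents, and the input is trimmed to two rows for brevity. I haven't run this exact spec, so treat it as an illustration rather than a verified solution.

```json
{
  "name": "table",
  "values": [
    {"foo": {"bar": [
      {"category": "A", "amount": 28},
      {"category": "B", "amount": 55}
    ]}}
  ],
  "transform": [
    {"type": "flatten", "fields": ["foo.bar"], "as": ["bar"]},
    {"type": "project", "fields": ["bar.category", "bar.amount"], "as": ["category", "amount"]}
  ]
}
```

If this works as intended, downstream marks and scales could refer to category and amount directly, as if the nested array had been the data source all along.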
I have to play with this idea a bit more. Thanks for the suggestion!
If you want to extract different data from the same JSON object, it would be good if we supported data.format.property for all sources. This is needed to resolve https://github.com/vega/vega-lite/issues/5034
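For comparison, this is roughly how format.property is used today with a URL source. The file name data/nested.json is hypothetical, and the sketch assumes the file holds the same {"foo": {"bar": [...]}} structure as the inline examples above.

```json
{
  "name": "table",
  "url": "data/nested.json",
  "format": {"type": "json", "property": "foo.bar"}
}
```

The request here is for the same property lookup to apply to other sources too, so that several data definitions could each pull a different nested array out of one shared JSON object.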