vega / vega

A visualization grammar.
https://vega.github.io/vega
BSD 3-Clause "New" or "Revised" License

Support `data.format.property` not just in source datasets #1876

Open domoritz opened 5 years ago

domoritz commented 5 years ago

If you want to extract different data from the same JSON object, it would be good if we supported `data.format.property` for all sources.

This is needed to resolve https://github.com/vega/vega-lite/issues/5034

jheer commented 5 years ago

I'm not sure if this is possible given our current system design. Our formatting logic is applied only when an external data source is loaded; it actually happens outside of dataflow execution. Our derived data source logic expects ingested tuples, not raw data, so the formatting operations wouldn't even apply without additional guidance.
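
For context, here is a minimal sketch of how `format.property` works today on loaded (non-derived) data sources. The file name `data.json`, the dataset names, and the property paths `foo.bar` and `foo.baz` are hypothetical, chosen only for illustration. Each entry parses the file at load time and keeps just the array found at the given property path.

```json
{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "description": "Sketch only: data.json, the dataset names, and the paths foo.bar and foo.baz are hypothetical.",
  "data": [
    {
      "name": "bars",
      "url": "data.json",
      "format": {"type": "json", "property": "foo.bar"}
    },
    {
      "name": "bazzes",
      "url": "data.json",
      "format": {"type": "json", "property": "foo.baz"}
    }
  ]
}
```

Because both entries name the file by `url`, the extraction happens during loading in each case. The request in this issue is to allow the same kind of extraction on a dataset that refers to another dataset via `source`.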

I suppose an alternative might be to add a new top-level abstraction (or data set type) that captures a raw data input, and then allows data definitions to refer to that much like a URL or values source and apply formatting.

I don't have a clear sense of how to proceed at this point. Other thoughts are welcomed!

domoritz commented 5 years ago

I see two options. One is to add a new top-level abstraction for data input, as you suggested. Another would be a transform to extract tuples. Something like:

{
  "name": "table",
  "values": [
    {"foo": {"bar": [
      {"category": "A", "amount": 28},
      {"category": "B", "amount": 55},
      {"category": "C", "amount": 43},
      {"category": "D", "amount": 91},
      {"category": "E", "amount": 81},
      {"category": "F", "amount": 53},
      {"category": "G", "amount": 19},
      {"category": "H", "amount": 87}
    ]}}
  ],
  "transform": [
    {"type": "property", "field": "foo.bar"}
  ]
}

But I haven't thought deeply about the details and implications of this transform.

domoritz commented 5 years ago

@jheer What do you think about my proposal? This has come up again with a user of Vega-Lite.

jheer commented 4 years ago

One option that is close to but not exactly what you propose above is to use the flatten transform:

{
  "name": "table",
  "values": [
    {"foo": {"bar": [
      {"category": "A", "amount": 28},
      {"category": "B", "amount": 55},
      {"category": "C", "amount": 43},
      {"category": "D", "amount": 91},
      {"category": "E", "amount": 81},
      {"category": "F", "amount": 53},
      {"category": "G", "amount": 19},
      {"category": "H", "amount": 87}
    ]}}
  ],
  "transform": [
    {"type": "flatten", "fields": ["foo.bar"], "as": ["bar"]}
  ]
}

You would then need to use field names like bar.category and bar.amount, or follow the flatten with a project operation (though that requires knowing the desired field names):

{"type": "project", "fields": ["bar.category", "bar.amount"], "as": ["category", "amount"]}
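
Putting the two steps together, the following is a sketch of a complete spec that applies flatten and then project to the inline data. It assumes the array-valued `fields` and `as` parameters that the Vega flatten documentation describes, and it abbreviates the values to three rows for brevity.

```json
{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "description": "Sketch only: values abbreviated to three rows; flatten is written with array-valued fields and as parameters.",
  "data": [
    {
      "name": "table",
      "values": [
        {"foo": {"bar": [
          {"category": "A", "amount": 28},
          {"category": "B", "amount": 55},
          {"category": "C", "amount": 43}
        ]}}
      ],
      "transform": [
        {"type": "flatten", "fields": ["foo.bar"], "as": ["bar"]},
        {
          "type": "project",
          "fields": ["bar.category", "bar.amount"],
          "as": ["category", "amount"]
        }
      ]
    }
  ]
}
```

After the flatten step, each tuple should carry one array element under `bar`. The project step then copies `bar.category` and `bar.amount` into top-level `category` and `amount` fields, so downstream scales and marks can refer to them directly.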

domoritz commented 4 years ago

I have to play with this idea a bit more. Thanks for the suggestion!