observablehq / stdlib

The Observable standard library.
https://observablehq.com/@observablehq/standard-library
ISC License
957 stars 83 forks source link

Load tabular data files for chart cells #355

Closed lileeyuh closed 1 year ago

lileeyuh commented 1 year ago

This resolves an error where chart cells were fetching the table schema for a CSV, TSV, JSON, or SQLite file and receiving no columns back.

mbostock commented 1 year ago

In case this helps: https://www.caniemail.com/search/?s=align-items

lileeyuh commented 1 year ago

In case this helps: https://www.caniemail.com/search/?s=align-items

@mbostock was this meant for a different github issue?

mbostock commented 1 year ago

Oops, yes.

mbostock commented 1 year ago

Per the discussion in the other PR, I fear I may have approved this too hastily. I think it is dangerous to repurpose loadTableDataSource for chart cells given that chart cells don’t use this routine to load data when they run—it means that the behavior of how we load data to infer the schema for a chart cell’s data can be inconsistent with how we load data when the chart cell itself runs. I think it would be safer instead to have a dedicated loadChartDataSource routine for chart cells, and I apologize for not realizing the nuances her earlier when I was reviewing.

lileeyuh commented 1 year ago

Could you explain your reasoning for it being dangerous for chart & table to share loadTableDataSource? I thought that loadTableDataSource is simply parsing tabular data depending on what the file type is, which I'm assuming we'd want to stay consistent across data table cell and chart cell

lileeyuh commented 1 year ago

follow-up in other PR: https://github.com/observablehq/observablehq/pull/10840#discussion_r1124878418

mbostock commented 1 year ago

The reason it is dangerous (in the sense of likely to exhibit a bug or inconsistent behavior, but not in the security/privacy sense) is that loadTableDataSource is used by table cells to load the data, whereas for a chart cell, it’s not. And hence if we use loadTableDataSource for a chart cell to infer the schema, that behavior could be inconsistent with what actually happens at runtime.

Case in point, loadTableDataSource doesn’t set {typed: true} when loading CSV and TSV files, and hence every column will be considered a string by the chart cell, whereas the compiler generates {typed: true} such that types are inferred on a per-value basis. Chart cells should show the correct inferred types of columns in CSV, TSV, and JSON files. (I filed a related bug in the other repo in that chart cell should be using the same inferSchema function that data table cell does.)

Data table cells are compiled to a call to __query which calls loadTableDataSource here:

https://github.com/observablehq/stdlib/blob/89f27007453ea1919493dca3fb94b7a5bde03222/src/table.js#L155

Whereas for chart cells the compiler passes e.g. await FileAttachment("foo.csv").csv({typed: true}) to Plot.auto directly, and hence doesn’t go through loadTableDataSource. Hence, we should have a separate loadChartDataSource function that matches the behavior of the chart cell compiler so that we infer the schema types consistently.