observablehq / stdlib

The Observable standard library.
https://observablehq.com/@observablehq/standard-library
ISC License
957 stars 83 forks source link

Add file.csv({typed: "auto"}) option that uses inferSchema #360

Closed tophtucker closed 1 year ago

tophtucker commented 1 year ago

Since 2020, file.csv and file.tsv have allowed you to pass {typed: true} to apply d3.autoType, which looks line by line and tries to guess the types. More recently, the data table cell has introduced an internal inferSchema function, which instead looks at a sample of rows and picks the most frequent type.

This introduces a new {typed: "auto"} option, which calls inferSchema and then coerces all rows to that schema, allowing FileAttachment calls to match the behavior of the data table cell. It also uses that new "auto" option in loadChartDataSource.

This duplicates a couple lines of the __table function…

https://github.com/observablehq/stdlib/blob/89f27007453ea1919493dca3fb94b7a5bde03222/src/table.js#L621

https://github.com/observablehq/stdlib/blob/89f27007453ea1919493dca3fb94b7a5bde03222/src/table.js#L630

…but, since that has some other logic for overriding types, I didn’t bother refactoring anything for now.

Note that, since the existing code only checks for the truthiness of typed, and “auto” is truthy, this could theoretically change the behavior of existing code, if anyone has somehow already said typed: "auto". But I think that’d be very rare!!

tophtucker commented 1 year ago

Thanks for the comments Mike! Mostly addressed I think; maybe a couple things left to do…

Since inferSchema is exported, I'll leave the getAllKeys behavior when no columns property is specified, even though it seems it's not used anywhere inside stdlib.

tophtucker commented 1 year ago

😅 🙏 😌

Reverted adding the defaults! And it now fixes https://github.com/observablehq/observablehq/issues/11103:

image