Derived columns - Githubissues

Part of https://github.com/observablehq/observablehq/issues/11623

Description

This PR adds support for derived columns in the __table function. In this first version, we won't support derived columns for database sources.

Notable changes:

Pulled the logic for inferring types and coercing rows into a separate applyTypes function. We now run this twice, first on the source dataset, and second on the derived dataset, before merging the two. Type inference/coercion must be done separately on the derived dataset because it may depend on values in the source dataset already being coerced.
Added a .fullSchema property to the return value of __table. It contains the schema information for every column in the dataset, whether or not the column is selected (.schema only covers selected columns). In https://github.com/observablehq/observablehq/pull/11214, we switch to using the cell value to get the table schema, because derived columns aren't available on the original dataset and their types may be dynamic, so we need to look at their evaluated runtime values. When fetching the table schema we read .fullSchema, so type information is always available for the full set of columns and users can, for example, reselect a deselected column in the Columns menu.
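
To make the two changes above concrete, here is a rough sketch of the two-pass flow and of how .schema and .fullSchema differ. It is not the code in this PR: inferType, tableSketch, the {derive, select} options shape, and this simplified applyTypes are all hypothetical stand-ins, and the real type inference handles many more cases.

```javascript
// Hypothetical, simplified type inference: only "number" and "string".
function inferType(values) {
  const present = values.filter((v) => v != null);
  const numeric = (v) =>
    typeof v === "number" ||
    (typeof v === "string" && v.trim() !== "" && !isNaN(+v));
  return present.length > 0 && present.every(numeric) ? "number" : "string";
}

// Simplified stand-in for applyTypes: infer a schema for the given columns,
// then coerce every row to match it.
function applyTypes(rows, columns) {
  const schema = columns.map((name) => ({
    name,
    type: inferType(rows.map((row) => row[name]))
  }));
  const coerced = rows.map((row) => {
    const out = {};
    for (const {name, type} of schema) {
      const v = row[name];
      out[name] = v == null ? v : type === "number" ? +v : String(v);
    }
    return out;
  });
  return {rows: coerced, schema};
}

// Hypothetical miniature of the __table flow described above.
function tableSketch(source, {derive = [], select = null} = {}) {
  // Pass 1: infer and coerce the source dataset.
  const typedSource = applyTypes(source, Object.keys(source[0] ?? {}));

  // Derived values are computed from the already-coerced source rows.
  const derivedRows = typedSource.rows.map((row) =>
    Object.fromEntries(derive.map(({name, value}) => [name, value(row)]))
  );

  // Pass 2: infer and coerce the derived dataset on its own.
  const typedDerived = applyTypes(derivedRows, derive.map((d) => d.name));

  // Merge, then keep only the selected columns in the output rows.
  const fullSchema = [...typedSource.schema, ...typedDerived.schema];
  const selected = select ?? fullSchema.map((c) => c.name);
  const result = typedSource.rows.map((row, i) => {
    const merged = {...row, ...typedDerived.rows[i]};
    return Object.fromEntries(selected.map((name) => [name, merged[name]]));
  });
  result.schema = fullSchema.filter((c) => selected.includes(c.name));
  result.fullSchema = fullSchema;
  return result;
}

const result = tableSketch(
  [{price: "10", qty: "2"}, {price: "5", qty: "4"}],
  {
    derive: [{name: "total", value: (d) => d.price + d.qty}],
    select: ["price", "total"]
  }
);
```

In this sketch the derived total is a numeric sum only because price and qty were coerced to numbers in the first pass; on the raw strings, + would concatenate. result.schema lists just the two selected columns, while result.fullSchema still includes qty.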

Review notes

I would love some feedback on the .fullSchema change! I'm not sure if it's the best way to make all the column types available, and I'm also not sure if I'm missing any major pitfalls with switching to use the cell value to fetch the table schema, instead of looking at the original data source as we do today. The main pitfall I ran into when testing is that, if there's an error in a derived formula, we no longer have a table schema available, because the cell throws an error. I addressed that in the monorepo PR by adding a fallback that goes back to using the loaded data source plus an approximation of the derived columns' schema, which I think is the best we can do in that case. But perhaps there are other issues with this approach that I'm missing...
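
For reviewers, the fallback idea could look roughly like the sketch below. This is not the monorepo implementation: getTableSchema, evaluateCell, the approximate flag, and the "other" placeholder type are all assumptions made for illustration.

```javascript
// Hypothetical sketch: prefer the evaluated cell's .fullSchema, and fall back
// to the loaded source schema plus placeholder entries for derived columns
// when the cell throws (e.g. because a derived formula has an error).
function getTableSchema(evaluateCell, sourceSchema, derivedNames) {
  try {
    return {schema: evaluateCell().fullSchema, approximate: false};
  } catch (error) {
    // The derived columns' runtime types are unknown here, so use a placeholder.
    const approximated = derivedNames.map((name) => ({name, type: "other"}));
    return {schema: [...sourceSchema, ...approximated], approximate: true};
  }
}

const sourceSchema = [
  {name: "price", type: "number"},
  {name: "qty", type: "number"}
];

// A cell whose derived formula fails at runtime.
const failingCell = () => {
  throw new Error("derived formula error");
};

const fallback = getTableSchema(failingCell, sourceSchema, ["total"]);
```

Here fallback.schema still lists price, qty, and total, so the Columns menu would have something to show, but the type of total is only a guess until the formula is fixed.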

observablehq / stdlib

Derived columns #367
