Closed annie closed 1 year ago
just pushed up a new commit that catches runtime errors and returns them in the function output (although it seems like i need to update tests!). will put up a corresponding monorepo PR shortly!
update: monorepo PR here
Part of https://github.com/observablehq/observablehq/issues/11623
Description
This PR adds support for derived columns in the
__table
function. In this first version, we won't support derived columns for database sources.Notable changes:
applyTypes
function. We now run this twice, first on the source dataset, and second on the derived dataset, before merging the two. Type inference/coercion must be done separately on the derived dataset because it may depend on values in the source dataset already being coerced..fullSchema
property to the return value of__table
. This property contains the schema information for all columns in the dataset, regardless of whether or not they are selected (.schema
only contains the schema info for selected columns). In https://github.com/observablehq/observablehq/pull/11214, we switch to using the cell value to get the table schema, because derived columns aren't available on the original dataset and their types may be dynamic, so we need to look at their evaluated runtime values. We look at.fullSchema
when fetching the table schema so that we always have type information for the full set of columns, so that users can e.g. reselect a deselected column in the Columns menu.Review notes
I would love some feedback on the
.fullSchema
change! I'm not sure if it's the best way to make all the column types available, and I'm also not sure if I'm missing any major pitfalls with switching to use the cell value to fetch the table schema, instead of looking at the original data source as we do today. The main pitfall I experienced when testing is that, if there's an error in a derived formula, we no longer have a table schema available because the cell throws an error. I addressed that in the monorepo PR by adding a fallback that goes back to using the loaded data source + an approximation of the derived columns schema, which I think is the best we can do in that case. But perhaps there are other issues with this approach that I'm missing...