Closed libbey-observable closed 1 year ago
In the (soon to be outdated) schema inference in computeSummaries
, we iterate through all of the rows in the dataset to ensure that we get all the keys:
// We need to show *all* keys present in the array of Objects
function getAllKeys(rows) {
const keys = new Set();
for (const row of rows) {
// avoid crash if row is null or undefined
if (row) {
// only enumerable properties
for (const key in row) {
// only own properties
if (Object.prototype.hasOwnProperty.call(row, key)) {
// unique properties, in the order they appear
keys.add(key);
}
}
}
}
return Array.from(keys);
}
I don't see a comparable replacement for that in this PR - are we foregoing that step moving forward (and if so, how will we handle columns that don't get typed because they aren't present in the sample...)?
In the (soon to be outdated) schema inference in
computeSummaries
, we iterate through all of the rows in the dataset to ensure that we get all the keys: I don't see a comparable replacement for that in this PR - are we foregoing that step moving forward
Added, thanks @mkfreeman!
We'll need to handle the case when a user asserts that an array of primitives is a type that, when coerced, returns all null
values. For example, if the data is d3.range(10)
, and the user selected the type boolean
.
This will currently break because arrayIsPrimitive
will return false
, and the worker won't convert the primitive values into objects:
// This code path will not be entered in the case described above
let value = cellValue;
if (arrayIsPrimitive(cellValue)) { // not true if an array of numbers is set to boolean
value = Array.from(cellValue, (d) => ({value: d}));
value.schema = cellValue.schema;
}
I suggest we handle this at the stdlib level my adding the following code:
export function arrayIsPrimitive(value) {
return (
isTypedArray(value) ||
arrayContainsPrimitives(value) ||
arrayContainsDates(value) ||
arrayContainsNulls(value) // checks if all values are null
);
}
// Given an array, checks that the first n elements are all null or undefined.
// This allows an array of all null or undefined values to be handled in the
// table cell, such as if a user asserts a type that coerces all array values to
// null
export function arrayContainsNulls(value) {
const n = Math.min(nChecks, value.length);
if (!(n > 0)) return false;
return value.every(element => element == null)
}
We'll need to handle the case when a user asserts that an array of primitives is a type that, when coerced, returns all
null
values. For example, if the data isd3.range(10)
, and the user selected the typeboolean
.This will currently break because
arrayIsPrimitive
will returnfalse
, and the worker won't convert the primitive values into objects:
Thanks for raising this, @mkfreeman! I see what you mean about arrays of null
values. If we want to treat those as arrays as primitives, we may want to stop ignoring them in the arrayIsPrimitive
function, maybe something like what's shown below. But I'd be curious to hear @mbostock's thoughts on whether we do want to treat them that way?
I’ve been wondering if we should drop support for arrays of primitives since it seems to be adding a lot of complexity…
I’ve been wondering if we should drop support for arrays of primitives since it seems to be adding a lot of complexity…
As a DTC user, I feel strongly that we should continue supporting arrays of primitives, as it allows a quick visual summary of arrays (of primitives...).
In terms of your proposed solution @libbey-observable, I think we need arrayContainsPrimitives
to return true if the values are a mix of a primitive type and null/undefined. We could support that via a different function (my suggestion above), or I think we'd need to tweak the switch
statement to add a new case for undefined
(or the case when the type is object
for null
values), which feels a little tricky because we don't want to set the type to null
unless they are all null
.
(related to primitive arrays but sorry if this is too tangential for this PR) i was wondering why we coerce primitive arrays to {value}
in the worker, if we're already doing it here? i see it in both requestTableSummary and requestTableRows. is that code redundant, or does it cover some other case?
i was wondering why we coerce primitive arrays to
{value}
in the worker, if we're already doing it here? i see it in both requestTableSummary and requestTableRows. is that code redundant, or does it cover some other case?
Thanks, @annie! I was actually just wondering the same thing. When I try removing those steps from worker/table.js, at least with the stdlib changes here (where an array of primitives gets promoted to objects on load), I'm not seeing any issues. I think we may be able to remove those, but would maybe wait until these stdlib changes are released, if we do go this route. And would test more.
Resolves https://github.com/observablehq/observablehq/issues/9862 and https://github.com/observablehq/observablehq/issues/9673
In this inference approach, we take a sample (currently the first 100 rows), and for each column, count how many times we encounter each possible data type. Then, if > 90% of the sampled values conform to that type, we take the most frequently encountered type as the column's type. If not, we use type "other."
These changes include some support for upcoming column type assertions, but for the purposes of this PR, I'm testing against main.
https://user-images.githubusercontent.com/111310561/214719295-8326f325-879b-4b8a-9d9f-2397b28951de.mov