observablehq / plot

A concise API for exploratory data visualization implementing a layered grammar of graphics
https://observablehq.com/plot/
ISC License
4.46k stars 182 forks source link

Mishandling of Arrow's null values #2194

Closed Fil closed 1 month ago

Fil commented 1 month ago

In #2115 we did not consider that the arrow vector.toArray() can return anything when a value is null. (See https://github.com/observablehq/framework/pull/1738 for a concrete consequence).

After calling vector.toArray we have to clear out the null values (that can be skipped if there are no nulls).

mbostock commented 1 month ago

Can you find documentation for this? How are you supposed to get the value out of a vector??

Fil commented 1 month ago

The closest I've found is @domoritz saying that toArray() is “often not generating what people want” (ref).

I've found the code around here but I'm not fully grokking it. https://github.com/apache/arrow/blob/8be5f9c6b41ddd840939602016811aa9c739f0a3/js/src/visitor/iterator.ts#L118

domoritz commented 1 month ago

Can you give me a minimal example with arrow? I suspect we can improve toArray() but what I meany by the comment is that you often simply iterate over the arrow vector or table and treat it like an array.

for (const el of vector) {
     console.log(el);
}
Fil commented 1 month ago
❯ cat v.js
import {arrow}  from "https://deno.land/x/arrow@v8.0.0/mod.ts";
const vector = arrow.tableFromArrays({v: [1, null, 2]}).getChild("v");

console.log(vector.toArray());
console.log([...vector]);
console.log(vector.toArray().map((d, i) => vector.isValid(i) ? d : NaN));
❯ deno v.js
Float64Array(3) [ 1, 0, 2 ]
[ 1, null, 2 ]
Float64Array(3) [ 1, NaN, 2 ]

In my real use case, the 0 in the middle was a "random" float32 derived from dirty memory, which made a different chart on each invocation — but the crucial point is that toArray() does not restitute nulls (and so we have a bug in Plot, because we didn't consider this behavior).