observablehq / plot

A concise API for exploratory data visualization implementing a layered grammar of graphics
https://observablehq.com/plot/
ISC License
4.27k stars 174 forks source link

Error when normalizing a BigInt variable #2154

Open juba opened 2 weeks ago

juba commented 2 weeks ago

When data contains a BigInt variable, trying to apply normalize on this variable throws an error: TypeError: Cannot convert a BigInt value to a number.

I've created a small notebook to reproduce the issue:

https://observablehq.com/@juba/plot-normalize-bigint-error

It seems that the error may come from this line, but I'm far from an expert on this domain so I may be mistaken:

https://github.com/observablehq/plot/blob/0032c11160e6340ce44307675d93bed382d10446/src/transforms/normalize.js#L46C1-L47C1

Fil commented 2 weeks ago

Yes you can fix this example by doing

 function normalizeBasis(basis) {
   return {
     mapIndex(I, S, T) {
-      const b = +basis(I, S);
+      const b = Number(basis(I, S));
       for (const i of I) {
-        T[i] = S[i] === null ? NaN : S[i] / b;
+        T[i] = S[i] === null ? NaN : Number(S[i]) / b;
       }
     }
   };

to try, just run yarn prepublishOnly in your repo, then upload the file to your notebook and add a cell that says:

Plot = require(await FileAttachment("plot.umd.min.js").url())

However I don't think this is enough, because we need to fix all the different bases.

In that case the patch is simpler:

 function normalizeBasis(basis) {
   return {
     mapIndex(I, S, T) {
+      S = S.map(Number);
       const b = +basis(I, S);
       for (const i of I) {
         T[i] = S[i] === null ? NaN : S[i] / b;

but we should avoid doing this if the values are already numbers (not big).

So maybe instead we do:

--- a/src/options.js
+++ b/src/options.js
@@ -54,7 +54,7 @@ function maybeTypedMap(data, f, type) {
   return map(data, isNumberType(type) ? (d, i) => coerceNumber(f(d, i)) : f, type); // allow conversion from BigInt
 }

-function maybeTypedArrayify(data, type) {
+export function maybeTypedArrayify(data, type) {
   return type === undefined
     ? arrayify(data) // preserve undefined type
     : isArrowVector(data)
diff --git a/src/transforms/normalize.js b/src/transforms/normalize.js
index f4d225cd..2f426020 100644
--- a/src/transforms/normalize.js
+++ b/src/transforms/normalize.js
@@ -1,6 +1,6 @@
 import {extent, deviation, max, mean, median, min, sum} from "d3";
 import {defined} from "../defined.js";
-import {percentile, taker} from "../options.js";
+import {maybeTypedArrayify, percentile, taker} from "../options.js";
 import {mapX, mapY} from "./map.js";

 export function normalizeX(basis, options) {
@@ -43,6 +43,7 @@ export function normalize(basis) {
 function normalizeBasis(basis) {
   return {
     mapIndex(I, S, T) {
+      S = maybeTypedArrayify(S, Float64Array);
       const b = +basis(I, S);
       for (const i of I) {
         T[i] = S[i] === null ? NaN : S[i] / b;
Fil commented 2 weeks ago

Also let's create a more meaningful example / test plot. We often get BigInt as the result of a sql count, so maybe simulate this like so:

classes = d3
  .bin()(olympians.map((d) => d.height))
  .map((bin) => ({ weightclass: bin.x0, count: BigInt(bin.length) }))

// default basis
Plot.barY(
  classes,
  Plot.normalizeY({ x: "weightclass", y: "count" })
).plot({ x: { interval: 0.05 } })

// test a different basis
Plot.barY(
  classes,
  Plot.normalizeY("median", { x: "weightclass", y: "count" })
).plot({ x: { interval: 0.05 } })
juba commented 2 weeks ago

I can confirm that the issue seems fixed with your PR at https://github.com/observablehq/plot/pull/2155.

Just as a side note, I get these BigInt values because I'm passing arrow data coming from pandas and polars, which themselves seem to now import CSV files with 64 bits integers by default...

Thanks!

Fil commented 2 weeks ago

thanks. I think I made a mistake in my patch, though. The coercion should happen earlier. Will fix.