observablehq / stdlib

The Observable standard library.
https://observablehq.com/@observablehq/standard-library
ISC License

Add Apache Arrow as a recommended library and supported file attachment #223

Closed: mbostock closed this 3 years ago

mbostock commented 3 years ago

Apache Arrow is exposed as Arrow in the standard library, and fileAttachment.arrow() returns a Promise to an Arrow.Table.

[Image: Screen Shot 2021-06-02 at 3 56 19 PM]

visnup commented 3 years ago

We're effectively pinned to these library versions, at least major if not minor versions, until we can version the standard library, correct? It feels like they are making steady, significant progress on this library and it's gone from v1 to v4 in the past year. All of the other libraries we've recently added I assumed were pretty stable, but I'm half worried Arrow could go from 4 to 7 before we can follow suit.

mbostock commented 3 years ago

We’re committed to backwards compatibility until we ship version pinning, yes. I don’t think we should block adding useful functionality on us shipping version pinning: I’d rather include a slightly out-of-date version of Apache Arrow in the box than nothing.
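
To make the major-version concern above concrete, here is a minimal sketch of the kind of check that pinning implies. The helper names `parseVersion` and `sameMajor` are hypothetical and are not part of the Observable standard library or Apache Arrow; the sketch also assumes plain `major.minor.patch` version strings with no pre-release tags.

```typescript
// Hypothetical helpers for illustration only; not part of stdlib or Arrow.
interface Version {
  major: number;
  minor: number;
  patch: number;
}

// Parse a plain "major.minor.patch" string (assumption: no pre-release suffix).
function parseVersion(text: string): Version {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text);
  if (match === null) {
    throw new Error(`Not a major.minor.patch version: ${text}`);
  }
  return {
    major: Number(match[1]),
    minor: Number(match[2]),
    patch: Number(match[3]),
  };
}

// A candidate release is treated as compatible with the pinned one
// only if it has the same major version.
function sameMajor(pinned: string, candidate: string): boolean {
  return parseVersion(pinned).major === parseVersion(candidate).major;
}

// Assumed scenario: the library is pinned to Arrow 4.0.1.
const pinned = "4.0.1";
console.log(sameMajor(pinned, "4.0.1")); // true
console.log(sameMajor(pinned, "5.0.0")); // false
```

Under this reading, any new Arrow major release falls outside the pinned range until the standard library itself can be versioned.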

mbostock commented 3 years ago

Side note: it looks like 5.0.0 is already planned per the package.json, but I can’t find any release notes, so I’m not really sure what’s different. In any case, I think we should still go ahead, but also redouble our efforts to ship version pinning.

domoritz commented 3 years ago

All Arrow packages are released every three months, and every release is a new major version. Note that the binary format is not changing. See https://arrow.apache.org/docs/format/Versioning.html for details.

In the past few versions, the JS library hasn't changed much, but for v5 we started some significant improvements to make the library leaner and more tree-shakeable. One breaking change we already added is https://github.com/apache/arrow/pull/10277 (which you can work around easily by returning a DataFrame), and another significant change will be https://github.com/apache/arrow/pull/10371.

mbostock commented 3 years ago

Arrow 5.0.0 is out already, but since we plan on adding Arquero imminently, I figure we should stick to 4.0.1.

domoritz commented 3 years ago

The API hasn't changed much between 4 and 5. The biggest change is that tables don't extend DataFrame anymore (but DataFrames still extend tables).

visnup commented 3 years ago

Is there a reason we can't do 5 then? Would it be incompatible with the current version of Arquero?

domoritz commented 3 years ago

I don't think so, but it would be good to confirm by updating Arquero to v5.

mbostock commented 3 years ago

I will investigate upgrading to Arrow 5 at the time we add Arquero.