lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0
3.73k stars, 202 forks

support `pyarrow>=16.0` #2596

Closed: cjackal closed this 3 days ago

cjackal commented 1 month ago

I'm totally new to the lance world, so I'm not really sure what this would cost, but it seems pylance has a rather strict version pin on pyarrow (pyarrow<15.0.1).

This didn't matter for a while, but now that numpy 2.0 is out and pyarrow only supports it from pyarrow>=16.0.0, there is a version conflict: no pyarrow version works with both numpy>=2.0 and pylance. Another concern is that pyarrow is a widespread mandatory dependency in the ML field, so resolving version conflicts with some newer libraries is quite tough.
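To make the conflict concrete, here is a minimal sketch that intersects the two constraints quoted above (pylance's `pyarrow<15.0.1` and numpy 2.0 support starting at `pyarrow>=16.0.0`). It assumes plain `X.Y.Z` release strings, so simple tuple comparison stands in for full PEP 440 version handling, and the candidate list is illustrative rather than an exhaustive list of pyarrow releases.

```python
# Minimal sketch: does any pyarrow version satisfy both constraints in this issue?
# Assumption: plain "X.Y.Z" release strings only (no pre-releases or local tags),
# so tuple comparison is enough here; real resolvers implement PEP 440.


def parse(version: str) -> tuple:
    """Turn '15.0.1' into (15, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pylance_allows(v: str) -> bool:
    # pylance's pin as reported in the issue: pyarrow<15.0.1
    return parse(v) < parse("15.0.1")


def numpy2_allows(v: str) -> bool:
    # numpy>=2.0 support requires pyarrow>=16.0.0 (as stated in the issue)
    return parse(v) >= parse("16.0.0")


# Illustrative candidate versions, not an exhaustive list of pyarrow releases.
CANDIDATES = ["14.0.2", "15.0.0", "15.0.1", "15.0.2", "16.0.0", "16.1.0"]

compatible = [v for v in CANDIDATES if pylance_allows(v) and numpy2_allows(v)]
print(compatible)  # []
```

Because every version below 15.0.1 is also below 16.0.0, the intersection is empty no matter which candidates are listed; the only fix is to relax one of the two bounds.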

wjones127 commented 1 month ago

I think we can do this. There was a major bug (it caused a crash) in PyArrow involving tensor columns, but everything else should be usable. We can allow that version and just disable our unit test for that version of PyArrow.
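One way to "just disable our unit test for that version" is a version-gated skip. The sketch below uses the standard library's `unittest.skipIf` so it stays self-contained; the affected version set (`AFFECTED_PYARROW_MAJORS`), the test class, and the test name are all hypothetical, since the thread does not name the exact PyArrow release with the tensor-column crash or the test in question.

```python
import unittest
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# HYPOTHETICAL: pyarrow major versions assumed to have the tensor-column crash.
# The issue does not name the exact release; replace with the real one.
AFFECTED_PYARROW_MAJORS = {16}


def should_skip(pyarrow_version: Optional[str]) -> bool:
    """Return True if the given pyarrow version is one we expect to crash."""
    if pyarrow_version is None:  # pyarrow not installed: nothing to skip for
        return False
    major = int(pyarrow_version.split(".")[0])
    return major in AFFECTED_PYARROW_MAJORS


def installed_pyarrow_version() -> Optional[str]:
    """Look up the installed pyarrow version without importing pyarrow."""
    try:
        return version("pyarrow")
    except PackageNotFoundError:
        return None


class TensorColumnTests(unittest.TestCase):
    # The skip condition is evaluated once, when the class is defined.
    @unittest.skipIf(
        should_skip(installed_pyarrow_version()),
        "known pyarrow crash with tensor columns on this version",
    )
    def test_tensor_column_roundtrip(self):
        # Hypothetical placeholder: the real test would write and read back
        # a dataset containing a tensor column.
        pass
```

Run it with `python -m unittest`; in a pytest suite, `pytest.mark.skipif` expresses the same condition. Gating on the version keeps the rest of the suite running against the newer PyArrow, which matches the proposal to allow it while isolating only the known crash.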