single-cell-data / TileDB-SOMA

Python and R SOMA APIs using TileDB’s cloud-native format. Ideal for single-cell data at any scale.
https://tiledbsoma.readthedocs.io
MIT License

[python] Update pyarrow dependency #1926

Status: Closed (johnkerl closed this issue 1 month ago)

johnkerl commented 11 months ago

This is for the pyarrow CVE. @bkmartinjr has verified that we don't use the vulnerable code path, but it's good optics for us to update anyway.

johnkerl commented 11 months ago

Here are the issues:

Here is my proposal:

cc @thetorpedodog @bkmartinjr @jdblischak @Shelnutt2

thetorpedodog commented 11 months ago

sounds good. there also is no rush to remove the hotfix; if pyarrow is already secure it does nothing, so in the worst case it’s an extra import and some light processing.
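To illustrate the "extra import" point above, here is a minimal sketch of applying the hotfix defensively. It assumes the pyarrow_hotfix package is used in its usual way (importing it is what applies the patch); the helper name apply_pyarrow_hotfix is hypothetical and is not taken from this repository.

```python
def apply_pyarrow_hotfix() -> bool:
    """Try to apply pyarrow_hotfix; report whether the import succeeded.

    Hypothetical helper for illustration only. Importing pyarrow_hotfix is
    what applies the patch; per the comment above, it is expected to do
    nothing when the installed pyarrow is already secure.
    """
    try:
        import pyarrow_hotfix  # noqa: F401  (imported for its side effect)
    except ImportError:
        # The hotfix package (or pyarrow itself) is not installed.
        return False
    return True


hotfix_applied = apply_pyarrow_hotfix()
```

Keeping the import inside a function makes the cost explicit: one import attempt and nothing else when the package is missing.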

bkmartinjr commented 11 months ago

Proposal works for me.

Alt, which you are free to ignore: given what I've read about the actual fix (i.e., in code we don't use), it would also be OK to:

ihnorton commented 11 months ago

+1 from me

johnkerl commented 10 months ago

Additional non-joy: we found on #1936 that with pyarrow >= 13.0 we have

import pyarrow
import tiledb
tiledb.open('s3://anything/at/all')
Fatal error condition occurred in /Users/runner/work/crossbow/crossbow/vcpkg/buildtrees/aws-c-common/src/v0.8.9-fed0b55d0f.clean/source/allocator.c:121: allocator != ((void*)0)

on macOS, on both x86_64 and arm64.

This should be followed up on.

Stack trace: https://gist.github.com/johnkerl/eb5874e94d0cc4768114faadcb989e83
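As a rough sketch of the condition described above (pyarrow >= 13.0 on macOS), the check below flags environments that fall in the reported range. The function names parse_release and in_reported_range are hypothetical, and the version parsing is deliberately simplistic; it only says whether a configuration matches the report, not whether the crash will actually occur.

```python
import platform
import re
from typing import Optional, Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric release components, e.g. '14.0.2' -> (14, 0, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def in_reported_range(pyarrow_version: str, system: Optional[str] = None) -> bool:
    """Return True when the configuration matches the report in this thread.

    Assumption: the affected range is 'pyarrow >= 13.0 on macOS', as stated
    above. platform.system() returns 'Darwin' on macOS.
    """
    if system is None:
        system = platform.system()
    return system == "Darwin" and parse_release(pyarrow_version) >= (13,)
```

For example, in_reported_range("12.0.1", "Darwin") is False, while in_reported_range("13.0.0", "Darwin") is True.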

johnkerl commented 10 months ago

Re: "This should be followed up on."

Here is the follow-up:

johnkerl commented 10 months ago

Given the above, namely:

I conclude that we'll need to simply stick with pyarrow_hotfix long-term.

All relevant PRs on this repo have been implemented and merged.

johnkerl commented 10 months ago

Update: there is a mitigation in aws-sdk-cpp 1.11.179 (core currently uses 1.11.160), so we can handle this, but only after a core bump.

johnkerl commented 10 months ago

No longer blocks 1.6

ivirshup commented 4 months ago

pyarrow >= 13.0 on MacOS results in a fatal error with import pyarrow and import tiledb

I can't reproduce this behavior from pyarrow>=14.0.2.

Never mind, it depends on import order: importing tiledb before pyarrow is fine, but the reverse order triggers the error.

In light of that, could the upper version bound here be removed? There are no wheels available for pyarrow versions this old on Python 3.12, which is causing some CI grief over on cellxgene_census: https://github.com/chanzuckerberg/cellxgene-census/actions/runs/9405819588/job/25907864054?pr=1189
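The import-order observation above could be captured in a small helper that imports modules in a fixed sequence. This is only a sketch: import_in_order is a hypothetical name, and it has no effect on modules that were already imported earlier in the process.

```python
import importlib
from types import ModuleType
from typing import Dict, Sequence


def import_in_order(names: Sequence[str]) -> Dict[str, ModuleType]:
    """Import the named modules in the given order, skipping any not installed.

    Hypothetical helper. A module that is already in sys.modules is simply
    returned, so the order is only enforced for first-time imports.
    """
    loaded: Dict[str, ModuleType] = {}
    for name in names:
        try:
            loaded[name] = importlib.import_module(name)
        except ImportError:
            # Not installed in this environment; move on to the next name.
            continue
    return loaded
```

Based on the report above, calling import_in_order(["tiledb", "pyarrow"]) early in a program would load tiledb first, which is the order described as working.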

johnkerl commented 4 months ago

Please keep this issue open until these are resolved:

johnkerl commented 1 month ago

Resolved as of https://github.com/single-cell-data/TileDB-SOMA/releases/tag/1.14.0.