This library currently does not have a stable release. The API and implementation may change. Future changes may not be backward compatible.
TorchArrow is a torch.Tensor-like Python DataFrame library for data preprocessing in PyTorch models, with two high-level features:
You will need Python 3.7 or later. Also, we highly recommend installing an Miniconda environment.
First, set up an environment. If you are using conda, create a conda environment:
conda create --name torcharrow python=3.7
conda activate torcharrow
The following is the corresponding torcharrow
versions and supported Python versions.
torch |
torcharrow |
python |
---|---|---|
main / nightly |
main / nightly |
>=3.7 , <=3.10 |
1.13.0 |
0.2.0 |
>=3.7 , <=3.10 |
Follow the instructions in this Colab notebook
Experimental nightly binary on macOS (requires macOS SDK >= 10.15) and Linux (requires glibc >= 2.17) for Python 3.7, 3.8, and 3.9 can be installed via pip wheels:
pip install --pre torcharrow -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
If you are installing from source, you will need Python 3.7 or later and a C++17 compiler.
git clone --recursive https://github.com/pytorch/torcharrow
cd torcharrow
# if you are updating an existing checkout
git submodule sync --recursive
git submodule update --init --recursive
On macOS
HomeBrew is required to install development tools on macOS.
# Install dependencies from Brew
brew install --formula ninja flex bison cmake ccache icu4c boost gflags glog libevent
# Build and install other dependencies
scripts/build_mac_dep.sh ranges_v3 fmt double_conversion folly re2
On Ubuntu (20.04 or later)
# Install dependencies from APT
apt install -y g++ cmake ccache ninja-build checkinstall \
libssl-dev libboost-all-dev libdouble-conversion-dev libgoogle-glog-dev \
libgflags-dev libevent-dev libre2-dev libfl-dev libbison-dev
# Build and install folly and fmt
scripts/setup-ubuntu.sh
For local development, you can build with debug mode:
DEBUG=1 python setup.py develop
And run unit tests with
python -m unittest -v
To build and install TorchArrow with release mode:
python setup.py install
TorchArrow is BSD licensed, as found in the LICENSE file.