pytorch / torcharrow

High performance model preprocessing library on PyTorch
https://pytorch.org/torcharrow/beta/index.html
BSD 3-Clause "New" or "Revised" License

TorchArrow: a data processing library for PyTorch

This library currently does not have a stable release. The API and implementation may change. Future changes may not be backward compatible.

TorchArrow is a torch.Tensor-like Python DataFrame library for data preprocessing in PyTorch models.

Installation

You will need Python 3.7 or later. We also highly recommend using a Miniconda environment.

First, set up an environment. If you are using conda, create a conda environment:

conda create --name torcharrow python=3.7
conda activate torcharrow

Version Compatibility

The following table lists the corresponding torch and torcharrow versions and the Python versions they support.

| torch | torcharrow | python |
|---|---|---|
| main / nightly | main / nightly | >=3.7, <=3.10 |
| 1.13.0 | 0.2.0 | >=3.7, <=3.10 |
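
Before installing, it can help to confirm that the active interpreter falls inside the range listed above. The following is a minimal sketch: the helper name `is_supported_python` is hypothetical and not part of TorchArrow, and the bounds are copied from the compatibility table, so they would need updating for other releases.

```python
import sys

# Bounds copied from the compatibility table (>=3.7, <=3.10).
MIN_PYTHON = (3, 7)
MAX_PYTHON = (3, 10)


def is_supported_python(version=None):
    """Return True if (major, minor) lies within the listed range.

    Illustrative helper only; this is not a TorchArrow API.
    """
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


if __name__ == "__main__":
    status = "supported" if is_supported_python() else "outside the listed range"
    print("Python %d.%d is %s" % (sys.version_info[0], sys.version_info[1], status))
```

Only the major and minor components are compared, so any patch release of 3.7 through 3.10 passes.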

Colab

Follow the instructions in this Colab notebook

Nightly Binaries

Experimental nightly binaries for macOS (requires macOS SDK >= 10.15) and Linux (requires glibc >= 2.17) are available for Python 3.7, 3.8, and 3.9 and can be installed from pip wheels:

pip install --pre torcharrow -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
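
The platform minimums stated above can be checked from Python before running pip. This is a best-effort sketch under stated assumptions: the helper names (`parse_version`, `meets_minimum`, `nightly_platform_ok`) are hypothetical, and on macOS the running OS version is used as a stand-in for the SDK requirement, which is not the same thing.

```python
import platform


def parse_version(text):
    """Turn a dotted version string such as '2.17' into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(found, minimum):
    """Return True if version string `found` is at least `minimum`."""
    return parse_version(found) >= parse_version(minimum)


def nightly_platform_ok():
    """Best-effort check of the stated minimums; None means unknown."""
    system = platform.system()
    if system == "Linux":
        lib, version = platform.libc_ver()
        if lib != "glibc" or not version:
            return None  # not glibc, or the version could not be detected
        return meets_minimum(version, "2.17")
    if system == "Darwin":
        # Assumption: the OS version approximates the SDK requirement.
        release = platform.mac_ver()[0]
        return meets_minimum(release, "10.15") if release else None
    return None  # platform not covered by the nightly wheels described here


if __name__ == "__main__":
    print("platform check:", nightly_platform_ok())
```

Versions are compared as integer tuples rather than strings, so `2.5` correctly sorts below `2.17`.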

From Source

If you are installing from source, you will need Python 3.7 or later and a C++17 compiler.

Get the TorchArrow Source

git clone --recursive https://github.com/pytorch/torcharrow
cd torcharrow
# if you are updating an existing checkout
git submodule sync --recursive
git submodule update --init --recursive

Install Dependencies

On macOS

Homebrew is required to install development tools on macOS.

# Install dependencies from Brew
brew install --formula ninja flex bison cmake ccache icu4c boost gflags glog libevent

# Build and install other dependencies
scripts/build_mac_dep.sh ranges_v3 fmt double_conversion folly re2

On Ubuntu (20.04 or later)

# Install dependencies from APT
apt install -y g++ cmake ccache ninja-build checkinstall \
    libssl-dev libboost-all-dev libdouble-conversion-dev libgoogle-glog-dev \
    libgflags-dev libevent-dev libre2-dev libfl-dev libbison-dev
# Build and install folly and fmt
scripts/setup-ubuntu.sh

Install TorchArrow

For local development, you can build in debug mode:

DEBUG=1 python setup.py develop

Then run the unit tests with:

python -m unittest -v

To build and install TorchArrow in release mode:

python setup.py install
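
After either build, one way to confirm that the package is visible to the active interpreter is to look it up on the import path without importing it. A minimal sketch; `is_importable` is a hypothetical helper, not something TorchArrow provides.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_importable("torcharrow"):
        print("torcharrow found")
    else:
        print("torcharrow not found; check that the environment you "
              "installed into is the one currently active")
```

This only confirms that the module can be located. It does not load the compiled extension, so a follow-up `import torcharrow` is still worthwhile.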

License

TorchArrow is BSD licensed, as found in the LICENSE file.