WARNING! This package is no longer maintained or supported. Please transition to pachyderm-sdk, the currently maintained Python SDK for Pachyderm.
Official Python Pachyderm client.
This library provides the autogenerated gRPC/protobuf code for Pachyderm, along with a higher-level, more Pythonic Client class.
See the API docs.
pip install python-pachyderm
Here's an example that creates a repo and adds a file:
import python_pachyderm
# Connects to a pachyderm cluster on localhost:30650.
# For other options, see the API docs.
client = python_pachyderm.Client()
# Create a pachyderm repo called `test`
client.create_repo("test")
# Create a file in `(repo="test", branch="master")` at `/dir_a/data.txt`
# Similar to `pachctl put file test@master:/dir_a/data.txt`
with client.commit("test", "master") as commit:
    client.put_file_bytes(commit, "/dir_a/data.txt", b"DATA")
# Get back the file
f = client.get_file(("test", "master"), "/dir_a/data.txt")
print(f.read()) # >>> b"DATA"
How to load a CSV file into a pandas DataFrame:
import pandas as pd
f = client.get_file(("my_repo", "my_branch"), "/path_to/my_data.csv")
df = pd.read_csv(f)
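If pandas is not available, the same file-like object can be parsed with the standard library instead. The following is a minimal sketch, not part of python-pachyderm: it assumes only that the object returned by get_file has a read() method returning bytes, as in the example above. The helper name read_csv_rows is hypothetical, and an in-memory io.BytesIO stands in for the real file so the snippet runs without a cluster.

```python
import csv
import io


def read_csv_rows(binary_file, encoding="utf-8"):
    """Read a binary file-like object and parse it as CSV into a list of dicts.

    Hypothetical helper: relies only on binary_file.read() returning bytes.
    """
    text = binary_file.read().decode(encoding)
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Stand-in for client.get_file(("my_repo", "my_branch"), "/path_to/my_data.csv").
f = io.BytesIO(b"name,value\na,1\nb,2\n")

rows = read_csv_rows(f)
print(rows)
```

Each row comes back as a dict keyed by the header line, with every value left as a string. pandas remains the better choice when type inference or column-wise operations are needed.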
For more sophisticated examples, see the examples directory.
Prior to python-pachyderm 2.0, this library's versioning synced with pachyderm's core versioning; e.g. version 1.8.5 of this library synced with 1.8.5 of pachyderm core. python-pachyderm 2.0 onwards uses semver instead, so versions are not tied to pachyderm core. This was done for two reasons:
However, if you need to know which version of pachyderm core a given version of python-pachyderm was built with, consult CHANGELOG.md. As a broad rule of thumb, we recommend working with the latest versions of both pachyderm core and python-pachyderm where possible.
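The versioning rule above can be expressed as a small check. This is an illustrative sketch only: the function names are hypothetical, and it assumes plain MAJOR.MINOR.PATCH version strings with no pre-release or build suffixes.

```python
def parse_version(version):
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def tracks_core_version(client_version):
    """Return True if this python-pachyderm version predates 2.0.

    Releases before 2.0 shared their version number with pachyderm core;
    from 2.0 onwards the number follows semver and is independent of core.
    """
    return parse_version(client_version) < (2, 0, 0)


print(tracks_core_version("1.8.5"))  # True: matches pachyderm core 1.8.5
print(tracks_core_version("2.0.0"))  # False: consult CHANGELOG.md instead
```

Comparing tuples of ints rather than raw strings avoids ordering mistakes such as "1.10.0" sorting before "1.9.0".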
This driver is co-maintained by Pachyderm and the community. If you're looking to contribute to the project, this is a fantastic place to get involved. Take a look at the contributing guide for more info (including testing instructions).