FeatherStore is high performance datastore for storing Pandas DataFrames, Polars DataFrames, and PyArrow Tables. By saving data in the form of partitioned Feather Files, FeatherStore enables several operations on the stored tables, optimizing performance by selectively loading only the necessary segments of data:
For more information on using FeatherStore, please refer to the documentation.
>>> # Create a Pandas DataFrame
import pandas as pd
from numpy.random import randn
import featherstore as fs
dates = pd.date_range("2021-01-01", periods=5)
df = pd.DataFrame(randn(5, 4), index=dates, columns=list("ABCD"))
A B C D
2021-01-01 0.402138 -0.016436 -0.565256 0.520086
2021-01-02 -1.071026 -0.326358 -0.692681 1.188319
2021-01-03 0.777777 -0.665146 1.017527 -0.064830
2021-01-04 -0.835711 -0.575801 -0.650543 -0.411509
2021-01-05 -0.649335 -0.830602 1.191749 0.396745
>>> # Create a database folder at the given path
fs.create_database('path/to/db')
fs.connect('path/to/db')
# Creates a data store
fs.create_store('example_store')
# List existing stores in current database
fs.list_stores()
['example_store']
>>> # Connects to store
store = fs.Store('example_store')
# Saves table to store; partition size defines the size of each partition in bytes
PARTITION_SIZE = 128 # bytes
store.write_table('example_table', df, partition_size=PARTITION_SIZE)
# Lists existing tables in current store
store.list_tables()
['example_table']
>>> # FeatherStore can read tables as Arrow Tables, Pandas DataFrames or Polars DataFrames
store.read_pandas('example_table')
# store.read_arrow('example_table') for reading to Arrow Tables
# store.read_polars('example_table') for reading to Polars DataFrames
A B C D
2021-01-01 0.402138 -0.016436 -0.565256 0.520086
2021-01-02 -1.071026 -0.326358 -0.692681 1.188319
2021-01-03 0.777777 -0.665146 1.017527 -0.064830
2021-01-04 -0.835711 -0.575801 -0.650543 -0.411509
2021-01-05 -0.649335 -0.830602 1.191749 0.396745
>>> # FeatherStore supports appending data without loading in the full table
new_dates = pd.date_range("2021-01-06", periods=1)
df1 = pd.DataFrame(randn(1, 4), index=new_dates, columns=list("ABCD"))
store.append_table('example_table', df1)
# It also supports querying parts of the data
store.read_pandas('example_table', rows={'after': '2021-01-05'}, cols=['D', 'A'])
D A
2021-01-05 0.396745 -0.649335
2021-01-06 0.606950 0.408125
FeatherStore is very fast, and in fact is one of the best performing solutions available. See the performance benchmark here.
FeatherStore can be installed by using $ pip install featherstore
or directly from
source by using $ pip install git+https://github.com/hakonmh/featherstore.git
Want to know about all the features FeatherStore support? Read the docs!