Kadro means frame in Esperanto.
Kadro is a small Python package that wraps a little extra functionality around pandas. The goal is to add more functional methods so that you can use pandas in a more composable manner. For example, you can write queries like;
import numpy as np
import pandas as pd
import kadro as kd

# <some_file> is a placeholder for the path to your CSV file
df = pd.read_csv(<some_file>)

(kd.Frame(df)
  .mutate(e = lambda _: _.a + _.b,
          f = lambda _: np.sqrt(_.e))
  .group_by('c', 'd')
  .agg(m_e = lambda _: np.mean(_.e),
       v_f = lambda _: np.var(_.f),
       cov_ef = lambda _: np.cov(_.e, _.f)[0, 1])  # off-diagonal entry is cov(e, f)
  .sort('m_e'))
This statement may feel similar to normal aggregation code from pandas combined with some parts from the tidyverse in R. In steps it does the following;
1. It wraps the pandas dataframe in a kadro.Frame object. It is merely a wrapper with some methods attached.
2. It creates two new columns, e and f. These columns are based on the columns a and b, and we access these series via lambda functions. These lambda functions assume that the original pandas dataframe is passed in.
3. It groups the data by the columns c and d.
4. It aggregates by calculating the mean of column e, the variance of column f and the covariance between e and f. Again, we can use any function that is able to aggregate a series object to a single value.
5. It sorts the result by m_e, which denotes the mean of e that was calculated in the step before.

The statements are readable and may remind you of the original pandas library. Note a few key differences.
- The mutate method creates two new columns, one of which is based on a new column created in the same mutate call.
- The agg method can apply different functions to different columns of each group and lets you name the created columns in the same call.

The goal is to have a minimal wrapper that makes most dataframe operations more expressive by making them chainable.
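For comparison, the chain at the top can be approximated in plain pandas. This is only a sketch under assumptions: the toy data is made up to stand in for the CSV file, and the code mirrors the described steps rather than kadro's internals.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the CSV file; the values are made up for illustration.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [3.0, 7.0, 6.0, 12.0],
    "c": ["x", "x", "y", "y"],
    "d": ["p", "p", "q", "q"],
})

# Add e first, then f, which depends on e.
with_cols = df.assign(e=lambda d: d.a + d.b).assign(f=lambda d: np.sqrt(d.e))

def summarise(group):
    # One named value per aggregation, computed for each group.
    return pd.Series({
        "m_e": np.mean(group.e),
        "v_f": np.var(group.f),
        "cov_ef": np.cov(group.e, group.f)[0, 1],
    })

result = (with_cols
          .groupby(["c", "d"])[["e", "f"]]  # group by c and d
          .apply(summarise)                 # aggregate each group
          .reset_index()
          .sort_values("m_e"))              # sort by the mean of e
```

The pandas version needs a helper function and an index reset to get the same named columns, which is the kind of ceremony the chained verbs are meant to hide.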
Data should be a noun and any manipulations on it should be described with verbs. In R it is convenient to have those verbs be functions because the language allows you to write global operators that can chain functions together. In Python it makes more sense to expose them as methods on a wrapper object instead.
The idea behind the tool is to have a more minimal API that accommodates 80-90% of the typical dataframe manipulations by attaching a few very useful composable verbs to a wrapped dataframe object. This work is not meant to replace pandas, nor is it meant as a Python port of popular R packages (though I'll gladly admit that a lot of ideas are bluntly copied from tidyr and dplyr).
The goal of this work is to show a proof of concept that demonstrates extra composability and readability for Python-based data manipulation. Some performance sacrifices will ensue as a result, but you will always be able to access the native pandas object.
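To illustrate the wrapper idea, here is a minimal sketch of a chainable object. The class name MiniFrame, its methods and the df attribute are hypothetical and written only for this example; this is not kadro's implementation. It relies on Python keeping keyword arguments in the order they were written (Python 3.6 and later).

```python
import pandas as pd

class MiniFrame:
    """Hypothetical chainable wrapper; not kadro's actual implementation."""

    def __init__(self, df):
        self.df = df  # the native pandas object stays reachable

    def mutate(self, **new_cols):
        out = self.df.copy()
        for name, func in new_cols.items():
            # Keyword order is preserved, so later columns can use earlier ones.
            out[name] = func(out)
        return MiniFrame(out)  # returning a wrapper keeps the chain going

    def sort(self, *cols, ascending=True):
        return MiniFrame(self.df.sort_values(list(cols), ascending=ascending))

frame = MiniFrame(pd.DataFrame({"a": [3, 1, 2], "b": [10, 20, 30]}))
result = (frame
          .mutate(e=lambda d: d.a + d.b,
                  f=lambda d: d.e * 2)   # f uses e from the same call
          .sort("f", ascending=False))
native = result.df  # drop back to plain pandas whenever needed
```

Each verb copies or derives a new dataframe instead of changing the wrapped one, which costs some performance but leaves the original frame untouched.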
When comparing with pandas there are a few notable differences; for instance, the expressions passed to mutate are evaluated in order.
The verbs currently supported in Kadro are described in the documentation.
You can find elaborate documentation here; https://koaning.github.io/kadro/.
Beyond the documentation, you will find that all methods are documented with a docstring. Perhaps the best way to understand the library is to read the vignette; a notebook in the main folder of the repo that demonstrates all the functionality from a to z and renders on GitHub.
Pip/Conda support is not fully up yet. You can use pip to install it from GitHub though.
pip install git+https://github.com/koaning/kadro.git
Otherwise, consider downloading the repository and playing with this package by running the following from its root folder;
python setup.py install
Contributions are welcome but the package is to remain minimal. If people want to add some extra tests, that's fine and always welcome. You can run tests via;
pytest
And you can update the docs by running the following from the /docs folder;
pdoc --html --overwrite kadro.Frame
cp kadro/Frame.m.html index.html
rm -rf kadro
This package is not much more than an alternative UI in nature. It was originally meant as a personal project and I don't expect that many changes will ever be needed. I may add extra support if it gets traction, but the package is intentionally minimal.
Feel free to notify me of issues.