Implements the foundations of an API-compatible DataFrame backed by Weld.
DataFrames are dictionary-like containers of Series of the same length. Each Series can have a different data type. Operations on DataFrames align on column name. Unlike pandas, DataFrame operations do not align on row indexes. This will likely be the permanent behavior of Grizzly, as implementing row-based indexing requires forgoing many data-parallel optimizations.
This patch adds the foundation of the DataFrame class: evaluating Weld computations, converting Grizzly DataFrames to Pandas DataFrames, and basic binary operations. It also adds a new class, ColumnIndex, that facilitates aligning columns when performing operations between DataFrames.
Examples
>>> df = GrizzlyDataFrame({'name': ['mike', 'sam', 'sally'], 'age': [20, 22, 56]})
>>> df
    name  age
0   mike   20
1    sam   22
2  sally   56
>>> df2 = GrizzlyDataFrame({'nom': ['jacques', 'kelly', 'marie'], 'age': [50, 60, 70]})
>>> df.add(df2).to_pandas()
   age  name  nom
0   70   NaN  NaN
1   82   NaN  NaN
2  126   NaN  NaN
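To make the alignment rule in the example concrete, here is a minimal sketch in plain Python. It is not the code in this patch: `align_and_add` is a hypothetical helper, frames are modeled as ordinary dicts of equal-length lists, and it assumes that a column present on only one side becomes all-NaN, as the output above suggests. Rows are combined strictly by position, with no row-index alignment.

```python
import math


def align_and_add(left, right):
    """Add two dict-of-columns frames, aligning on column name only.

    Hypothetical illustration: rows are matched by position, never by
    a row index.
    """
    # Take the union of column names, sorted to mirror the example output.
    columns = sorted(set(left) | set(right))
    result = {}
    for name in columns:
        if name in left and name in right:
            # Shared column: element-wise add, row i with row i.
            result[name] = [a + b for a, b in zip(left[name], right[name])]
        else:
            # Column exists on one side only: assumed to become all-NaN.
            present = left[name] if name in left else right[name]
            result[name] = [math.nan] * len(present)
    return result


df = {'name': ['mike', 'sam', 'sally'], 'age': [20, 22, 56]}
df2 = {'nom': ['jacques', 'kelly', 'marie'], 'age': [50, 60, 70]}
out = align_and_add(df, df2)
print(out['age'])  # [70, 82, 126]
```

Because each output column depends only on the same-named input columns, every column can be computed independently, which is the property that row-index alignment would give up.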
Other Changes in this patch
This patch also makes some restructuring changes, namely:
Forwarding calls automatically to Pandas is disabled for now. Users should explicitly use to_pandas().
GrizzlySeries no longer subclasses Series. The main purpose of subclassing was method forwarding; with forwarding disabled, subclassing only adds unnecessary constraints.
Adds the ability to deserialize structs of NumPy vectors in the NumPyWeldEncoder.