weld-project / weld

High-performance runtime for data analytics applications
https://www.weld.rs
BSD 3-Clause "New" or "Revised" License

DataFrame foundations #510

Closed sppalkia closed 4 years ago

sppalkia commented 4 years ago

Implements the foundations of an API-compatible DataFrame backed by Weld.

DataFrames are dictionary-like containers of Series of the same length. Each Series can have a different data type. Operations on DataFrames align on column name. Unlike pandas, DataFrame operations do not align on row indexes. This will likely be the permanent behavior of Grizzly, as implementing row-based indexing requires forgoing many data-parallel optimizations.

This patch adds the foundation of the DataFrame class: evaluating Weld computations, converting Grizzly DataFrames to pandas DataFrames, and basic binary operations. It also adds a new class, ColumnIndex, that facilitates aligning columns when performing operations between DataFrames.

Examples

>>> df = GrizzlyDataFrame({'name': ['mike', 'sam', 'sally'], 'age': [20, 22, 56]})
>>> df
    name  age
0   mike   20
1    sam   22
2  sally   56
>>> df2 = GrizzlyDataFrame({'nom': ['jacques', 'kelly', 'marie'], 'age': [50, 60, 70]})
>>> df.add(df2).to_pandas()
   age  name  nom
0   70   NaN  NaN
1   82   NaN  NaN
2  126   NaN  NaN
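
The alignment rule behind the output above can be sketched in plain Python. This is only an illustration of the semantics described in this PR, not the Grizzly implementation: `add_by_column` is a hypothetical helper, DataFrames are modeled as dicts of equal-length lists, and the sorted column order is an assumption chosen to match the example output.

```python
import math


def add_by_column(left, right):
    """Add two dict-of-lists "frames", aligning on column name only.

    Hypothetical helper for illustration; not part of Grizzly.
    Rows are paired by position, never by a row index.
    """
    # Union of the column names from both sides, sorted (assumed ordering).
    columns = sorted(set(left) | set(right))
    # All columns are assumed to have the same length.
    length = len(next(iter({**left, **right}.values())))

    result = {}
    for name in columns:
        if name in left and name in right:
            # Column exists on both sides: add element by element.
            result[name] = [a + b for a, b in zip(left[name], right[name])]
        else:
            # Column is missing on one side: the result is all NaN.
            result[name] = [math.nan] * length
    return result


df = {'name': ['mike', 'sam', 'sally'], 'age': [20, 22, 56]}
df2 = {'nom': ['jacques', 'kelly', 'marie'], 'age': [50, 60, 70]}
out = add_by_column(df, df2)
print(out['age'])  # [70, 82, 126]
```

Only `age` appears in both inputs, so it is the only column with real values. `name` and `nom` each exist on one side only and come back as NaN, as in the example.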

Other Changes in this patch

This patch makes some other restructuring changes. Namely