weld-project / weld

High-performance runtime for data analytics applications
https://www.weld.rs
BSD 3-Clause "New" or "Revised" License

Queries on weld groupby and sort #463

Open shashwatwork opened 5 years ago

shashwatwork commented 5 years ago

Hi,

For groupby, I used the following code snippets.

Plain pandas:

    import time
    import pandas as pd

    df = pd.DataFrame({"a": [3, 2, 3], "b": [4, 5, 6]})
    start = time.time()
    res = df.groupby('a').agg('sum')
    end = time.time()
    pandas_time_groupby = end - start
    print("({:.3} seconds)".format(pandas_time_groupby))

With Weld:

    import grizzly.grizzly as gr  # assuming gr is Weld's Grizzly module

    start = time.time()
    input = gr.DataFrameWeld(df)
    groupby_sd = input.groupby("a").sum()
    end = time.time()
    weld_time_groupby = end - start
    print("({:.3} seconds)".format(weld_time_groupby))

Questions:

1. How can I view or display the result of a Weld operation? For example, I need to print the output of groupby_sd.
2. Weld's groupby runs faster than plain pandas on a small DataFrame, but as the DataFrame grows, pandas performs much better than Weld. Please send me a snippet if Weld works well on large volumes of data.
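To make the size comparison reproducible, here is a minimal pandas-only sketch of a timing harness. The helper names `make_frame` and `time_groupby_sum`, the row counts, and the random data are assumptions chosen for illustration and are not the measurements described above. The same harness could wrap the Weld call, provided the Weld result is actually materialized inside the timed region.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n_rows, n_keys=100, seed=0):
    """Build a synthetic frame with a key column "a" and a value column "b"."""
    rng = np.random.RandomState(seed)
    return pd.DataFrame({
        "a": rng.randint(0, n_keys, size=n_rows),
        "b": rng.randint(0, 1000, size=n_rows),
    })


def time_groupby_sum(df, repeats=3):
    """Return the best wall-clock time in seconds of df.groupby("a").sum()."""
    best = float("inf")
    for _ in range(repeats):
        start = timeit.default_timer()
        df.groupby("a").sum()
        best = min(best, timeit.default_timer() - start)
    return best


if __name__ == "__main__":
    # Hypothetical sizes; adjust to match the data that shows the slowdown.
    for n_rows in (1000, 100000, 1000000):
        elapsed = time_groupby_sum(make_frame(n_rows))
        print("{:>8} rows: {:.4f} seconds".format(n_rows, elapsed))
```

Taking the best of several repeats reduces noise from one-off costs such as imports and cache warm-up, which matter most on the small three-row example.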

For sorting, I have searched extensively for how to perform a sort with Weld the way we do in plain pandas. Please let me know how to sort with Weld.
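For reference, the pandas behavior to be matched is `sort_values`. The sketch below is pandas only and makes no assumption about what Weld exposes; the tie-breaking column and the sort directions are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 2, 3], "b": [4, 5, 6]})

# Sort rows by column "a" ascending, breaking ties with "b" descending.
sorted_df = df.sort_values(by=["a", "b"], ascending=[True, False])

# Sort a single column (a Series) on its own, largest value first.
sorted_b = df["b"].sort_values(ascending=False)

print(sorted_df)
print(sorted_b)
```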