twosigma / flint

A Time Series Library for Apache Spark
Apache License 2.0

replace assign with []= syntax to reduce copying #76

Open leifwalsh opened 4 years ago

leifwalsh commented 4 years ago

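pandas's df.assign(...) returns a new dataframe, which copies the entire existing dataframe on every call, while df[col] = ... adds the column to the existing dataframe and avoids that full copy. Replacing assign with []= significantly reduces memory overhead in calls to TimeSeriesDataFrame.toPandas().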
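
For reference, a minimal sketch of the difference between the two styles. This is not the flint code itself; the helper names with_assign and in_place and the sample columns are hypothetical and only illustrate the behavior described above.

```python
import pandas as pd


def with_assign(df, name, values):
    # Hypothetical helper: DataFrame.assign returns a new DataFrame,
    # so the caller's frame is left without the new column.
    return df.assign(**{name: values})


def in_place(df, name, values):
    # Hypothetical helper: item assignment adds the column to the
    # existing DataFrame object instead of building a new frame.
    df[name] = values
    return df


if __name__ == "__main__":
    df = pd.DataFrame({"time": [1, 2, 3], "price": [1.0, 2.0, 3.0]})

    copied = with_assign(df, "doubled", df["price"] * 2)
    print(copied is df)               # a distinct object
    print("doubled" in df.columns)    # original frame is unchanged

    same = in_place(df, "doubled", df["price"] * 2)
    print(same is df)                 # the same object, now with the column
```

When several columns are added one after another, the assign style produces a fresh dataframe at each step, which is where the extra memory goes.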