radujica / baloo

The bare necessities of Pandas on the Weld runtime
BSD 3-Clause "New" or "Revised" License

Implement sorting without getting all data into Weld #6

Closed: radujica closed this issue 6 years ago.

radujica commented 6 years ago

Previously, all data was brought into Weld, sorted by the given (`by`) columns, and returned, so every column ended up cached. Now only the `by` columns are passed to Weld. Weld sorts them and returns a vector of indices giving the sorted row order. Every column is then reordered by indexing it with this vector (as with `iloc`), so the only new data that gets cached is the index vector.
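The index-vector approach can be illustrated with a minimal NumPy sketch. This is not baloo's code. Weld is replaced here by `np.lexsort`, a table is assumed to be a plain dict of NumPy column arrays, only ascending order is handled, and the helper names `sort_indices` and `sort_table` are hypothetical. The point is that only the `by` columns are read to compute the order, and every column is then reordered with that one index vector.

```python
import numpy as np


def sort_indices(columns, by):
    """Return the row order that sorts the table by the 'by' columns.

    Only the 'by' columns are read here, mirroring the idea of sending
    just those columns to the sorting backend (Weld in baloo; NumPy here).
    """
    # np.lexsort treats the LAST key as the primary key, so reverse 'by'
    # to make by[0] the primary sort key. lexsort is stable, so ties keep
    # their original relative order.
    keys = [columns[name] for name in reversed(by)]
    return np.lexsort(keys)


def sort_table(columns, by):
    """Reorder every column with the index vector (an iloc-style take)."""
    order = sort_indices(columns, by)
    # Each column, including non-key ones, is gathered by position;
    # the index vector is the only new intermediate result.
    return {name: col[order] for name, col in columns.items()}, order


table = {
    "a": np.array([3, 1, 2, 1]),
    "b": np.array([0.5, 0.9, 0.1, 0.2]),
    "c": np.array(["x", "y", "z", "w"]),
}
sorted_table, order = sort_table(table, by=["a", "b"])
print(order)              # [3 1 2 0]
print(sorted_table["c"])  # ['w' 'y' 'z' 'x']
```

With pandas, the same index vector could be applied to a whole frame via `df.iloc[order]`, which is the `iloc`-based reordering the issue refers to.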