wesm opened 8 years ago
The obvious alternative is to allow pandas objects to be backed by dynamic arrays. This is possible now that we require arrays to be 1D and contiguous.
This has the advantage of still using eager evaluation, so you don't need to build machinery for deferred evaluation. You also still get predictable performance, even if you inspect the array between appends. I would guess that looking at DataFrames while they are being appended piece by piece is pretty common, even if only to check the size.
The downside is that this wouldn't really work with the current interface, because such appends need to happen in place. Also, dynamic arrays reduce speed and increase memory requirements by small constant factors.
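To make the "small constant factors" concrete, here is a minimal simulation, assuming a buffer that starts with capacity 1 and multiplies its capacity by a fixed growth factor whenever it is full. The function name `simulate_appends` is hypothetical. It counts how many elements are copied during reallocations and the worst ratio of allocated capacity to live elements.

```python
import math


def simulate_appends(n: int, growth: float = 2.0) -> tuple[int, float]:
    """Simulate n single-element appends into a geometrically growing buffer.

    Returns (total elements copied during reallocations,
             worst capacity/size ratio observed after any append).
    """
    capacity, size, copied, worst_ratio = 1, 0, 0, 1.0
    for _ in range(n):
        if size == capacity:
            # Buffer is full: allocate a larger block and copy the live elements.
            capacity = max(capacity + 1, math.ceil(capacity * growth))
            copied += size
        size += 1
        worst_ratio = max(worst_ratio, capacity / size)
    return copied, worst_ratio


for factor in (1.5, 2.0):
    n = 100_000
    copied, ratio = simulate_appends(n, factor)
    print(f"growth={factor}: {copied / n:.2f} copies per append, "
          f"capacity up to {ratio:.2f}x the live size")
```

With geometric growth the total copying stays below a constant times `n` (under `2n` for factor 2, under `3n` for factor 1.5), and the slack memory stays below the growth factor, which is where the speed and memory constants come from.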
Maybe it would make sense to deprecate `DataFrame.append` and instead make an alternative `DynamicDataFrame` (sub?)class that does an in-place append?
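A rough sketch of the mutating semantics being proposed, assuming a toy `DynamicDataFrame` that keeps each column in a plain Python list (lists already over-allocate, so appends are amortized O(1)). This is not a pandas subclass; the class and its `append`, `column` methods are hypothetical and only illustrate an `append` that mutates and returns `None`, in contrast to the copy-returning `DataFrame.append`.

```python
from typing import Any


class DynamicDataFrame:
    """Toy column store whose append mutates in place (hypothetical API)."""

    def __init__(self, columns: list[str]) -> None:
        # One growable list per column; Python lists over-allocate on append.
        self._columns: dict[str, list[Any]] = {name: [] for name in columns}

    def append(self, row: dict[str, Any]) -> None:
        """Append one row in place; returns None, like list.append."""
        if set(row) != set(self._columns):
            raise KeyError(f"row keys {sorted(row)} do not match columns")
        for name, value in row.items():
            self._columns[name].append(value)

    def __len__(self) -> int:
        # Checking the size between appends is cheap: nothing is copied.
        return len(next(iter(self._columns.values()), []))

    def column(self, name: str) -> list[Any]:
        # Return a copy so callers cannot mutate the internal storage.
        return list(self._columns[name])


df = DynamicDataFrame(["a", "b"])
for i in range(3):
    df.append({"a": i, "b": i * 10})
print(len(df))  # 3
```

Because `append` returns `None`, code written as `df = df.append(...)` would fail loudly, which is one argument for a separate class rather than silently changing `DataFrame.append`.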
We could definitely have a mutating append that writes into resizable buffers (with a growth factor of 1.5 or 2). Something we can experiment with.
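A minimal sketch of such a resizable buffer, assuming float64 data held in a stdlib `array.array` so the storage stays one contiguous block. `GrowableBuffer` and its parameters are hypothetical names; when full, it allocates a block `growth` times larger and copies the filled prefix.

```python
from array import array


class GrowableBuffer:
    """Contiguous float64 buffer with geometric growth (illustrative sketch)."""

    def __init__(self, growth: float = 2.0, initial_capacity: int = 4) -> None:
        if growth <= 1.0:
            raise ValueError("growth factor must be > 1")
        self._growth = growth
        self._data = array("d", [0.0]) * max(1, initial_capacity)
        self._size = 0

    @property
    def capacity(self) -> int:
        return len(self._data)

    def __len__(self) -> int:
        return self._size

    def append(self, value: float) -> None:
        if self._size == self.capacity:
            self._grow()
        self._data[self._size] = value
        self._size += 1

    def _grow(self) -> None:
        # Allocate a larger contiguous block and copy the live prefix into it.
        new_capacity = max(self.capacity + 1, int(self.capacity * self._growth))
        new_data = array("d", [0.0]) * new_capacity
        new_data[: self._size] = self._data[: self._size]
        self._data = new_data

    def view(self) -> memoryview:
        # Zero-copy view of the filled part (e.g. numpy.frombuffer can wrap it).
        return memoryview(self._data)[: self._size]


buf = GrowableBuffer(growth=1.5)
for i in range(10):
    buf.append(float(i))
print(len(buf), buf.capacity)  # 10 13
```

The factor trades memory for copying: 2 copies less often but can leave up to half the block unused, while 1.5 keeps slack lower at the cost of more reallocations.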
I'm thinking we can come up with a plan for a better `.append` implementation that defers stitching arrays together until they are actually needed for computations.
We can do this by having a virtual `pandas::Table` interface that consolidates fragmented columns only when they are requested. Will think some more about this.
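The deferred idea can be sketched as a table that records each appended batch as a separate chunk per column and only concatenates ("consolidates") a column the first time it is read, caching the result until the next append touches it. `ChunkedTable` and its methods are hypothetical stand-ins for the proposed `pandas::Table` interface, whose design is not specified here; Python tuples and lists stand in for typed arrays.

```python
from typing import Any, Sequence


class ChunkedTable:
    """Virtual table: appends just record chunks; columns are stitched lazily."""

    def __init__(self, columns: Sequence[str]) -> None:
        self._chunks: dict[str, list[Sequence[Any]]] = {c: [] for c in columns}
        self._consolidated: dict[str, tuple[Any, ...]] = {}
        self.consolidations = 0  # how many times a column has been stitched

    def append(self, batch: dict[str, Sequence[Any]]) -> None:
        lengths = {len(v) for v in batch.values()}
        if set(batch) != set(self._chunks) or len(lengths) > 1:
            raise ValueError("batch must give equal-length data for every column")
        for name, values in batch.items():
            self._chunks[name].append(list(values))
            # New data invalidates any cached consolidated copy of this column.
            self._consolidated.pop(name, None)

    def num_rows(self) -> int:
        # The size is known without stitching anything together.
        first = next(iter(self._chunks.values()), [])
        return sum(len(chunk) for chunk in first)

    def column(self, name: str) -> tuple[Any, ...]:
        if name not in self._consolidated:
            merged = tuple(v for chunk in self._chunks[name] for v in chunk)
            # Keep one chunk so the next consolidation starts from it.
            self._chunks[name] = [merged]
            self._consolidated[name] = merged
            self.consolidations += 1
        return self._consolidated[name]


t = ChunkedTable(["x", "y"])
t.append({"x": [1, 2], "y": ["a", "b"]})
t.append({"x": [3], "y": ["c"]})
print(t.num_rows(), t.consolidations)  # 3 0
print(t.column("x"), t.consolidations)  # (1, 2, 3) 1
```

Only columns that are actually read pay the concatenation cost, and repeated reads between appends reuse the cached result.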