Closed: bluefir closed this issue 12 years ago
We need more performance benchmarks in the vbench suite. Thanks for the feedback. We'll investigate.
This looks like 4a5b75b44b0048, though I'm not sure why take is so expensive. The pending #2253 (3688e53) fixes the problem for me.
Testcase:
from random import randint
import pandas as pd

num = 250000
l1 = [randint(0, 1000) for x in range(num)]
l2 = [randint(0, 20000) for x in range(num)]
l3 = [randint(0, 20000) for x in range(num)]
l4 = [randint(0, 20000) for x in range(num)]
a = pd.DataFrame(dict(zip([0, 1, 2, 3], [l1, l2, l3, l4]))).set_index([0, 1])
b = a.to_sparse()
%timeit b/100
%timeit b.to_dense()
%timeit b.save('test.pk1')
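For anyone trying to reproduce these timings today, the snippet above depends on IPython's %timeit and on APIs (to_sparse, DataFrame.save) that were removed in later pandas releases. A plain-Python sketch of the same benchmark, using the standard timeit module and the modern sparse-dtype API (my substitution, not from this thread), might look like:

```python
import timeit
from random import randint

import pandas as pd

# Rebuild the test frame from the snippet above.
num = 250000
l1 = [randint(0, 1000) for _ in range(num)]
l2 = [randint(0, 20000) for _ in range(num)]
l3 = [randint(0, 20000) for _ in range(num)]
l4 = [randint(0, 20000) for _ in range(num)]
a = pd.DataFrame(dict(zip([0, 1, 2, 3], [l1, l2, l3, l4]))).set_index([0, 1])

# to_sparse() is gone in modern pandas; SparseDtype is the replacement.
b = a.astype(pd.SparseDtype("int64"))

# Time each operation once; %timeit would average many runs instead.
print("divide :", timeit.timeit(lambda: b / 100, number=1))
print("densify:", timeit.timeit(lambda: b.sparse.to_dense(), number=1))
print("pickle :", timeit.timeit(lambda: b.to_pickle("test.pk1"), number=1))
```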
Edit: but perhaps there's another issue at play. I can't reproduce anything like a 90 s runtime on this data.
Doh, this will teach me to review PRs more carefully; this is theoretically what vbench is for. I will fix.
Ugh, iteritems for all DataFrames has borked performance. Guess we're going to see 0.9.2 sooner rather than later.
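The actual regression lived in pandas internals, but the general class of problem being described (Python-level iteration over frame contents versus a single vectorized call) can be sketched with a small, hypothetical benchmark. This is not the vbench case; it uses DataFrame.items, the modern name for iteritems:

```python
import time

import numpy as np
import pandas as pd

# Hypothetical illustration: summing values via Python iteration with
# items() (the modern name for iteritems) vs. one vectorized reduction.
df = pd.DataFrame(np.random.randint(0, 1000, size=(100000, 4)))

start = time.perf_counter()
slow_total = sum(value for _, col in df.items() for value in col)
items_time = time.perf_counter() - start

start = time.perf_counter()
fast_total = int(df.to_numpy().sum())
vector_time = time.perf_counter() - start

# Both paths compute the same sum; the vectorized one does far less
# Python-level work per element.
print(f"items(): {items_time:.4f}s  vectorized: {vector_time:.4f}s")
```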
This is what I have in version 0.9.0:
Now this is what I get in 0.9.1:
So, in the new version, SparseDataFrame methods that used to take between 7 ms and 130 ms now take more than 90 s. Ouch! What happened?