jreback commented 11 years ago

f6ac8c091dfb2bbbbb02a34c13022c5427b290eb to 05a737dbb9f857e25a211739d90c42f3d9428b8e

Bad news:

Results:
                                            t_head  t_baseline      ratio
name                                                                     
reindex_frame_level_reindex                 1.1343      0.7571     1.4982
reindex_frame_level_align                   1.1690      0.7548     1.5488

Good news:

Results:
                                            t_head  t_baseline      ratio
name                                                                     
frame_get_dtype_counts                      0.0968    208.6380     0.0005
frame_wide_repr                             0.5321    207.0730     0.0026
groupby_first_float32                       3.0386    336.2439     0.0090
groupby_last_float32                        3.2206    335.2571     0.0096
frame_to_csv2                             181.1550   2188.9260     0.0828
indexing_dataframe_boolean                 12.9179    125.7350     0.1027
write_csv_standard                         36.3702    230.9210     0.1575
frame_reindex_axis0                         0.3122      1.0984     0.2842
frame_to_csv_mixed                        340.1070   1091.9340     0.3115
frame_to_csv                              106.5671    219.1830     0.4862
frame_add                                  22.0693     34.1867     0.6456
frame_mult                                 22.0064     33.9859     0.6475
frame_reindex_upcast                       11.4179     17.3291     0.6589
frame_fancy_lookup_all                     14.3931     20.5621     0.7000
series_string_vector_slice                148.3371    195.5922     0.7584

???

frame_reindex_axis1                         2.3612      2.3386     1.0097

Fixed:

indexing_dataframe_boolean_rows             0.2180      0.1953     1.1164

jreback commented 11 years ago

I think that the multi-level reindexing ones are possibly related to the take changes (#2819) (which help a great deal in the most common cases), I didn't investigate much, that's just the top on profiling (in both old and new)

ghost commented 11 years ago

bisected?

ghost commented 11 years ago

vbench was meant exactly for that, but it hasn't been updated in a while. Too bad.

http://pandas.pydata.org/pandas-docs/vbench

jreback commented 11 years ago

can we run it on each commit and get the graph?

ghost commented 11 years ago

That's it's reason for being. I just added test_perf a few months ago for the devs.

jreback commented 11 years ago

agreed....how does it update though?

ghost commented 11 years ago

I think wes goes into the basement and presses a button.

ghost commented 11 years ago

we're getting too chatty on the issue tracker, need an irc channel. or campfire.

jseabold commented 11 years ago

FWIW, I lurk and learn, but I'm sure others hate the noise just as much as I find it helpful. A dev list might be preferable to IRC unless you can commit resources to a bot to archive everything in a publicly searchable way. I don't know how public campfire is.

ghost commented 11 years ago

We're definitely pushing the boundaries of S/N, that's not cool. Email is inconvenient for realtime, so not really a solution for many things. I believe freenode has web based logs for all channels, though I'm not sure I consider that a plus.

jreback commented 11 years ago

fyi...i think i found this regression (my issue!)...issue a fix in a bit

wesm commented 11 years ago

I could set up a hipchat for pandas devs if you like, but that wouldn't be public.

jreback commented 11 years ago

3093 fixes indexing_boolean_data_frame_rows

ghost commented 11 years ago

@wesm , punting on the chat quesion.

would it be possible to get a daily vbench going? we're discovering perf regression by mistake...

stephenwlin commented 11 years ago

hmm, well, i have kept all my old vb_suite logs (including several for testing #2819, #2867 against their immediate base commits) and I can't find any entries from where reindex_frame_level_* were affected negatively (in fact, a lot of results had the test improved by 20+%)...it's always possible there is some complex interaction between multiple lines on work that were developed independently, though...also I'm surprised frame_reindex_axis1 isn't on the positive side since I have consistent >25% improvements in that one.

jreback commented 11 years ago

pretty easy to break perf; thxs for checking

stephenwlin commented 11 years ago

one thing to note is that i'm on 32-bit, that might make a difference... also, here's my last vb_suite logs for the two commits #2819: http://pastebin.com/xEHn0Mhz, #2867: http://pastebin.com/sFccHx1K

jreback commented 11 years ago

@y-p is there a way to 'save' a vb_suite db for a particular commit, so that subsequent runs of test_perf can just refer to that rather than rebuilding it? (and if not, maybe make an option in test_perf?)

or maybe just save the build (rather than the db), but that would still save the get/build time

ghost commented 11 years ago

vb_suite has a persistent db, test_perf purges the db between runs. I'll add an option when I get a chance.

jreback commented 11 years ago

@stephenwlin I did a git bisect....here's what broke reindex_frame_level_align and reindex prob just an odd special case

10f65e02536b4dda0ab7761cb1d10e274b208cfb is the first bad commit
commit 10f65e02536b4dda0ab7761cb1d10e274b208cfb
Author: Stephen Lin <stephenwlin@gmail.com>
Date:   Wed Feb 13 03:33:35 2013 -0500

    ENH: optimize Cython take functions with memmove where possible

:040000 040000 c9e8c83951445333494415e5169abf7a7bad951c 45efa72631f1530adf6858155f3aa2bca0204907 M      pandas
[cow-jreback-~/pandas] git bisect log
git bisect start
# good: [05a737dbb9f857e25a211739d90c42f3d9428b8e] TST: skip problematic xlrd test
git bisect good 05a737dbb9f857e25a211739d90c42f3d9428b8e
# bad: [605f87edda24b07f0f0a1b2a2462a4e121629494] DOC:  update RELEASE.rst
git bisect bad 605f87edda24b07f0f0a1b2a2462a4e121629494
# good: [9faec8293aa7fda64b85f3aa05bff5600c7b28d1] BUG: don't show [] for empty series. close #2775
git bisect good 9faec8293aa7fda64b85f3aa05bff5600c7b28d1
# good: [9faa8defe954f89a69dfc9b48d7ec1cfcb9936d8] Merge branch 'dtypes_bug' of https://github.com/jreback/pandas into jreback-dtypes_bug
git bisect good 9faa8defe954f89a69dfc9b48d7ec1cfcb9936d8
# bad: [49e123f95d95b70d10f5e5852d2682741799aed1] RLS: fixup RELEASE.rst links
git bisect bad 49e123f95d95b70d10f5e5852d2682741799aed1
# bad: [cb5e7dd2f0c96108e84e930b86bc522f8c613cfb] Merge pull request #2871 from jreback/dtypes1
git bisect bad cb5e7dd2f0c96108e84e930b86bc522f8c613cfb
# bad: [f9dfa811aa1a925e43289491558d95e1e2a2e000] RLS: add release notes for 'opt-take-2' and 'fix-maybe-convert-objects'
git bisect bad f9dfa811aa1a925e43289491558d95e1e2a2e000
# good: [759b0db9436999877791e25bf524c9c40202bc6d] ENH: Consolidate and further improve performance of take functions
git bisect good 759b0db9436999877791e25bf524c9c40202bc6d
# bad: [10f65e02536b4dda0ab7761cb1d10e274b208cfb] ENH: optimize Cython take functions with memmove where possible
git bisect bad 10f65e02536b4dda0ab7761cb1d10e274b208cfb

jreback commented 11 years ago

@y-p thxs

ghost commented 11 years ago

I remember that thread. "is memmove slower then memcpy".

stephenwlin commented 11 years ago

hmm..well, that's weird, i also have a vb_suite diff that's memmove vs memcpy and everything there is +-10% (with most with +-5%, basically normally distributed around 1.0) ...maybe there's a 32-bit vs 64-bit issue...i will take a look though...i'll run a vb_suite with just that commit on my machine

stephenwlin commented 11 years ago

@jreback can you give me info about the machine you're using to test this (cython version, gcc version, etc.?) this commit literally does nothing but change an explicit loop into a single memmove call, so it's all up to what cython and the compiler do with that: any reasonable compiler should inline memmove to be just as fast or faster than a loop but I suppose there's a chance it might not...I suspect this test is creating some degenerate condition where memmove is being called on a short array over and over again and the compiler is not inlining correctly (i.e. it happens to exactly be the transpose of the ideal shape for the the optimization, so that the overhead of the call actually exceeds that of the original loop); if that's the case I can add an explicit check to avoid that for short arrays

jreback commented 11 years ago

As you posted elsewhere, the x64 case is probably using SIMD / compiler optimizations

Linux goat 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC 2012 x86_64 GNU/Linux
Cython is 0.17.2
[goat-jreback-~/pandas] gcc -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.4.5-8' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.4 --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --with-arch-32=i586 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.4.5 (Debian 4.4.5-8)

jreback commented 11 years ago

@stephenwlin did a great writeup for those who are interested

http://mail.python.org/pipermail/pandas-dev/2013-March/000008.html

going to close this issue for new, created #3114 to mark that we need a method for creating various type of c-contig/f-contig test cases in an easy way

ghost commented 11 years ago

@stephenwlin , that was a formidable piece.

stephenwlin commented 11 years ago

Found out it had to do with the fact that the width of the array to be copied is not compile-time constant: I posted a question on StackOverflow about this http://stackoverflow.com/questions/15554565/forcing-gcc-to-perform-loop-unswitching-of-memcpy-runtime-size-checks

jreback commented 11 years ago

@stephenwlin you are deep into the c-code! (that's a good thing!)

stephenwlin commented 11 years ago

@jreback, can you vbench #3130 ? I went the magic number route (for now), 128 bytes ended up slowing another test (didn't look into the particulars of it, though) so I'm going with 64 bytes for now. EDIT apparently I got some things mixed up here, #3130 as tested and submitted has a 256 byte limit and it doesn't have any downside, so I'm going with that

jreback commented 11 years ago

testing now....i'll post on that thread

stephenwlin commented 11 years ago

@jreback actually, i got my commits mixed up, turns out this was 256 bytes (it's the largest limit that doesn't have negative impacts on other tests, which is what we want)...I amend the comment and the issue but the commit was good (I verified the hash from my vb_suite log, just in case)

pandas-dev / pandas

PERF: regression from 0.10.1 #3089

3093 fixes indexing_boolean_data_frame_rows