cjacoby opened this issue 7 years ago
Is your profiling code available somewhere?
It was in a work thing; I'll try to refactor it into something sharable in the next couple days. (I'm on the road to Seattle all day today, won't have internet for real till tomorrow)
Ping! We should sync up on this some time. It would be great to put out 1.0 in the near future.
I agree. I've been thinking about this.
Want to hang out and talk about it soon? I'm free almost all of tomorrow and Saturday morning PT.
Thinking about making another pescadores repo to do this, `pescador-benchmarks` or something. Thoughts?
I did a bunch of experimenting about a month ago, and noticed that the Mux was significantly slowing down my sampling/training process.
Now that we have pytest in place, I threw pytest-profiling at it and plotted the results in snakeviz. It looks like the biggest chunk is actually in random index sampling (`np.random.choice`). Note that the per-call cost is minuscule, but we call it 28604 times, so it adds up.
Otherwise, it looks like time is dominated by the underlying streamers/generators, which is what we'd expect.
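To make the overhead concrete, here's a small timing sketch (hypothetical weights and pool size, not pescador code) comparing 28604 one-at-a-time calls to `np.random.choice` against a single vectorized draw of the same number of indices:

```python
import timeit

import numpy as np

rng = np.random.RandomState(0)
weights = np.ones(32) / 32  # hypothetical pool of 32 active streamers

# One draw per sample, roughly what the profile shows Mux doing
per_call = timeit.timeit(lambda: rng.choice(32, p=weights), number=28604)

# One vectorized draw of all 28604 indices up front
batched = timeit.timeit(lambda: rng.choice(32, size=28604, p=weights), number=1)

print(f"per-call: {per_call:.3f}s, batched: {batched:.3f}s")
```

The per-call Python and argument-validation overhead dominates, which is why the total adds up despite each individual call being cheap.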
@ejhumphrey and I just had a chat, and it looks like the rng bottleneck could be easily accelerated by adding a buffering mechanism on top of rng. I'll bench it and report back.
The savings here are pretty minimal -- I think it may not be worth implementing.
EDIT: for context, the issue is that muxen often need to change the sampling distribution, e.g., whenever a streamer dies or is reactivated. This triggers a cache invalidation for pre-fetched random samples, so we have to reset the pre-fetcher, which leads to wasted sampling. We do get a modest savings (order of tens of milliseconds) in the benchmark, but the added complexity of an indirection layer may not be worth it.
Similarly, getting clever about when to invalidate the cache is probably a bad idea from a maintainability perspective.
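For reference, here's a minimal sketch (hypothetical class, not pescador code) of the kind of buffered pre-fetcher discussed above, including the cache invalidation that wastes pre-drawn samples whenever the weights change:

```python
import numpy as np


class BufferedChoice:
    """Pre-fetch random indices in blocks; invalidate when weights change.

    Hypothetical sketch of the indirection layer discussed above.
    """

    def __init__(self, n, weights, block=1024, seed=None):
        self.n = n
        self.weights = np.asarray(weights, dtype=float)
        self.block = block
        self.rng = np.random.RandomState(seed)
        self._refill()

    def _refill(self):
        # One vectorized draw replaces `block` individual rng calls
        self._cache = self.rng.choice(self.n, size=self.block, p=self.weights)
        self._pos = 0

    def set_weights(self, weights):
        # e.g. a streamer died or was reactivated: the pre-fetched indices
        # were drawn from the old distribution, so they must be discarded
        self.weights = np.asarray(weights, dtype=float)
        self._refill()

    def __next__(self):
        if self._pos >= self.block:
            self._refill()
        idx = self._cache[self._pos]
        self._pos += 1
        return int(idx)
```

If weight changes are frequent relative to `block`, most of each pre-fetched cache is thrown away, which is exactly why the measured savings were modest.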
😞
Random thought: @cjacoby are you sure it was `mux` and not `buffer` that was slowing down? AFAICT, the mux overhead is relatively minimal, notwithstanding calls out to rng and, depending on your seed pool size, a bunch of vector math. Buffering will typically invoke copies though, and that could for sure be a performance hit.
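As a rough illustration of why buffering invokes copies (a sketch of the general pattern, not pescador internals): collating `buffer_size` individual samples into one batch forces every array to be copied into new contiguous storage.

```python
import numpy as np

# 128 individual samples, as a buffered stream might accumulate them
samples = [{'X': np.random.randn(1, 64)} for _ in range(128)]

# Collating into a single batch copies all 128 arrays
batch = {'X': np.concatenate([s['X'] for s in samples])}
print(batch['X'].shape)  # (128, 64)
```

Those copies scale with both the buffer size and the per-sample array size, so the cost can easily dwarf the mux's bookkeeping.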
I think you're probably correct; what you describe jibes with my experience. I use `mux` and `buffer` in tandem a lot, so at the end of the day they get lumped into one bucket in my head.
Hey @bmcfee - I have some new data / benchmarks on this for you, after working on an issue @rabitt was having for a while. (This is similar to what I had done before, but more complete.)
Check out the notebook. (Most useful are the plots at the bottom.)
I haven't played with `Mux` in it yet, 'cause it's more complicated to set up.
General conclusions:
- `Streamer`s do not take extra time to sample as the number of samples/examples in the source file increases (for constant example shape).
- A `Streamer` is sometimes faster than a generator!
- I acknowledge that there are `buffer_size` streams from the contained streamer happening for every buffered stream, so it makes sense that `BufferedStreamer` takes longer in general. But what does NOT make sense to me is why the time for sampling from a `BufferedStreamer` increases with the number of samples in the file, while the normal streamer's time doesn't. Is this a memory leak? Did I do something funny here?
Thanks for looking into this!
It would be worth plotting as a function of the size of the buffer. It looks like the current benchmark keeps the buffer size fixed at 128?
I don't know what's going on with the dependency on the size of the file, though... maybe that's latency coming from numpy after hitting a page size limit? Can you re-run this benchmark with cProfile to see where the hits are coming from?
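For the cProfile re-run, something like the following would do (the workload function here is a hypothetical stand-in for sampling from a `BufferedStreamer`, not pescador code):

```python
import cProfile
import io
import pstats

import numpy as np


def sample_buffered(n_samples, buffer_size=128):
    """Stand-in workload: collate buffer_size samples per batch."""
    out = []
    for _ in range(n_samples // buffer_size):
        items = [np.zeros((1, 64)) for _ in range(buffer_size)]
        out.append(np.concatenate(items))  # the copy suspected of dominating
    return out


prof = cProfile.Profile()
prof.enable()
sample_buffered(4096)
prof.disable()

# Print the ten most expensive calls by cumulative time
stream = io.StringIO()
pstats.Stats(prof, stream=stream).sort_stats("cumulative").print_stats(10)
print(stream.getvalue())
```

Swapping `sample_buffered` for the real benchmark loop would show whether the file-size dependence lands in numpy, the generator, or somewhere unexpected.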
Alright, yes; I'll do both of those things and get back to you.
First, do we want to have this settled for 1.1, in case it causes any important changes? Or are we reasonably confident that this can happen behind the scenes without major API changes?
related: @cjacoby any progress to report, or a branch to pass off?
This had slipped off my radar; I'll focus on this for the next few days.
How would you like this info presented? I had a notebook before; I think a clean and well-documented notebook should do the trick. But I could also make a script that prints the info or something.
I'd vote script here, all the way. Notebooks are great for documentation, but I think (given both how slippery this has been, and the role that speed / efficiency play in the value of this library) we would do well to make this more regression test than illustration.
I'm happy to help here, and wonder if pytest-benchmark might have a place here.
Mm yeah, that's probably the best course of action.
Hear, hear!
> I did a bunch of experimenting about a month ago, and noticed that the Mux was significantly slowing down my sampling/training process.
Todo: