pescadores / pescador

Stochastic multi-stream sampling for iterative learning
https://pescador.readthedocs.io
ISC License

Profile and speed up #49

Open cjacoby opened 7 years ago

cjacoby commented 7 years ago

I did a bunch of experimenting about a month ago, and noticed that the Mux was significantly slowing down my sampling/training process.

Todo:

bmcfee commented 7 years ago

Is your profiling code available somewhere?

cjacoby commented 7 years ago

It was in a work thing; I'll try to refactor it into something shareable in the next couple of days. (I'm on the road to Seattle all day today, and won't have real internet until tomorrow.)


bmcfee commented 7 years ago

Ping! We should sync up on this some time. It would be great to put out 1.0 in the near future.

cjacoby commented 7 years ago

I agree. I've been thinking about this.

Want to chat about it soon? I'm free almost all of tomorrow and Saturday morning (Pacific time).


cjacoby commented 7 years ago

Thinking about making another pescadores repo to do this. pescador-benchmarks or something. Thoughts?

bmcfee commented 7 years ago

> I did a bunch of experimenting about a month ago, and noticed that the Mux was significantly slowing down my sampling/training process.

Now that we have pytest in place, I threw pytest-profiling at it and plotted the results in snakeviz. It looks like the biggest chunk is actually in random index sampling (np.random.choice):

[snakeviz profile plot: the largest single chunk of Mux overhead is np.random.choice]

Note that the per-call cost is minuscule, but we call it 28604 times, so it adds up.

Otherwise, it looks like time is dominated by the underlying streamers/generators, which is what we'd expect.
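For reference, a minimal sketch of this kind of profiling run, assuming the 1.x API (Streamer, Mux, and Streamer.iterate); the toy generator and the k/max_iter values are placeholders, not the actual benchmark:

```python
# Sketch of a cProfile run over Mux sampling, comparable to the
# pytest-profiling/snakeviz numbers above. Assumes the pescador 1.x API;
# the generator and parameters are illustrative only.
import cProfile
import pstats

import numpy as np
import pescador


def noise_gen(dim=64):
    # Trivial infinite sample generator
    while True:
        yield {'X': np.random.randn(1, dim)}


streamers = [pescador.Streamer(noise_gen) for _ in range(32)]
mux = pescador.Mux(streamers, k=8)  # k active streamers; values are placeholders

profiler = cProfile.Profile()
profiler.enable()
for _ in mux.iterate(max_iter=10000):
    pass
profiler.disable()

# Sort by cumulative time; np.random.choice should show up near the top
pstats.Stats(profiler).sort_stats('cumulative').print_stats(15)
# For the snakeviz view: profiler.dump_stats('mux.prof'), then run `snakeviz mux.prof`
```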

bmcfee commented 7 years ago

@ejhumphrey and I just had a chat, and it looks like the rng bottleneck could be easily accelerated by adding a buffering mechanism on top of rng. I'll bench it and report back.
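A rough sketch of the buffering idea (plain numpy, not pescador code): pre-fetch a block of indices from np.random.choice and invalidate the cache whenever the weights change. The class and parameter names here are hypothetical.

```python
# Illustration of batching np.random.choice and invalidating the cache
# when the sampling distribution changes -- not pescador code.
import numpy as np


class BufferedChoice:
    def __init__(self, n, weights, block_size=1024, seed=None):
        self.rng = np.random.RandomState(seed)
        self.n = n
        self.block_size = block_size
        self.set_weights(weights)

    def set_weights(self, weights):
        # Changing the distribution invalidates any pre-fetched indices;
        # this is the wasted-sampling cost discussed in the next comment.
        self.weights = np.asarray(weights, dtype=float)
        self.weights /= self.weights.sum()
        self._cache = []

    def _refill(self):
        self._cache = list(self.rng.choice(self.n, size=self.block_size,
                                            p=self.weights))

    def next_index(self):
        if not self._cache:
            self._refill()
        return self._cache.pop()


# Usage: draw indices one at a time, paying for np.random.choice in blocks
sampler = BufferedChoice(n=10, weights=np.ones(10))
idx = sampler.next_index()
```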

bmcfee commented 7 years ago

The savings here are pretty minimal -- I think it may not be worth implementing.

EDIT: for context, the issue is that muxen often need to change the sampling distribution, e.g., whenever a streamer dies or is reactivated. This triggers a cache invalidation for pre-fetched random samples, so we have to reset the pre-fetcher, which leads to wasted sampling. We do get a modest saving (on the order of tens of milliseconds) in the benchmark, but the added complexity of an indirection layer may not be worth it.

Similarly, getting clever about when to invalidate the cache is probably a bad idea from a maintainability perspective.

ejhumphrey commented 7 years ago

😞

bmcfee commented 7 years ago

Random thought: @cjacoby are you sure it was mux and not buffer that was slowing down? AFAICT, the mux overhead is relatively minimal, notwithstanding calls out to rng, and depending on your seed pool size, a bunch of vector math. Buffering will typically invoke copies though, and that could for sure be a performance hit.
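One way to separate the two costs is to time the same total number of underlying samples with and without buffering. A minimal sketch, assuming the 1.x API (Streamer, BufferedStreamer, iterate); the generator and sizes are placeholders:

```python
# Quick timing sketch to isolate buffering/copy overhead from the rest.
import time

import numpy as np
import pescador


def sample_gen(dim=128):
    # Infinite stream of single samples
    while True:
        yield {'X': np.random.randn(1, dim)}


def time_stream(stream, n):
    # Pull n items from the stream and return the elapsed wall time
    start = time.time()
    for _ in stream.iterate(max_iter=n):
        pass
    return time.time() - start


n_samples, buffer_size = 6400, 64
plain = pescador.Streamer(sample_gen)
buffered = pescador.BufferedStreamer(pescador.Streamer(sample_gen), buffer_size)

# Same total number of underlying samples either way, so the difference is
# (roughly) the cost of buffering/copying.
print('plain    : %.3fs' % time_stream(plain, n_samples))
print('buffered : %.3fs' % time_stream(buffered, n_samples // buffer_size))
```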

cjacoby commented 7 years ago

I think you're probably correct; what you describe jibes with my experience.

I use mux and buffer in tandem a lot, so at the end of the day they get lumped into one bucket in my head.


cjacoby commented 7 years ago

Hey @bmcfee - I have some new data / benchmarks on this for you, after working on an issue @rabitt was having for a while. (This is similar to what I had done before, but more complete.)

Check out the notebook. (The most useful part is the plots at the bottom.)

I haven't played with Mux in it yet, 'cause it's more complicated to set up.

General conclusions:

I acknowledge that since there are buffer_size draws from the contained streamer for every buffered sample, it makes sense that BufferedStreamer takes longer in general. But what does NOT make sense to me is why the time to sample from BufferedStreamer increases with the number of samples in the file, while the plain streamer's time doesn't. Is this a memory leak? Did I do something funny here?

bmcfee commented 7 years ago

Thanks for looking into this!

It would be worth plotting as a function of the size of the buffer. It looks like the current benchmark keeps the buffer size fixed at 128?

I don't know what's going on with the dependency on the size of the file, though... maybe that's latency coming from numpy after hitting a page size limit? Can you re-run this benchmark with cProfile to see where the hits are coming from?
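A sketch of that follow-up, sweeping the buffer size and profiling each configuration with cProfile; it assumes the 1.x API and a synthetic file-like generator, so the names and sizes are placeholders:

```python
# Sweep buffer_size and profile each configuration to see where time goes.
import cProfile
import pstats

import numpy as np
import pescador


def file_like_gen(n_samples, dim=128):
    # Mimic streaming single samples out of a file with n_samples rows
    data = np.random.randn(n_samples, dim)
    while True:
        idx = np.random.randint(n_samples)
        yield {'X': data[idx:idx + 1]}


for buffer_size in (16, 64, 256, 1024):
    streamer = pescador.BufferedStreamer(
        pescador.Streamer(file_like_gen, 10000), buffer_size)

    profiler = cProfile.Profile()
    profiler.enable()
    for _ in streamer.iterate(max_iter=8192 // buffer_size):
        pass
    profiler.disable()

    print('--- buffer_size = %d ---' % buffer_size)
    pstats.Stats(profiler).sort_stats('cumulative').print_stats(5)
```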

cjacoby commented 7 years ago

Alright, yes; I'll do both of those things and get back to you.

ejhumphrey commented 7 years ago

First, do we want to have this settled for 1.1, in case it causes any important changes? Or are we reasonably confident that this can happen behind the scenes without major API changes?

Related: @cjacoby, any progress to report, or a branch to pass off?

cjacoby commented 7 years ago

This had slipped off my radar; I'll focus on this for the next few days.

How would you like this info presented? I had a notebook before; I think a clean and well-documented notebook should do the trick. But I could also make a script that prints the info or something.

ejhumphrey commented 7 years ago

I'd vote script here, all the way. Notebooks are great for documentation, but I think (given both how slippery this has been, and the role that speed / efficiency play in the value of this library) we would do well to make this more regression test than illustration.

I'm happy to help, and wonder if pytest-benchmark might have a place here.
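For illustration, a sketch of what such a pytest-benchmark regression test might look like; benchmark is the fixture provided by the pytest-benchmark plugin, and the pescador calls assume the 1.x API with placeholder sizes:

```python
# Hypothetical throughput regression tests using the pytest-benchmark fixture.
import numpy as np
import pescador


def noise_gen(dim=64):
    while True:
        yield {'X': np.random.randn(1, dim)}


def _drain(stream, n):
    # Pull n items and discard them
    for _ in stream.iterate(max_iter=n):
        pass


def test_mux_throughput(benchmark):
    streamers = [pescador.Streamer(noise_gen) for _ in range(16)]
    mux = pescador.Mux(streamers, k=4)
    benchmark(_drain, mux, 1000)


def test_buffered_throughput(benchmark):
    buffered = pescador.BufferedStreamer(pescador.Streamer(noise_gen), 64)
    benchmark(_drain, buffered, 16)
```

Runs would then show up in pytest-benchmark's comparison tables, which fits the "regression test rather than illustration" goal above.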

cjacoby commented 7 years ago

Mm yeah, that's probably the best course of action.


ejhumphrey commented 7 years ago

Hear, hear.