kenahoo / Path-Class

Cross-platform path specification manipulation
http://search.cpan.org/dist/Path-Class/

Optimizations reduce large recursive find runtime by over 50% #15

Closed dagolden closed 11 years ago

dagolden commented 11 years ago

These commits mostly avoid Path::Class hitting File::Spec repeatedly.

In my test case of finding all .pm files in @INC (with Path::Class::Rule) I saw a run time reduction of over 50%.

kenahoo commented 11 years ago

I definitely like the ideas. I've cherry-picked & pushed your 9c8765a, as a separate commit. I think it's a good idea to cache the stringified version too.

But I'm not convinced we should always stringify upon creation, because some uses will create & destroy P::C objects without ever needing them as a string, and such pre-caching would be a waste in those cases. Also, if we ever start implementing the canonpath() and friends logic in P::C, we won't want it stringified, we'd probably want to work directly on the list.
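To make the lazy variant concrete, here is a minimal sketch of caching the stringified form on first use rather than at creation. `Sketch::LazyDir`, its `dirs`/`str` slots, and the `$BUILDS` counter are hypothetical names for illustration only, not Path::Class internals; the only real API used is `File::Spec->catdir` and `overload`.

```perl
use strict;
use warnings;

package Sketch::LazyDir;   # hypothetical class, not Path::Class itself
use File::Spec;
use overload '""' => sub { $_[0]->stringify }, fallback => 1;

our $BUILDS = 0;           # counts how often the string is actually built

sub new {
    my ($class, @dirs) = @_;
    return bless { dirs => [@dirs] }, $class;   # no string built yet
}

sub stringify {
    my $self = shift;
    # Build the string on first use only; later calls reuse the cache.
    unless (defined $self->{str}) {
        $BUILDS++;
        $self->{str} = File::Spec->catdir(@{ $self->{dirs} });
    }
    return $self->{str};
}

# Works directly on the list and never forces stringification.
sub components { return @{ $_[0]{dirs} } }

package main;

my $dir   = Sketch::LazyDir->new('usr', 'local', 'lib');
my @parts = $dir->components;   # the string has not been built yet
my $once  = "$dir";             # first stringification fills the cache
my $twice = "$dir";             # served from the cache
print "builds: $Sketch::LazyDir::BUILDS\n";   # prints "builds: 1"
```

An object that is only ever asked for its components never pays for `catdir` at all, which is the case the lazy approach is meant to protect.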

kenahoo commented 11 years ago

I also just cherry-picked the canonpath() commit 3ffc39a.

dagolden commented 11 years ago

My thought on the aggressive pre-caching was that almost every method calls some perl function that forces stringification anyway. The only ones that don't are things like basename, components, and foreign representation ones.

I have a hard time imagining scenarios where people are making thousands of Path::Class objects unless they are interacting with a real filesystem. If people are doing a little bit of path manipulation, the overhead of stringification they don't need is pretty minor, whereas for people crunching through a big filesystem, who really need the speed, the benefits of precaching are big.

We could actually go further and use the precached string in all the perl functions directly, instead of passing $self and relying on Perl to call the stringification overloading. That would be another huge optimization that just occurred to me.
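A rough sketch of that idea, under stated assumptions: `Sketch::EagerFile` and its `str` slot are hypothetical, not the actual Path::Class layout. The string is built once in the constructor, and one method hands it straight to the `-e` builtin while the other passes `$self` and lets Perl fall back to the `""` overload.

```perl
use strict;
use warnings;

package Sketch::EagerFile;   # hypothetical class, not Path::Class itself
use File::Spec;

our $OVERLOAD_CALLS = 0;     # counts dispatches through the '""' overload
use overload '""' => sub { $OVERLOAD_CALLS++; $_[0]{str} }, fallback => 1;

sub new {
    my ($class, @parts) = @_;
    # Precache: build the string once, at construction time.
    return bless { str => File::Spec->catfile(@parts) }, $class;
}

# Relies on Perl calling the '""' overload to turn $self into a filename.
sub exists_via_overload { my $self = shift; return -e $self }

# Hands the cached string straight to the builtin; no overload dispatch.
sub exists_direct { my $self = shift; return -e $self->{str} }

package main;

my $file = Sketch::EagerFile->new(File::Spec->curdir, 'no-such-file.tmp');

$file->exists_via_overload for 1 .. 3;
my $after_overload = $Sketch::EagerFile::OVERLOAD_CALLS;

$file->exists_direct for 1 .. 3;
my $after_direct = $Sketch::EagerFile::OVERLOAD_CALLS;

print "overload calls after exists_via_overload: $after_overload\n";
print "overload calls after exists_direct:       $after_direct\n";
```

The second count should equal the first, since `exists_direct` never goes through the overload machinery. How much time that saves in a real filesystem walk would have to be measured.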

kenahoo commented 11 years ago

Putting aside your final idea for a moment (about avoiding overloading), why is precaching actually a speedup? Doesn't it still perform stringification exactly once whether we precache or not? Precaching just seems like it calls it earlier in its life cycle.

dagolden commented 11 years ago

It avoids the "if" logic on each call to check if it's cached. That's pretty minor compared to my point about avoiding overloading. I suspect that's what I was thinking and then stopped short for some reason.
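One way to gauge how minor that check is would be a micro-benchmark along these lines. This is only a sketch: the `%lazy` and `%eager` hashes and both subs are hypothetical stand-ins for a path object's internals, and the numbers will vary by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Spec;

# Hypothetical stand-ins for a path object's internals.
my %lazy  = (dirs => [qw(usr local lib)]);
my %eager = (dirs => [qw(usr local lib)],
             str  => File::Spec->catdir(qw(usr local lib)));

# Lazy: test the cache on every call, build the string on the first one.
sub lazy_str {
    $lazy{str} = File::Spec->catdir(@{ $lazy{dirs} })
        unless defined $lazy{str};
    return $lazy{str};
}

# Eager: the string was built up front, so just return it.
sub eager_str { return $eager{str} }

# Run each variant for at least one CPU second and compare rates.
cmpthese(-1, {
    lazy_check => sub { my $s = lazy_str() },
    eager      => sub { my $s = eager_str() },
});
```

Both variants call `catdir` exactly once; any difference in the table comes from the per-call `defined` test alone.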

kenahoo commented 11 years ago

I'm still open to these ideas, but I'm closing this pull request because it no longer applies cleanly, and because it's been a while without activity.