Closed shulard closed 8 years ago
@shulard Why is GLOB_ONLYDIR required?
@shulard How is performance impacted? The fact that we apply glob in all cases makes me uncomfortable. Is it possible to detect a glob pattern when using the in method and do the appropriate computation here?
I used GLOB_ONLYDIR because glob lists all items by default (files and directories).
The path is used to build an Iterator\Recursive\Directory, which only handles directories. It is the iterator's job to retrieve all the items that must be listed.
@Hywan maybe we can add a pattern to avoid applying glob on each path.
I thought that glob's default behaviour is to handle that pattern. It always returns an array of the matching paths (with dynamic patterns or not). So for me it's the same as applying a pattern before using it :smile:
Also, when glob is able to handle more pattern cases, we can use it directly...
But I don't know about the performance... For me, glob searches for patterns internally, which is efficient.
Hello,
With my last comment, do you think we really need to apply a glob pattern on each path before calling glob?
Can you compare the performance of GlobIterator + FilterIterator vs. RegexIterator vs. glob? I reckon RegexIterator may be faster than GlobIterator, and a “glob2pcre” algorithm is not very complicated at all.
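To make the “glob2pcre” idea concrete, here is a minimal sketch of such a conversion. The function name `glob2pcre` is hypothetical (it is not part of Hoa or PHP), and the sketch assumes only `*`, `?` and `[...]` need to be supported, with wildcards never crossing a `/`.

```php
<?php

// Hypothetical helper, for illustration only: translate a simple glob
// pattern into a PCRE pattern. Supports `*`, `?` and `[...]` classes;
// braces and backslash escapes are deliberately left out.
function glob2pcre(string $glob): string
{
    $regex  = '';
    $length = strlen($glob);

    for ($i = 0; $i < $length; ++$i) {
        $char = $glob[$i];

        if ('*' === $char) {
            // Any run of characters inside one path segment.
            $regex .= '[^/]*';
        } elseif ('?' === $char) {
            // Exactly one character inside one path segment.
            $regex .= '[^/]';
        } elseif ('[' === $char
            && $i + 2 <= $length
            && false !== ($end = strpos($glob, ']', $i + 2))) {
            // Character class: copy it, turning a leading `!` into `^`.
            $class = substr($glob, $i + 1, $end - $i - 1);

            if ('!' === $class[0]) {
                $class = '^' . substr($class, 1);
            }

            // Protect the backslash and the `#` delimiter inside the class.
            $class  = str_replace(['\\', '#'], ['\\\\', '\\#'], $class);
            $regex .= '[' . $class . ']';
            $i      = $end;
        } else {
            // Literal character.
            $regex .= preg_quote($char, '#');
        }
    }

    return '#^' . $regex . '$#D';
}

var_dump(preg_match(glob2pcre('*.php'), 'Finder.php'));
var_dump(preg_match(glob2pcre('[lb]i?'), 'lib'));
```

The resulting pattern could then be handed to a RegexIterator, but the directory tree still has to be crawled first, which is the cost being weighed in this thread.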
Hello @Hywan,
Of course, I can compare the performance and I'll submit the benchmark here. Yes, the "glob2pcre" algorithm is not very complicated, and it allows us to rely only on iterators...
Hello @Hywan,
I've worked on this PR today and I've committed the benchmark code here: https://github.com/shulard/hoa-file-glob-benchmark
I've used the Hoa\Bench lib for timing, and the result is that GlobIterator + CallbackFilter and glob + GLOB_ONLYDIR take the same amount of time.
I'm not particularly familiar with code benchmarking; if you have any ideas to improve the check, I'm open to adding them!
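For readers who want to see the two approaches side by side, here is a minimal sketch using only PHP's built-in glob function and the SPL GlobIterator and CallbackFilterIterator classes (not the Hoa wrappers, and not the benchmark code from the repository above). The temporary directory layout is made up for the example.

```php
<?php

// Build a throwaway directory tree: two directories and one file.
$root = sys_get_temp_dir() . '/glob-demo-' . uniqid();
mkdir($root . '/lib', 0777, true);
mkdir($root . '/bin');
touch($root . '/file.txt');

$pattern = $root . '/*';

// Approach 1: glob with GLOB_ONLYDIR keeps directories only.
$viaGlob = array_map('basename', glob($pattern, GLOB_ONLYDIR) ?: []);

// Approach 2: GlobIterator lists everything, a callback filter keeps
// the directories.
$directories = new CallbackFilterIterator(
    new GlobIterator($pattern),
    function (SplFileInfo $current): bool {
        return $current->isDir();
    }
);

$viaIterator = [];

foreach ($directories as $directory) {
    $viaIterator[] = $directory->getFilename();
}

sort($viaGlob);
sort($viaIterator);

var_dump($viaGlob, $viaIterator);

// Clean up the throwaway tree.
unlink($root . '/file.txt');
rmdir($root . '/lib');
rmdir($root . '/bin');
rmdir($root);
```

Both lists should contain the same directory names, so the choice between the two comes down to timing and readability.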
The test was run with a maxDepth of 5 on different directories:
glob takes 2ms, GlobIterator takes 2ms
glob takes 46ms, GlobIterator takes 50ms
glob takes 1087 to 1183ms, GlobIterator takes 1081 to 1193ms
I can't find a way to implement the RegexIterator method. If we want to use that, we need to crawl the folder and apply the generated regex, so there are some problems:
This is a lot of useless computing because it requires redeveloping the whole glob process...
What do you think about that update?
Thanks Stéphane
@shulard Nice work! See my comment.
@Hywan,
I've updated and rebased the code... Now the in method contains the glob application.
I've also benchmarked the Finder object with a new test: try to crawl a path without a glob pattern inside.
The goal of this test is to show the glob call overhead. The execution time is similar in all 3 cases:
glob pattern computation
glob use
GlobIterator use
I've updated my repo to handle that case.
I am assigning @Jir4 to this issue.
I'll take a look tomorrow
The PR is good for me: the code is simpler to understand with a plain glob call than with a GlobIterator, and the benchmarks are similar.
IRC logs can be interesting: https://botbot.me/freenode/hoaproject/2016-02-17/?msg=60227978&page=1.
So we have to test whether we need to use glob or not. Is there a syntax to detect?
@Jir4, if you want I can add the syntax detection behaviour to the current code (or do you want to work on it?)...
If you can do it, I think it's simpler.
Of course, I'll work on it quickly!
I've used a basic regex to detect all the components of glob syntax: /\*|\?|\[([^\]]+)\]|\{([^}]+)\}/
I've added the braces because PHP's glob can use them... Maybe we would prefer to stick to the Wikipedia syntax definition?
I thought that the GlobIterator in PHP is not able to handle braces, but this must be checked.
For the moment the implementation only uses the glob function, but it detects whether it's required or not...
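A minimal sketch of that detection step, using the regex quoted above. The function names `containsGlobSyntax` and `expandPath` are hypothetical and are not the names used in the PR; the fallback to glob with GLOB_ONLYDIR is an assumption based on the earlier comments.

```php
<?php

// Hypothetical helper: does the path contain any glob syntax
// (`*`, `?`, `[...]` or `{...}`)? Uses the regex from the comment above.
function containsGlobSyntax(string $path): bool
{
    return 1 === preg_match('/\*|\?|\[([^\]]+)\]|\{([^}]+)\}/', $path);
}

// Hypothetical helper: only call glob when the path needs it.
function expandPath(string $path): array
{
    if (!containsGlobSyntax($path)) {
        // Plain path: nothing to expand, skip the glob call entirely.
        return [$path];
    }

    // Assumption: directories only, as discussed earlier in the thread.
    // Braces are detected but only expanded if GLOB_BRACE is added here.
    return glob($path, GLOB_ONLYDIR) ?: [];
}

var_dump(containsGlobSyntax('/Users/shulard/Sites/Clients'));
var_dump(containsGlobSyntax('/[u]sr/[lb]*/'));
```

The cost of the extra preg_match versus an unconditional glob call is what the benchmark below is meant to measure.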
Can you run some benchmarks? It would be useful to compare with and without the preg_match.
Yep of course, I'll submit the results here!
I've updated my bench repo here: https://github.com/shulard/hoa-file-glob-benchmark
Now I check with glob, with Iterator\Glob, with detection + glob, and with detection + Iterator\Glob:
glob |||||||||| 1677ms, 22.0%
iterator |||||||||| 1723ms, 22.6%
globnodetect ||||||||||| 1800ms, 23.6%
iteratornodetect |||||||||| 1667ms, 21.9%
Execution time is really similar with the given path patterns :
/Users/shulard/Sites/Clients
/[u]sr/[lb]*/
All the objects found 42083 files on my computer.
I ran the test multiple times; the Iterator\Glob results are all consistent, but glob can take about 100ms more on some calls... Maybe it's my computer, maybe it's the implementation...
I also confirmed that braces are not handled by the GlobIterator implementation in PHP (but it's not an official feature of glob...).
:+1: What are your observations then?
I think that we need to use the GlobIterator, which is more consistent. Then the brace support is not really important in our case (I think...).
Hello !
What's your POV here ? Do you think we need to use the GlobIterator too ? I can update the PR code with the final method during this week.
I think GlobIterator is more consistent, but… why does it have fewer features? Today Hoa\Iterator\Glob extends GlobIterator from SPL (https://github.com/hoaproject/Iterator/blob/60bdefab8db17717871a11101dedec60572f95b8/Glob.php). You may want to ask on php-src internals why glob and GlobIterator do not provide the same features. Based on these answers, we will see what to do next.
Thoughts?
And yes, sometimes it can be very annoying to develop inside Hoa because we have to triple-check everything and every detail, but this is why the quality is high :-p. Also, it often happens that we find inconsistencies in the PHP API or between PHP VMs (remember Hoa\Iterator\RegularExpression, and Hoa\Iterator\Recursive\RegularExpression, with https://github.com/facebook/hhvm/issues/3909 and https://bugs.php.net/bug.php?id=68128). I wonder if we should communicate about this…
@Hywan, no problem about the triple-check :smile:. This process allows me to learn a lot about quality processes and reviews; it's awesome to be part of that!
I'll send an email to php-src internals about the differences between the 2 globs.
About the discovered inconsistencies, I think Hoa should communicate about them! It'll help move the different projects forward and also show that Hoa is really doing in-depth research about its code.
Thread created on php-general mailing list: http://news.php.net/php.general/325485
I got an answer from PHP ML : http://news.php.net/php.general/325487
It seems that glob is a port of Linux's glob feature. The GlobIterator is stricter regarding the glob pattern implementation, and we saw that there is no overhead when using an Iterator combination.
Because the specific glob features are enabled with flags, the GlobIterator is just equivalent to a glob call with no flags.
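To illustrate what "enabled with flags" means for the glob function itself, here is a small sketch comparing a call with no flags to one with GLOB_BRACE. The file names are made up for the example, and GLOB_BRACE is guarded because it is not defined on every platform.

```php
<?php

// Throwaway directory with two PHP files.
$root = sys_get_temp_dir() . '/glob-brace-' . uniqid();
mkdir($root);
touch($root . '/a.php');
touch($root . '/b.php');

$pattern = $root . '/{a,b}.php';

// No flags: the braces are treated literally, so nothing matches.
$plain = glob($pattern) ?: [];

// With GLOB_BRACE (where the platform defines it): braces are expanded.
$withBrace = defined('GLOB_BRACE')
    ? array_map('basename', glob($pattern, GLOB_BRACE) ?: [])
    : null;

var_dump($plain, $withBrace);

// Clean up.
unlink($root . '/a.php');
unlink($root . '/b.php');
rmdir($root);
```

The no-flags result is the behaviour the thread attributes to GlobIterator, which is why brace patterns are not expanded there.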
Hello!
I've updated the implementation to use the Iterator.
Is the ML reply enough to validate that choice?
@shulard Good work! Very proud of you. We can go for Hoa\Iterator\Glob, it's fine.
Thanks a lot for the review, I've updated the code with your feedback.
The regex pattern is really simpler :smile:.
@Jir4 Did you find time for the review?
ping @Jir4
I feel really sorry to ask for changes while you are waiting for so long… I will try to be more serious 😄.
Hello, don't worry about the delay 😄 the only important thing is to move ahead! I'll review your comments asap...
Ready for a final review before the merge?
Yep, I'm ready for that final review... I've checked CS and it seems that this repo hasn't been standardized for the moment... Code seems clean (just a warning about the @return void in constructor comments...).
Thank you!
👍 thank you for your patience and your reviews 😄.
Are you kidding? You're the patience guy, not me… Anyway, this is good work :-).
Maybe my English is not good enough 😄 I just want to thank you for all the feedback you gave on that PR, it was not easy... Delay is not a real issue here! A nice improvement in this lib 👍
Fix #18.
With that update if you specify a path like "/emp" or "/{a,b,c}.php", it will be processed nicely.
Example:
Does not use the Iterator\Glob implementation here because it does not have the GLOB_ONLYDIR filter. It can be replaced with a combination of Iterator\CallbackFilter and Iterator\Glob, but that seems more complex, which is not suitable here...