Closed samebchase closed 2 years ago
Thanks for trying rak!
I think you're comparing apples with oranges here, if I understand the rg
syntax correctly as looking for a literal string.
Could you do a time rak abcdefghijklmnop? That should be a lot better.
Regexes are notoriously slow in Raku, unfortunately. The regex engine does not recognize that you're looking for a literal string. Also, it searches on a grapheme level (which the literal string search in Raku also does, by the way).
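To make the literal-versus-regex distinction concrete, here is a minimal sketch in Python (not Raku, and it says nothing about how rak or the Raku regex engine work internally). The helper names find_literal and find_regex are made up for illustration. Both calls find the same offset; the point is only that one goes through a regex engine and the other does not.

```python
import re

def find_literal(haystack: str, needle: str) -> int:
    """Plain substring search; no regex machinery involved."""
    return haystack.find(needle)

def find_regex(haystack: str, pattern: str) -> int:
    """Search via the regex engine; returns the match offset or -1."""
    m = re.search(pattern, haystack)
    return m.start() if m else -1

text = "xxabcdefghijklmnopxx"
needle = "abcdefghijklmnop"

literal_pos = find_literal(text, needle)
# re.escape makes the regex match the needle literally
regex_pos = find_regex(text, re.escape(needle))
```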
Also: how many cores do you have available there? I note that the small needle used about 2x as much CPU as wallclock, but the large needle did not. I wonder why.
Also, if you just want to know the files in which the matches occur, you could also try --per-file. This would remove the overhead of splitting into lines at the expense of needing the whole file in memory always. YMMV.
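A rough Python sketch of that trade-off, assuming the file content is already in memory as a string. The function names are hypothetical and this is not how rak implements either mode; it only shows that the per-file check skips the line splitting.

```python
def matching_lines(text: str, needle: str) -> list:
    """Per-line search: split first, then test every line (1-based numbers)."""
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if needle in line]

def file_matches(text: str, needle: str) -> bool:
    """Per-file search: one test on the whole content, no line splitting."""
    return needle in text

sample = "10.0.0.1\n10.0.111.2\n192.168.1.1\n"
line_hits = matching_lines(sample, "111")   # which lines matched
whole_hit = file_matches(sample, "111")     # only whether the file matched
```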
I just found a bug in the default file selection logic, which basically caused it to search all files, instead of just the ones with known extensions. This could also be a reason for the difference in performance that you saw.
Just uploaded version 0.1.10 with a fix.
@lizmat oh wow, let me try out the new one and report back.
Hi,
Just did a quick comparison as well on a 250 MB text file containing only IPv4 addresses:
time rak 111 /ipv4/IPv4-3x-9x.txt => 60 secs
time rg 111 /ipv4/IPv4-3x-9x.txt => 4 secs
15x slower. On a single file. I'd say that is pretty good considering that rak
is doing grapheme based searches.
Could you try with --encoding=latin1?
Also, I'd like to see the full time output, including CPU used :-)
here you go:
time rak 111 /ipv4/IPv4-3x-9x.txt
________________________________________________________
Executed in 61,59 secs fish external
usr time 59,67 secs 308,00 micros 59,67 secs
sys time 1,32 secs 870,00 micros 1,32 secs
time rak 111 --encoding=latin1 /ipv4/IPv4-3x-9x.txt
________________________________________________________
Executed in 30,41 secs fish external
usr time 28,43 secs 0,31 millis 28,43 secs
sys time 1,00 secs 1,36 millis 0,99 secs
Nice, so in that case only 7x as slow :-)
I guess if people don't like the NFG semantics, they could add --encoding=latin1 as default.
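For anyone wondering why the encoding matters, a small Python illustration. NFC normalization is used here only as a rough stand-in for Raku's NFG, which Python does not have. With latin1 every byte maps straight to one character, so there is nothing to validate or compose; with UTF-8 the decoder has to assemble code points, and a grapheme-aware string type does further work on top of that.

```python
import unicodedata

# "e" followed by a combining acute accent: 3 bytes in UTF-8
raw = "e\u0301".encode("utf-8")

as_latin1 = raw.decode("latin1")  # every byte becomes one character
as_utf8 = raw.decode("utf-8")     # 2 code points: "e" + U+0301
as_nfc = unicodedata.normalize("NFC", as_utf8)  # composed into U+00E9

lengths = (len(as_latin1), len(as_utf8), len(as_nfc))
```

For pure ASCII data such as a list of IPv4 addresses, the decoded text is the same either way; only the amount of decoding work differs.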
rak 111 /ipv4/IPv4-3x-9x.txt 66,50s user 1,48s system 98% cpu 1:08,90 total
rak 111 --encoding=latin1 /ipv4/IPv4-3x-9x.txt 27,53s user 0,86s system 97% cpu 29,143 total
Not sure I used the time you were expecting; this one is from zsh on macOS, on a 2019 quad-core MacBook Pro.
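As a quick check on how to read those numbers: CPU utilization is (user + system) divided by wallclock time. A small Python sketch using the zsh figures above (the function name is just for illustration):

```python
def cpu_utilization(user: float, system: float, wall: float) -> float:
    """Fraction of wallclock time spent on CPU; above 1.0 means several cores."""
    return (user + system) / wall

# Values from the zsh output; 1:08,90 is 68.90 seconds
default_run = cpu_utilization(66.50, 1.48, 68.90)
latin1_run = cpu_utilization(27.53, 0.86, 29.143)
```

Both come out just under 1.0, in line with the 98% and 97% that zsh printed, so these runs kept roughly one core busy. A value near 2.0 would correspond to the "2x as much CPU as wallclock" case mentioned for the small needle.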
can this be closed now?
To me, yes, but I don't own this issue.
Hey,
I've been trying out rak as a replacement for rg etc. and putting it through its paces.
On some folders I have lying around, it is working fast enough for small needles, but when the needle is large it takes too long.
Large needle.
Small needle.
And an unfair comparison with rg just for kicks.
Unfortunately, I cannot share the data it was being run on. Will try and see if I can replicate it with other data.
I could spend some more time checking this after digging through the code. At this point, I'm not sure if this is an issue with rak or the Raku regex engine. I'd imagine for a simple string the regex engine should go into the usual string search algos. Not sure why it is taking so long.