Closed ronkorving closed 2 years ago
The issue is that buffer.indexOf
isn't really optimal yet. I think Trevor was going to look into various string matching algorithms, but that effort may be stalled. This module uses BMH for string matching, so I think it should currently be faster than buffer.indexOf
.
That should be the goal though. @trevnorris are there any updates on this?
indexOf()
is freaking slow. Haven't taken the time to fix that yet, but I'll try to do that soon.
Thanks a lot, @trevnorris. It's really appreciated.
This just landed: https://github.com/nodejs/node/commit/a18dd7b7882ca955075ad977a5e7838c792bb76f It would be interesting to benchmark streamsearch when using it.
This module can't really use buffer.indexOf()
since it has to store algorithm-specific state (it's searching a stream instead of a single block of data). buffer.indexOf()
does not expose its internal state anyway and it does not accept internal state to use for future invocations, it's meant as a one-shot search kind of thing.
Having .indexOf()
for streams somehow in core would be neat, but I doubt something like that would ever be accepted.
Closing this for now as nothing has changed and there is nothing actionable here. Even if there was some way to make use of it with streams (or even if node had a stream-compatible indexOf()
), I'm still not entirely convinced that reaching into C++ to perform the search wouldn't cause a noticeable performance hit.
@mscdex Long time no see ;) By all means close this, I don't think this issue impacts my life anymore. If it did, I guess I would have serious problems.
I wanted to point your attention to Buffer.indexOf which has the potential to speed up this library, possibly dramatically. If the API exists (depending on Node version I guess), it may make a lot of sense to use it.
PS: I'm using Dicer for multipart stream parsing and streamsearch shows up in my CPU profile more than may be necessary.
PS2: Amusing to see you actually reviewed that PR by Trevor :)