mscdex / streamsearch

Streaming Boyer-Moore-Horspool searching for node.js
MIT License
69 stars 17 forks source link

Use Buffer#indexOf if available #2

Closed ronkorving closed 2 years ago

ronkorving commented 9 years ago

I wanted to point your attention to Buffer.indexOf which has the potential to speed up this library, possibly dramatically. If the API exists (depending on Node version I guess), it may make a lot of sense to use it.

PS: I'm using Dicer for multipart stream parsing and streamsearch shows up in my CPU profile more than may be necessary.

PS2: Amusing to see you actually reviewed that PR by Trevor :)

mscdex commented 9 years ago

The issue is that buffer.indexOf isn't really optimal yet. I think Trevor was going to look into various string matching algorithms, but that effort may be stalled. This module uses BMH for string matching, so I think it should currently be faster than buffer.indexOf.

ronkorving commented 9 years ago

That should be the goal though. @trevnorris are there any updates on this?

trevnorris commented 9 years ago

indexOf() is freaking slow. Haven't taken the time to fix that yet, but I'll try to do that soon.

ronkorving commented 9 years ago

Thanks a lot, @trevnorris. It's really appreciated.

ronkorving commented 9 years ago

This just landed: https://github.com/nodejs/node/commit/a18dd7b7882ca955075ad977a5e7838c792bb76f It would be interesting to benchmark streamsearch when using it.

mscdex commented 9 years ago

This module can't really use buffer.indexOf() since it has to store algorithm-specific state (it's searching a stream instead of a single block of data). buffer.indexOf() does not expose its internal state anyway and it does not accept internal state to use for future invocations, it's meant as a one-shot search kind of thing.

Having .indexOf() for streams somehow in core would be neat, but I doubt something like that would ever be accepted.

mscdex commented 2 years ago

Closing this for now as nothing has changed and there is nothing actionable here. Even if there was some way to make use of it with streams (or even if node had a stream-compatible indexOf()), I'm still not entirely convinced that reaching into C++ to perform the search wouldn't cause a noticeable performance hit.

ronkorving commented 2 years ago

@mscdex Long time no see ;) By all means close this, I don't think this issue impacts my life anymore. If it did, I guess I would have serious problems.