google / lovefield

Lovefield is a relational database for web apps. Written in JavaScript, works cross-browser. Provides SQL-like APIs that are fast, safe, and easy to use.
https://google.github.io/lovefield/
Apache License 2.0
6.82k stars 367 forks source link

[Feature Request] Faster `String` Match #270

Closed falahati closed 3 years ago

falahati commented 3 years ago

Yes, I know match is the official method to search on string fields, however, it is quite slow.

A simple search with new RegExp('test') is 95% as fast as String.indexOf() probably due to some sort of optimization on chrome's part; however, adding anything else that forces the JS engine to actually parse and activate the RegEx engine like adding the insensitive flag makes it quite slow. In a dataset of 12k items, a simple new RegExp('test', 'i') takes around 300ms to 400ms to complete on one column. Adding more columns and it just increases from there linearly. However, if I load the whole table into memory and manually check each row with JS String.toLowerCase().indexOf() then it takes around 40ms to 50ms which is almost 10times faster. I would suspect that this can be faster and less memory-intensive if done inside the Lovefield's engine.

String.indexOf('test'): 25k ~ 0.07s // Case Sensetive String.includes('test'): 25k ~ 0.08s // Case Sensetive String.toLowerCase().indexOf('test'): 25k ~ 0.09s // Case Insensetive String.toLowerCase().includes('test'): 25k ~ 0.09s // Case Insensetive new RegExp('test'): 25k ~ 0.10s // Case Sensetive new RegExp('test', 'i'): 25k ~ 0.83s // Case Insensetive new RegExp(/^.*test.*$/): 25k ~ 0.80s // Case Sensetive new RegExp(/^.*test.*$/i): 25k ~ 0.85s // Case Insensetive

Therefore, I present you this feature request for a new function that uses String.indexOf() and String.toLowerCase().indexOf() for searching in string columns. A lower cased index could also be used to remove the String.toLowerCase() function for each row and increase the performance of the query.

freshp86 commented 3 years ago

@falahati: Please have a look at https://github.com/google/lovefield/issues/266#issuecomment-678883485 explaining that we no longer plan to add new features to Lovefield (see link for more details).

Regarding this feature request, this is something we had thought of, and at some point the goal was to allow the user to pass any function they desire for a predicate, instead of us having to add explicit support for indexOf() or toLowerCase.indexOf() etc (taking advantage of the fact that we are using JS to build the query).

Imagine something as follows:

db.select().
    from(myTable).
    where(myTable.foo.filter(data => data.toLowerCase().includes('hello world'))).
    exec();

Then you could pass any filter function, like the ones you mention above. Unfortunately for now there are no plans to add new features like this one.

falahati commented 3 years ago

So this "feature freeze" means that you won't accept pull requests also or it just means that the main team won't add new features?

I am interested to see if it worth keeping this issue open and work on a PR.

arthurhsu commented 3 years ago

We will accept pull requests that fixes bugs. As to API changes, it's very unlikely since we no longer have resources to perform security review.

falahati commented 3 years ago

I am going to close this issue then. Thanks for the fast response.