Closed adetunjiakintundeakinde closed 2 years ago
Hey there. Can you share a code snippet to illustrate what you're after?
Specifically the interested question here seems to be ordering: If binary search would be applicable, you need to have your data ordered; it would be good to know more about to help out here, and if there are potential alternative approaches.
Thanks @srenatus So our usecase like I said is to iterate over a large dataset(10 MB worth of json data) to do a partial search....
search_user(q, users) = users {
not is_number(q)
users := [user |
some user_name
contains(lower(user_name), lower(q))
]
}
We figured out that this is not efficient. My thinking are there means to do quick partial searches..A binary search in itself may not be too helpful.
Hope this helps explain my usecase
Have you checked the performance documentation, specifically "Use objects over arrays"?
Maybe it would be possible to arrange your data such that it's keyed by username, so you can access data.users[lower(user_name)]
...? 🤔
Yes..I agree with your idea. However we do a partial search and thats why we use the contains. So from example if someone uses the query green it would return greenland, greendale, greenstuck, greenbutcher etc
Indexing on arbitrary contains
isn’t possible, as we’d still need to scan through all items to do so. Indexing on startswith
would at least theoretically work, as you could build an index from the characters in each word and expand the search char by char.
One option would be to have something like “full text” indexing like Lucene/ES… but I think both options would be out of scope for what OPA does. I would suggest extracting that function out to an external component, and provide the result to OPA for making decisions on.
This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.
Not sure if there's anything more to do here. OK to close this one, @adetunjiakintundeakinde ?
Closing for now due to inactivity. Let me know if you want to revisit this.
Good morning. So I am quite new OPA and I am currently facing an interesting use case.
We use OPA to perform authorization and currently our user dataset is quite large and we unfortunately have to loop through the whole dataset when performing a search(we can't do an exact search)
I am wondering if it were possible to introduce quicker search algorithms like binary search etc