open-policy-agent / opa

Open Policy Agent (OPA) is an open source, general-purpose policy engine.
https://www.openpolicyagent.org
Apache License 2.0
9.52k stars 1.32k forks source link

Adding algorithms that enable quicker search of sets #4131

Closed adetunjiakintundeakinde closed 2 years ago

adetunjiakintundeakinde commented 2 years ago

Good morning. So I am quite new OPA and I am currently facing an interesting use case.

We use OPA to perform authorization and currently our user dataset is quite large and we unfortunately have to loop through the whole dataset when performing a search(we can't do an exact search)

I am wondering if it were possible to introduce quicker search algorithms like binary search etc

srenatus commented 2 years ago

Hey there. Can you share a code snippet to illustrate what you're after?

Specifically the interested question here seems to be ordering: If binary search would be applicable, you need to have your data ordered; it would be good to know more about to help out here, and if there are potential alternative approaches.

adetunjiakintundeakinde commented 2 years ago

Thanks @srenatus So our usecase like I said is to iterate over a large dataset(10 MB worth of json data) to do a partial search....

  search_user(q, users) = users {
    not is_number(q)
    users := [user |
        some user_name
        contains(lower(user_name), lower(q))
    ]
}

We figured out that this is not efficient. My thinking are there means to do quick partial searches..A binary search in itself may not be too helpful.

Hope this helps explain my usecase

srenatus commented 2 years ago

Have you checked the performance documentation, specifically "Use objects over arrays"?

Maybe it would be possible to arrange your data such that it's keyed by username, so you can access data.users[lower(user_name)]...? 🤔

adetunjiakintundeakinde commented 2 years ago

Yes..I agree with your idea. However we do a partial search and thats why we use the contains. So from example if someone uses the query green it would return greenland, greendale, greenstuck, greenbutcher etc

anderseknert commented 2 years ago

Indexing on arbitrary contains isn’t possible, as we’d still need to scan through all items to do so. Indexing on startswith would at least theoretically work, as you could build an index from the characters in each word and expand the search char by char.

One option would be to have something like “full text” indexing like Lucene/ES… but I think both options would be out of scope for what OPA does. I would suggest extracting that function out to an external component, and provide the result to OPA for making decisions on.

stale[bot] commented 2 years ago

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.

anderseknert commented 2 years ago

Not sure if there's anything more to do here. OK to close this one, @adetunjiakintundeakinde ?

anderseknert commented 2 years ago

Closing for now due to inactivity. Let me know if you want to revisit this.