Open patrick-austin opened 2 years ago
Following discussion with @kevinphippsstfc, have:
lucene.searchBlockSize instead of piggy backing maxIdsInQuery
GateKeeper
HasEntityId interface so that the getReadableIds function can run on either EntityBaseBean or ScoredEntityBaseBean without needing to loop to get their id
filterReadAccess so we add all newResults if we have no restrictions, and only iterate and check against the id set if restrictions were present. Should also now throw if we're over maxEntities.
This shouldn't change the performance of existing usage of getReadable too much, as we do not introduce any additional loops, only refactoring some code and adding the if (restrictions == null) check to enable early returns of all beans. Further improvements or changes for clarity are welcome.
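As a rough illustration of the two GateKeeper changes described above, here is a minimal sketch of a shared id interface plus a filter with an early return. Everything except the names HasEntityId, filterReadAccess, newResults and maxEntities is an assumption: SimpleBean and ScoredBean are hypothetical stand-ins for EntityBaseBean and ScoredEntityBaseBean, the "no restrictions" case is modelled as a null set of readable ids, and IllegalStateException stands in for whatever exception the server actually throws.

```java
import java.util.List;
import java.util.Set;

// Stand-in for the HasEntityId interface: anything that can report its id.
interface HasEntityId {
    Long getId();
}

// Hypothetical, heavily simplified substitute for EntityBaseBean.
class SimpleBean implements HasEntityId {
    private final Long id;
    SimpleBean(Long id) { this.id = id; }
    public Long getId() { return id; }
}

// Hypothetical, heavily simplified substitute for ScoredEntityBaseBean.
class ScoredBean implements HasEntityId {
    private final Long id;
    private final float score;
    ScoredBean(Long id, float score) { this.id = id; this.score = score; }
    public Long getId() { return id; }
    public float getScore() { return score; }
}

class ReadAccessSketch {
    /**
     * Appends the readable entries of newResults to acceptedResults.
     * Assumption: readableIds == null means no restrictions apply, so every
     * new result is accepted without iterating over them individually.
     */
    static <T extends HasEntityId> void filterReadAccess(List<T> acceptedResults,
            List<T> newResults, Set<Long> readableIds, int maxEntities) {
        if (readableIds == null) {
            // Early return path: nothing to check, accept the whole block.
            acceptedResults.addAll(newResults);
        } else {
            // Restrictions present: keep only results whose id is readable.
            for (T result : newResults) {
                if (readableIds.contains(result.getId())) {
                    acceptedResults.add(result);
                }
            }
        }
        if (acceptedResults.size() > maxEntities) {
            // Placeholder exception type, chosen only for this sketch.
            throw new IllegalStateException(
                    "attempt to return more than " + maxEntities + " entities");
        }
    }
}
```

Because the method is generic over HasEntityId, the same code path serves both bean types, and callers never need a separate loop just to extract ids.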
Overall, this looks great, thanks! I've just requested a few additions to the comments to hopefully help future developers.
I'm wondering how best to test this given that I don't think the part of the code using "restrictions" will get tested much (if at all?) by the unit and integration tests. We could do with testing it on an ICAT containing some of the usual InvestigationUser and InstrumentScientist rules/restrictions.
Have added/expanded the documentation of the new/modified functions; if anything still needs clarifying, just let me know.
Have expanded TestRS to ensure it covers these changes. These changes will impact:
Note that the last two are only affected if an INCLUDE query for a collection is used. For example, Facility INCLUDE InvestigationType will run the changed code as there are multiple types at a facility but Investigation INCLUDE InvestigationType will not, as there is only one per investigation.
Some implicit testing was taking place, as if the authorization were overly restrictive (i.e. if getReadableIds is hacked to return no ids) then the following tests would already fail:
along with some tests in TestWS (as the functionality in question is in Gatekeeper, I only changed TestRS as part of this PR).
I have expanded these tests to also assert that the authorization is not too permissive, by performing searches as the piOne user, who cannot see anything unless new rules are set.
The existing exportMetaDataQuery is marked with @Ignore, so I created a new test which performs both "permissive" and "restrictive" checks on the inclusion of related entities in the export dump.
Also refactored code to set up root and piOne sessions, and applied some automatic formatting to the TestRS file. Note that this branch does not have fixes for the unrelated test failures: #275 #268
Changes
Added getReadableIds function. The logic here is the same as in getReadable, however we accept and return lists of IDs rather than entities. When dealing with the free text search results, we only have IDs. Before we were finding the bean for each ID in turn in order to run authorisation on it, then proceeding to return the ID. But as we already have the klass of the entity and the ID from Lucene we can simply operate on these instead.
Instead of hardcoding the number of results to get from Lucene at 1000, use the maxIdsInQuery setting. This is the limit we would send in one batch to the DB in getReadable/getReadableIds anyway, and the default value of 500 and max for Oracle of 1000 is the same order of magnitude as what we were currently using.
Impact
I searched for * (returning all entities before authorising) as both root and the non-root icatuser, representing best and worst case scenarios (all data visible, no data visible and needing to check DB for rules) for the old and new setup with maxIdsInQuery set to 500 and 1000. All times are in ms, and are measured between the first EntityBeanManager - Got X results from Lucene and EntityBeanManager - Returning Y results logs made by icat.server.
root
Old method is comparable for DS and DF since we only loop until we reach 300 accepted results, then return. New only submits one batch of entities to check, and returns in consistently ~12ms.
non-root
Old method takes around 1 to 2 ms per entity checked as we need to check every ID individually before being sure none are authorised. New method takes roughly the same amount of time for Investigations and DS as these both require a single batch of IDs (less than the limit in settings). Datafiles takes around half the time when the limit is doubled to 1000, as we only need to send half as many batches (averaging 24ms per batch).
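To make the batching arithmetic above concrete, here is a minimal sketch of splitting a list of IDs into batches of at most maxIdsInQuery. The class and method names (IdBatchingSketch, partition, batchCount) are hypothetical and not taken from the ICAT code base, and the figure of 2000 IDs in the note below is an illustrative number, not one of the measured results.

```java
import java.util.ArrayList;
import java.util.List;

class IdBatchingSketch {
    /**
     * Splits ids into consecutive batches holding at most maxIdsInQuery
     * entries each; only the final batch may be smaller.
     */
    static List<List<Long>> partition(List<Long> ids, int maxIdsInQuery) {
        if (maxIdsInQuery <= 0) {
            throw new IllegalArgumentException("maxIdsInQuery must be positive");
        }
        List<List<Long>> batches = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += maxIdsInQuery) {
            int end = Math.min(start + maxIdsInQuery, ids.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(ids.subList(start, end)));
        }
        return batches;
    }

    /** Number of batches needed for totalIds ids (ceiling division). */
    static int batchCount(int totalIds, int maxIdsInQuery) {
        return (totalIds + maxIdsInQuery - 1) / maxIdsInQuery;
    }
}
```

With a hypothetical 2000 IDs, batchCount gives 4 batches at a limit of 500 and 2 at a limit of 1000, which is the halving effect described for Datafiles; any result set smaller than the limit needs a single batch either way.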
id:... in both old and new setup as root:
Times are comparable with old and new approaches. For the new method, the time taken to authorise 1 result and hundreds (as in the * example) was also comparable.
It's worth noting that in these tests I didn't actually have any rules in the DB to evaluate.