silverstripe / silverstripe-fulltextsearch

Adds external full text search engine support to Silverstripe
BSD 3-Clause "New" or "Revised" License
44 stars 83 forks source link

WIP: NEW Respect canView permissions during index operations #276

Closed Cheddam closed 4 years ago

Cheddam commented 4 years ago

The module now calls canView() on documents to determine whether they should exist in the index at all, no longer solely relying on this check occurring post-search. This decreases the likelihood of data leakage, and should result in more accurate search result pagination.

This new logic attempts to make use of InheritedPermissions::canViewMultiple() when the DataObject exposes it via getPermissionChecker(). This results in less accurate output, as it's based solely on permissions stored in the database, but should be sufficient for most models that are using it.

The new check can be disabled on a per-class basis (will also disable descendants), for situations where it's too expensive or known not to be necessary.

This PR also bumps the minimum PHP requirement for this module to 7.1, to enable use of modern language features.

Performance testing

Initial benchmarks (native PHP 7.1 / MySQL 8 on a mid-range 15" 2018 MacBook Pro, running against data from the default frameworktest population script, which generates ~2000 Page objects):

canView Disabled
12.10s user 5.63s system 79% cpu 22.353 total
12.15s user 5.64s system 77% cpu 22.970 total
12.21s user 5.66s system 73% cpu 24.229 total

canView Enabled (with canViewMultiple cache)
12.19s user 5.48s system 80% cpu 22.063 total
12.34s user 5.66s system 80% cpu 22.382 total
12.51s user 5.62s system 71% cpu 25.334 total

canView Enabled (without canViewMultiple cache)
21.97s user 6.22s system 79% cpu 35.486 total
22.06s user 6.49s system 79% cpu 35.965 total
22.74s user 7.01s system 77% cpu 38.396 total

TODO

chillu commented 4 years ago

Sweet, benchmarks look good. How many objecdts did you index there? Did any of them have restricted permissions, and/or inheritance? Good to see that canViewMultiple caching works, and indexing mostly "permission-less pages" is our most common use case. But that implementation would get slower e.g. if a full page tree is set to admin only, incl. children. So the data volume and structure is important to draw conclusions from the benchmark here.

Cheddam commented 4 years ago

Superceded by #281.