processwire / processwire-issues

ProcessWire issue reports.
45 stars 2 forks source link

PageTable fields with `sortfields` causes N+1 SQL queries #1971

Open tuomassalo opened 1 month ago

tuomassalo commented 1 month ago

Short description of the issue

Bad performance: if a PageTable field has sortfields, the sorting makes an SQL query (or more) for each subpage.

Expected behavior

Fetching a PageTable field does not cause a big number of unnecessary SQL queries.

Actual behavior

For each subpage, the sort algorithm queries each sortfield from on the fly. If a PageTable has 50 subpages and three sort fields, this means 150 extra SQL queries.

Optional: Suggestion for a possible fix

PR coming up.

Steps to reproduce the issue

  1. Add a template, e.g. mytpl
  2. To mytpl, add text field mytext
  3. Install the PageTable modules
  4. Create a PageTable field ptfield
  5. On Details tab, set Sort fields to mytext
  6. Add a PageTable field ptfield to basic-page
  7. Edit a basic page: add a few subpages to ptfield. Enter something to mytext for each one.
  8. Run this code:
$page = $wire->pages->get("id=1015"); // replace with the page id
$foo = $page->ptfield;

$queries = $wire->database->getQueryLog();
echo '<p>Number queries: ' . count($queries) . '</p>';
foreach($queries as $query){
    echo '<p>' . htmlspecialchars($query) . '</p>';
}
  1. Observe that for each subpage, the output has a line like:
SELECT field_mytext.data AS `mytext__data` FROM `field_mytext` WHERE field_mytext.pages_id=1021

Setup/Environment

ryancramerdesign commented 1 month ago

I'll sometimes aim for more simple 1-index select queries rather than a single query with multiple joins because the multiple simple queries can often consume less time than the single more complex query. Though it depends on the case, and I don't remember for this case specifically. Though the mentioned query is just the sort of type that might perform more quickly as independent queries than in a larger query, though can't say for certain without benchmarking. When it comes to sorting, performance wise it's preferable to sort on a native pages table property when possible, such as name, sort, created, modified, id, etc.

tuomassalo commented 1 month ago

About performance and prefething:

I've done some benchmarking by fetching pages with 10 or even 50 joined fields. Even with low network latencies (docker on local dev machine), using joins seems to perform about three times faster.

$pageIds = [/* list of 100 pages */];
$fields = [/* list of 50 basic text fields */];

foreach($pageIds as $pageId){
  $page = $wire->pages->get("id=$pageId", ["loadOptions" => ["joinFields" => $fields]]);
  // vs.
  // $page = $wire->pages->get("id=$pageId");

  foreach($fields as $field){
    $foo = $page->get($field);
  }
}

I might be missing something in my perf tests, but, with any number of fields, I would be surprised to find a common scenario where the database was slower to perform one SELECT with joins than N+1 simpler SELECT clauses. After all, EXPLAIN looks very good for the join selects that PW generates.

tuomassalo commented 1 month ago

One concern about my PR, though: am I correct that setting joinFields in getById() ignores any autojoin fields that the template might have, thus actually (potentially) increasing the total number of queries, if those fields are queried later?

Thus, I believe the patch should ideally merge sortfields and autojoin fields, if the template has that set.