Open sunnysideup opened 7 years ago
Have you tried adding ID
as the last item in the list?
This does look like a genuine bug, but perhaps adding ID would be a workaround for it
@flamerohr - i fixed my code as soon as I noticed the bug by adding ID as a sort key and that should also be the solution for this bug. Data Objects can be sorted in any way, but sort by ID ASC/DESC should always be added as the last sort criteria, no matter what.
Check this out:
I have now added to Page_Controller
:
public function PaginatedObjects()
{
$list = MyDataObject::get();
return new PaginatedList($list, $this->getRequest());
}
and I have added to /themes/simple/templates/Layout/Page.ss
:
<ul>
<% loop $PaginatedObjects %>
<li>$Title</li>
<% end_loop %>
</ul>
<% if $PaginatedObjects.MoreThanOnePage %>
<% loop $PaginatedObjects.Pages %>
<% if $CurrentBool %>
$PageNum
<% else %>
<% if $Link %>
<a href="$Link">$PageNum</a>
<% else %>
...
<% end_if %>
<% end_if %>
<% end_loop %>
<% end_if %>
PAGE 1:
PAGE 2:
PAGE 3:
AMAZING!
I agree with you that this is a critical bug and I am very surprised to find it.
It is so serious because anytime you do not sort by a field that is enforced to be unique on a database level, i.e. in basically all cases, e.g. sorting by Title, Created, LastEdited, SortNumber (SortableGridField), Sort (SiteTree), by Date, and so on.... ) you will potentially end up with missing / double-up records in a GridField / Paginated List.
unfortunately, this is a potential problem in any given relational database, many queries do not explicitly provide a sort (or not a sort with at least a unique column) and has the "implied" sorting by ID, which is what happened here.
I've marked as critical because of data disappearing, but I encourage always adding explicit ID
sort to the end of any sort calls.
I can't replicate this on 3.6.1
. Can you test it on a plain install?
I run this on a plain 3.6.1
I wonder if the auto increment on your ID is isn't working properly. Can you try deleting the entire DB and starting over? I'm curious if you're getting IDs that are non sequential with your Item numbers due to all the deletions. Try adding ID
as a summary_field
to help debug.
No problems with IDs
Title | ID |
---|---|
item no. 1 | 334 |
item no. 2 | 335 |
item no. 3 | 336 |
item no. 4 | 337 |
item no. 5 | 338 |
item no. 6 | 339 |
item no. 7 | 340 |
item no. 8 | 341 |
item no. 9 | 342 |
item no. 10 | 343 |
item no. 11 | 344 |
item no. 12 | 345 |
item no. 13 | 346 |
item no. 14 | 347 |
item no. 15 | 348 |
item no. 16 | 349 |
item no. 17 | 350 |
item no. 18 | 351 |
item no. 19 | 352 |
item no. 20 | 353 |
item no. 21 | 354 |
item no. 22 | 355 |
item no. 23 | 356 |
item no. 24 | 357 |
item no. 25 | 358 |
item no. 26 | 359 |
item no. 27 | 360 |
item no. 28 | 361 |
item no. 29 | 362 |
item no. 30 | 363 |
item no. 31 | 364 |
item no. 32 | 365 |
item no. 33 | 366 |
item no. 34 | 367 |
item no. 35 | 368 |
item no. 36 | 369 |
item no. 37 | 370 |
item no. 38 | 371 |
item no. 39 | 372 |
item no. 40 | 373 |
item no. 41 | 374 |
item no. 42 | 375 |
item no. 43 | 376 |
item no. 44 | 377 |
item no. 45 | 378 |
item no. 46 | 379 |
This error actually goes back to mysql. We are running
SELECT FROM MyDataObject
ORDER BY SameSame
LIMIT 0,10
SELECT FROM MyDataObject
ORDER BY SameSame
LIMIT 10,10
SELECT * FROM MyDataObject
ORDER BY SameSame
LIMIT 20,10
And Mysql returns the records in a random order - although the first page of pagination seems to refute this theory.
but I encourage always adding explicit ID sort to the end of any sort calls.
@flamerohr - I have not come across many (if any) modules that explicitly sort by ID
. For example SiteTree
does not sort by ID and there will definitely be situations where the Sort values are not unique in SiteTree from what I know as this is not enforced on a Database level.
What kind of output do you get when you just run SELECT Title FROM MyDataObject ORDER BY SameSame ASC
in the database?
I run the following code:
public function MyDataObjectRaw()
{
$this->html .= '<br />-------------------------------------';
$rows = DB::query('SELECT Title, ID FROM MyDataObject ORDER BY SameSame ASC;');
foreach($rows as $row) {
$this->html .= '<li>'.$row['Title'].' = '.$row['ID'];
}
$this->html .= '<br />-------------------------------------';
$rows = DB::query('SELECT Title, ID FROM MyDataObject ORDER BY SameSame ASC LIMIT 0, 10;');
foreach($rows as $row) {
$this->html .= '<li>'.$row['Title'].' = '.$row['ID'];
}
$this->html .= '<br />-------------------------------------';
$rows = DB::query('SELECT Title, ID FROM MyDataObject ORDER BY SameSame ASC LIMIT 10, 10;');
foreach($rows as $row) {
$this->html .= '<li>'.$row['Title'].' = '.$row['ID'];
}
$this->html .= '<br />-------------------------------------';
$rows = DB::query('SELECT Title, ID FROM MyDataObject ORDER BY SameSame ASC LIMIT 20, 10;');
foreach($rows as $row) {
$this->html .= '<li>'.$row['Title'].' = '.$row['ID'];
}
}
and the output that I get is as follows:
item No.36 = 369 item No.25 = 358 item No.26 = 359 item No.27 = 360 item No.28 = 361 item No.29 = 362 item No.30 = 363 item No.31 = 364 item No.32 = 365 item No.33 = 366 item No.34 = 367 item No.35 = 368 item No.24 = 357 item No.37 = 370 item No.38 = 371 item No.39 = 372 item No.40 = 373 item No.41 = 374 item No.42 = 375 item No.43 = 376 item No.44 = 377 item No.45 = 378 item No.46 = 379 item No.13 = 346 item No.2 = 335 item No.3 = 336 item No.4 = 337 item No.5 = 338 item No.6 = 339 item No.7 = 340 item No.8 = 341 item No.9 = 342 item No.10 = 343 item No.11 = 344 item No.12 = 345 item No.1 = 334 item No.14 = 347 item No.15 = 348 item No.16 = 349 item No.17 = 350 item No.18 = 351 item No.19 = 352 item No.20 = 353 item No.21 = 354 item No.22 = 355 item No.23 = 356
limit stuff 0 -> 9
item No.1 = 334 item No.2 = 335 item No.3 = 336 item No.4 = 337 item No.5 = 338 item No.6 = 339 item No.7 = 340 item No.8 = 341 item No.9 = 342 item No.10 = 343
limit stuff 10 -> 19
item No.12 = 345 item No.1 = 334 item No.10 = 343 item No.9 = 342 item No.8 = 341 item No.7 = 340 item No.6 = 339 item No.5 = 338 item No.4 = 337 item No.3 = 336
limit stuff 19 -> 29
item No.12 = 345 item No.11 = 344 item No.10 = 343 item No.9 = 342 item No.8 = 341 item No.7 = 340 item No.6 = 339 item No.5 = 338 item No.4 = 337 item No.3 = 336
I'm not convinced this is a SilverStripe issue. The way that MySQL resolves an ambiguous sort is highly unpredictable. See (https://dba.stackexchange.com/questions/6051/what-is-the-default-order-of-records-for-a-select-statement-in-mysql).
MySQL sorts the rows any way it wants to, without any guarantee of consistency.
That I can't replicate it should be an indication that we're dealing with some kind of unknown. It could be as simple as differing storage engines, MySQL versions, or both. In short, don't use ambiguous sorts.
Hi Aaron,
I'm not convinced this is a SilverStripe issue.
It is definitely a MYSQL issue - but we can't change MYSQL so we will have to deal with the weirdness in Silverstripe.
In short, don't use ambiguous sorts.
That is true in theory, but I don't think that is true practice, most of the time you are sorting by an ambiguous sort field: e.g.
etc...
Anytime they are not unique, you could end up with this problem. So, I reckon we should fix this, for the following reasons:
(1)
I am sure that most default_sort
entries, such as SiteTree do not sort by a database enforced unique key (as per above). If any of them are displayed in a paginated list (e.g. a gridfield, search results, filtered list, etc... etc... ) there is a real chance that some will show up twice on different pages and that other items will never show.
(2) There is an easy fix - add an ORDER BY BaseTable.ID to the end of all SQL Sort statements
(3) The danger with this bug is that no one will notice until it has real life consequences. There is no error, these lists seem to operate fine, etc...
(4) A perfect developer should remember to add the Sort By ID to the end of all sort statements that do not include an unambigious field - but if (a) you always need to add this no matter what, AND (b) few developers actually do this in practice - then it appears to me that the Silverstripe Core should take care of this.
I could replicate this behaviour on my lappy:
+-------------------------+-------------------------+
| Variable_name | Value |
+-------------------------+-------------------------+
| innodb_version | 5.7.19 |
| protocol_version | 10 |
| slave_type_conversions | |
| tls_version | TLSv1,TLSv1.1 |
| version | 5.7.19-0ubuntu0.16.04.1 |
| version_comment | (Ubuntu) |
| version_compile_machine | x86_64 |
| version_compile_os | Linux |
+-------------------------+-------------------------+
and on our server
+-------------------------+------------------------------------------------------+
| Variable_name | Value |
+-------------------------+------------------------------------------------------+
| innodb_version | 5.6.34-79.1 |
| protocol_version | 10 |
| slave_type_conversions | |
| tls_version | TLSv1.1,TLSv1.2 |
| version | 5.6.34-79.1 |
| version_comment | Percona Server (GPL), Release 79.1, Revision 1c589f9 |
| version_compile_machine | x86_64 |
| version_compile_os | debian-linux-gnu |
+-------------------------+------------------------------------------------------+
and here is my work machine (issue is also happening here):
+-------------------------+-------------------------+
| Variable_name | Value |
+-------------------------+-------------------------+
| innodb_version | 5.7.19 |
| protocol_version | 10 |
| slave_type_conversions | |
| tls_version | TLSv1,TLSv1.1 |
| version | 5.7.19-0ubuntu0.16.04.1 |
| version_comment | (Ubuntu) |
| version_compile_machine | x86_64 |
| version_compile_os | Linux |
+-------------------------+-------------------------+
sunnysideup/ecommerce_merchants | allow many businesses to sell within one shop. | 0.00 per day sunnysideup/ecommerce_modifier_example | provides an example of a modifier to help you create your ow... | 0.00 per day sunnysideup/ecommerce_mybusinessworld | This module adds a basic connection between silverstripe e-c... | 0.00 per day sunnysideup/ecommerce_nz_connectivity | New Zealand Post - Client for the rate finder web API for Si... | 0.00 per day sunnysideup/ecommerce_product_variation_colours | add colours to your silverstripe e-commerce product variatio... | 0.00 per day sunnysideup/ecommerce_repeatorders | silverstripe-ecommerce_repeatorders | 0.00 per day
Here is another test I ran:
Page:
function requireDefaultRecords()
{
for($i = 1; $i < 400; $i++) {
DB::alteration_message('Creating Pages: '.$i);
$filter = ['Title' => 'Page No. '.$i];
$page = Page::get()->filter($filter)->first();
if(! $page) {
$page = Page::create($filter);
}
if($i > 10) {
$parentPage = Page::get()->filter(array('ID:LessThan' => 10))->sort('RAND()')->first();
if($parentPage) {
$page->ParentID = $parentPage->ID;
}
} else {
$page->ParentID = 0;
}
$page->writeToStage('Stage');
$page->publish('Stage', 'Live');
}
}
Page_Controller:
public function UnpaginatedPages()
{
return Page::get();
}
public function PaginatedPages()
{
$list = Page::get();
$list = new PaginatedList($list, $this->getRequest());
$list->setPageLength(395);
return $list;
}
Page.ss
<% loop PaginatedPages %>
<li>$Title</li>
<% end_loop %>
<% if $PaginatedPages.MoreThanOnePage %>
<% loop $PaginatedPages.Pages %>
<% if $CurrentBool %>
$PageNum
<% else %>
<% if $Link %>
<a href="$Link">$PageNum</a>
<% else %>
...
<% end_if %>
<% end_if %>
<% end_loop %>
<% end_if %>
Pages show up in random order rather than the order they were added because the Sort
value for all pages is 1:
Having all the Page.Sort values at 1 means that it is very likely this bug occurs.
(2) There is an easy fix - add an ORDER BY BaseTable.ID to the end of all SQL Sort statements
I really think this is dangerous scope creep for the ORM, and it sets a really bad precedent. The ORM is an abstraction layer between you and SQL. It's primary purpose is to make querying more succinct, more secure, and more declarative. It is not the ORM's job to fix your bugs. It should be very possible to write bad queries with the ORM, and sorting by something you know to have a singular value is one of those areas.
(a) you always need to add this no matter what
Personally, I've never had to do this. Keep in mind, you're testing a very extreme case, where all the values of a column are equal, and in any real-world scenario, it would be apparent to the developer that he was sorting on a column with uniform values. If, by contrast, I sort by Title ASC
, and 2/100 records have the same title, I'm not sure I've ever noticed any consequences for the logic MySQL uses to resolve that conflict. If I did, I'd add a second column.
The best practice, in these scenarios, is to encourage developers to implement composite default_sort values for their models, if they are aware that their sort column is not unique. I concur with @unclecheese on this.
E.g.
private static $default_sort = '"BaseTable"."Sort" ASC, "BaseTable"."ID" ASC';
These can also be specified via array syntax.
private static $default_sort = [
'Sort' => 'ASC',
'ID' => 'ASC'
];
I believe the array syntax supports ORM to SQL column mapping under the hood, so that might be preferable to raw SQL fragments. :)
Shall we make the outcome of this problem a documentation update?
I agree, documentation to encourage this would be the best outcome for this...
I don't believe enforcing an ID
sort at the end of all queries which uses a sort is a good idea.
The SiteTree.default_sort
also not having ID
doesn't mean it's best practice :) in fact if anyone created a pull request to add ID
in for SiteTree
, I think it'd be a welcomed change.
The best practice, in these scenarios, is to encourage developers to implement composite default_sort values for their model
@tractorcow
What about SiteTree.Sort
and Blog:
https://github.com/silverstripe/silverstripe-blog/blob/master/src/Model/BlogPost.php#L134-L139
Both models are bound to have non-unique sort values.
We could add ID to those models exclusively, but that's still preferable to baking it into datalist for all classes globally.
On an e-commerce website we offer discount vouchers. We had set the default_sort to EndDate DESC, StartDate DESC
as that made sense at the time. In the modelAdmin that worked great when we tested it.
Then the client adds a whole bunch of discount vouchers with the same start and end dates.
When he is trying to delete them, there are around 8 / 46 that are not showing up in the ModelAdmin (and another 8 that show up twice) ...
Member List for Club, Ms Smith can not find herself in list, calls admin, admin finds her on page three of list, sends her a screenshot and a link to page three. Ms Smith can not find herself on that page (there are a few Smith people, list is sorted by last name).
I am not married to a solution, but it seems to me that this is definitely something we should improve for our users because they have much to loose and nothing to gain from Mysql seemingly random sorts.
I can see the following options:
add ID sort to all ORM get calls that do not use a non-unique Sort Field.
add ORDER BY ID ASC
at the end of any ORDER BY statements to all Mysql select statements. If we could work out when / what versions are affected then we could limit it to those versions?
provide a yml option for (2)
improve documentation about this
add ID sort to PaginatedList and GridField pagination
SiteTree.Sort
isn’t supposed to be unique, is it? The sort is relative to the parent page
@kinglozzer no, SiteTree.Sort
is not supposed to be unique, but what I found is that when you sort by a non-unique field (e.g. SiteTree / BlogPost - and 100s of other DataObjects) then pagination (including GridField pagination) can do pretty crazy things as Mysql is rather unpredictable in its sortorder when there are non-unique values (i.e. it sorts correctly, but rows with the same values show up in a random order). This means that some items show up twice on separate pages and other items never show up at all.
Here is my code as a repository:
https://github.com/sunnysideup/silverstripe-sort-test
I did some further testing. It does not happen with pages, it only happens with DataObjects.
I think this is really getting into a philosophical discussion on what is the role of framework/ORM. That seems to be where we're divided. My thoughts:
If you assign a singular sort clause to a DataObject for a column that is not entirely unique, and it produces unexpected results, that's on you as the developer who wrote that code. To expect the framework to jump in and save you is undue, for the same reason the template parser doesn't close your </p>
tags.
If we were to implicitly put ID ASC
in the sort of every DataList
, we would be punishing every developer who went about it the right way, and used unique columns, or composite clauses. There's a performance impact for having multiple sort columns, and it's nothing I would want to impose on every query without a full opt-in from the user. If I'm sorting on something almost guaranteed to be unique, like Created
, I don't need or want ID
in the mix as well, and if I have to write special code to tell the framework not to do that, that creates a really suboptimal developer experience.
Here are some alternative ideas:
Also - Created
is often not unique (e.g. import of data - import through third-party API - high volume sales).
I think firstly that the bug needs more research (exactly when does it happen and how / why).
Both GridField
and PaginatedList
are ORM agnostic. They're designed to work only with instances of SS_List
, so injecting concerns over ID
into either of them is inappropriate.
The fact that many users would be punished by this safety feature is really a non-starter for me. I don't think it aligns well with the core values of the product.
I tried sorting by Created
- same issue (most objects are created within one second).
That's not the point, though. In a real-world scenario where you were using data that was all generated programatically in a matter of milliseconds, you would intuit the Created
column to be unreliable for sort, and use a different column, or a composite sort clause. What you can demonstrate in test code should not influence the decisions we make that affect the developer experience.
Developers are responsible for understanding the idiosyncrasies of their data, and to install guardrails in the framework to protect them from writing bad code sets a terribly dangerous precedent. If it is acceptable for the ORM to audit the efficacy of your sort clause (at the expense of performance), it logically follows that the framework is permitted to make all kinds of other overreach in the name of developer hand-holding, like closing <p>
tags, preventing redirect loops, removing unnecessary queries, etc.
:-)
Thank you for all your attention to detail in this bug report.
These are my assumptions right now:
Reply from SS core team is: fix your own code, it is not good to punish best practice by adding a hack for sloppy code (e.g. always sorting by ID to help out lazy developers).
I think we need to test the assumptions a bit more first. I have set up a small repo other developers can see the bug in action: https://github.com/sunnysideup/silverstripe-sort-test/.
Thank you again for looking into all of this.
Have added an issue to the backlog. Closing as an actionable story has been created.
I just realised then when you sort by one the headers in a gridfield, you end up reproducing the same bug.
I am now sorting my dataobjects by ID
That works in the GridField:
But when I click on SameSame, you end up having some items that never show up and other items that show up twice, on two separate pages.
Page 2:
e.g. item 69
So back to the original user story... he / she opens the list of discount coupons, sorts it by start date (because that is an easy way for her / him to find it) and still ends up not finding the discount coupon, while others show up twice.
I get that this is an issue when you have duplicate field values. For me, the best practice when it was known that you could have duplicate values and it was important to have consistent and predictable ordering (particularly in the case of pagination), I was always certain to include a secondary sort field to reduce ambiguity at the database level. That said, I just wanted to point out:
... but sort by ID ASC/DESC should always be added as the last sort criteria, no matter what.
I would have to disagree with this, since it I think it potentially adds an extra level of magic ✨ which is the last thing you want in an already complicated ORM. What happens when the ORM incorrectly parses/interprets the user's intent in their $default_sort
property? How do we know they want older or newer records first (when there are dupes)? What happens if you need to use FIELD(...)
or RAND(...)
functions? I think it adds more issues than it solves, since:
@patricknelson, I agree with you in theory and thank you for posting, but in practice this is not always so easy or practicable. My thinking right now is that we should add a Sort by ID ONLY when you limit a set, but I have not really found a solution yet.
Have you considered that, for example, by default the GridField allows the user to sort by all Summary Fields that are database fields? Thus, when you sort by a non-unique value in the GridField this bug will raise its ugly head, even though your default_sort may have included the ID field.
That's GridField
, however, which allows you to select sort columns. I was replying to this (emphasis mine):
... but sort by ID ASC/DESC should always be added as the last sort criteria, no matter what.
... and then clarified that this shouldn't happen at the ORM (or framework) level when sorting in general. When you're in a GridField
, I think it might be fine to enable that feature (possibly by default) and then maybe allow disabling of it via some extra setting on your GridField
instance, if it gets in the way.
Hi Patrick,
Thank you for our reply. I think the first thing is really understand the issue, when does it happen, where does it happen, etc... ? I also jumped to the "solution" but I think the key is to understand the issue first. It does not seem to happen with all DataObjects, so I want to know, for example, what DataObjects are affected, when, etc... Its random presentation also makes it really easy to miss or not to worry about it where it may raise its ugly head when you least expect it.
Basically what I am saying: we are potentially presenting our users with bad data (some data multiple times, other data hidden) - this happens, for example, with GridField - out of the box. I think that is the crux. How we fix it is secondary (we may leave it to developers as suggested by core team). I want to fix it in all my sites, but I want to do it in a smart way.
It seems to me when you add DataObjects programatically (i.e. many at the same time) then it is more likely to happen. Also, I could not replicate it on Pages, why is that?
We've had a bit of a chat about this internally, and there's some consensus that a halfway measure to mitigate this is appropriate.
First, the condition of MySQL returning a non deterministic result set when ambiguous sort columns are used is absolutely a feature, not a bug. From the MySQL docs:
If multiple rows have identical values in the ORDER BY columns, the server is free to return those rows in any order, and may do so differently depending on the overall execution plan. In other words, the sort order of those rows is nondeterministic with respect to the nonordered columns. One factor that affects the execution plan is LIMIT, so an ORDER BY query with and without LIMIT may return rows in different orders.
While MySQL is providing the "expected result" (albeit non deterministic) in this case, it's fair to say DataList
is not. As a implementor of the Sortable
interface, DataList
is expected to sort deterministically. Further, it is the job of the ORM to obscure the "how" from the user, and create the list in a consistent, uniform way, whether you're using MySQL, Postgres, etc, or just plain ArrayData
.
All that said, to force ID
into the sort is still too much magic, but ensuring default_sort
is added seems like a reasonable halfway point. The problem is, with the direction being included as part of the default sort, it makes sort direction toggling (I.e. GridField column headers) unreliable. Therefore, a more sensible setting might be something more like default_sort_column
, with null being allowed to disable forced deterministic sorts.
Thoughts?
Good thoughts ;-) I agree wit hall of that.
Here are a couple of thoughts I had:
limit 10, 10
may return exactly the same as limit 0, 10
). Putting these two together ... whenever you use limit in an ORM call, you will have to sort by ID
as the last sort fragment (e.g. turn ORDER BY "StartDate" DESC
into ORDER BY "StartDate" DESC, "ID" ASC
).
So, what I would recommend is: whenever we limit a recordset in an ORM call, we add sort by ID as the last sort phrase (with the ability to turn it off) by default. This is how I would do it:
When putting together the final SQL for a database query - IF:
sort by ID
Research questions:
SELECT * FROM Table
AND SELECT * FROM Table ORDER BY ID
- for a table with a few records and a table with a ton of records...? (I can test this)So the answer to most of your questions/concerns is that it's not a bug or oddity with MySQL, or Postgres, or pages, or DataObjects. This is something that's encoded in the SQL standard. The server is free to return the rows in any order it wants when identical values are encountered. That means, potentially, all database platforms are affected, and the way that this issue may manifest itself in various contexts, be it with the application of certain clauses, implementation of / version of storage engines, or just plain phases of the moon, is irrelevant. The point is that the SQL standard in this case tells us the result is nondeterministic, and we have an API that promises to be deterministic, so we need to reconcile that conflict somehow.
I think if we were to add a default_sort_column
setting, it would have to be applied to all queries -- not just those that use LIMIT
. If you had page of results that was all-inclusive, with ambiguous sort, that URL now becomes non-cacheable, as the results could in theory be returned differently on subsequent page loads. The goal should be to make DataList
deterministic, not to negotiate the random behaviour of your database platform.
are we able to infer default_sort_column
from getting the PrimaryKey
type fields in DataObjectSchema::databaseFields()
?
Probably a bit too much magic for my taste, but it's an idea to bounce around.
Since I think this is just in reference to DataObject
s (where the default_sort_column
setting will exist), the ID
column should also always exist. Is it possible to change change a DataObject
's primary key? I don't think so, but if it is, then maybe that'd be a concern.
How does it work in PHPMyAdmin?
I think we need to do more research. I found that only some tables return with duplicates between limited segments of the table, while other tables do not (e.g. SiteTree).
A challenge I've considered with a default_sort_column is that there needs to be a way to tell a datalist to reverse it in the case where a non-default sort column (E.g. Title) is reversed, otherwise groups of rows with the duplicate Title column would not reverse. That makes the solution not able to be applied so transparently.
IMPORTANT
A:
SELECT * FROM `SiteTree` LIMIT 20, 10
B:
SELECT * FROM `SiteTree` LIMIT 20, 10 ORDER BY `ID`
C:
SELECT * FROM `SiteTree` LIMIT 20, 10 ORDER BY `sort`
NOT sorting and sorting by ID are equally fast, but sorting by Sort is significantly slower ... see: https://github.com/silverstripe/silverstripe-framework/issues/7272
This evidence shows that sort by ID vs no sort key has no performance impact and sorting by ID is faster, most of the time, than sorting by the default_sort.
Yeah it's not just in sorting in ASC
/DESC
but also if/when you're using functions to sort (e.g. FIELD(...)
or IF(...)
and maybe some others, mentioned his above). I don't know what the solution to this would be since this is known behavior, generally speaking. You could always suggest people include a secondary sort column in the meantime (separate issue, update doc's). Depending on how users apply sort, it may be possible to infer the direction if you apply ID as the second column. Often time it's just a second parameter passed to ->sort('Sort', 'DESC')
. It's when you get into custom sort queries that his coildnt be parsed reliably (you could try) but I think at that level the dev should have more control with less magic.
Ask: Yeah @sunnysideup, typically sorting by indexed fields is a bit faster than unindexed.
OK, there's about twenty screenpages of discussion here ;) If you care strongly about resolution, can somebody please summarise the state of the discussion with options going forward?
Summary:
Sortable
, DataList
should sort deterministically.ID ASC
into the query is too much magic. Something like default_sort_column
would be more appropriateimpact/low
is fitting.Forcing ID ASC into the query is too much magic. Something like default_sort_column would be more appropriate
I think we need to disambiguate the use (or the nomenclature) of default_sort_column
since we already have default_sort
which can define both one or more columns and their corresponding directions. As such, since we're addressing a specific problem (deterministic sorting) I think we should take a specific approach that makes it a bit more transparent, both in nomenclature and in implementation.
So, I'd suggest we do the following:
ID
as our secondary sorting column failover. These are all DataObjects
and, as such, all will contain this column.default_sort_column
(which is somewhat ambiguous in meaning), we use the name such as: sort_deterministic
, sort_secondary_id
, sort_failover
or similar.
default_sort_
prefix since we're not defaulting anything per se (as you'll see below). Instead, we're enabling/disabling a deterministic sorting algorithm at the database level, which (yes) does mean we have to inject extra fluff into the ORDER BY
clause and (yes) this setting is enabled by default. However, this is a boolean (noted below).true
.ID
sort failover by defaulting to ASC
in our algorithm and then parsing the existing order setting in our query prior to final execution for a direction and, if not blank/ASC, we can then set it to DESC
.sort_deterministic
setting and set it to false
.ID
may be redundant for several reasons but not necessarily negative or have an impact on performance.ID
even if a secondary column is manually provided, we may still want to do a quick de-dupe check to ensure this ID
column isn't already specified (to prevent accidentally overriding a contrary direction, e.g. DESC
instead of ASC
).Beyond the uncertainty of those two notes above on automatically disabling, I think we'd be set. Thoughts?
I agree with @patricknelson that having another config setting called default_sort_something
is ambiguous and conflicts with the existing default_sort
, but the idea is the same - to have a failover column to enable deterministic sorting.
Having ID
as a default for whatever this setting is called is a good idea, as all DataObjects have this (maybe rare exceptions).
If the idea can be kept simple such as a single config private static $determined_sort = ['ID' => 'ASC'];
and that can be extended/disabled as desired by developers through php or yml that would be good - yes you can set a config to false
or null
in yml in ss4.
I think some research may be useful:
what are the consequences of adding a sort by ID ASC to all sql statements (after any other sort statements - if any), so that you always have a predictable outcome? This would be my solution and apart from performance reductions, I can't see any other reason why this is not a good solution. [We may consider only doing this lists that are "limited").
I noticed that dataobjects do sort randomly but pages do not and I wonder why that is. It would be worthwhile to work out why (maybe to do with joins?)
Also - while no one has noticed this issues, this is not a benign one. At least two of my clients noticed it (and were unable to edit data because of it) in gridfields (where I had set the sort order to a field that was something like a date field).
Adding an index to SiteTree.Sort
should be a no-brainer AFAIK because it will make all retrievals of Pages faster (whenever there is no custom sort) - see: https://dba.stackexchange.com/questions/11031/order-by-column-should-have-index-or-not.
SUMMARY:
Pagination does not work well when the data is provided by MySQL 5.6 / 5.7 (php 7.1) where the data-set is sorted by a field (or fields) that are not enforced to be unique on a database level. The reason for this is that MySQL may return datasets in a random order (beyond what it has been asked to sort by).
You can possibly replicate this bug by yourself by using the tools listed below:
This means, for example, that:
There are still a lot of questions about this bug, but I have been able to consistently demonstrate this error on several machines with clean installs of 3.6.1
Details
the issue: when the default_sort value for a dataobject contains non-unique values for one or more rows then the GridField ends up NEVER showing some records while showing other records more than once:
How to replicate:
install SS 3.6.1 using standard installer
add the following dataobject (
MyDataObject
):then add modeladmin (
MyModelAdmin.php
):You get the following results:
For my client, I was sorting a GridField by EndDate and StartDate. Some of the items has the same EndDate and StartDate and thus were missing. Making it impossible for them to be found by the client.