yiisoft / active-record

Active Record database abstraction layer
https://www.yiiframework.com/
BSD 3-Clause "New" or "Revised" License
69 stars 29 forks source link

ActiveQuery::one() limiting database records #58

Open hvta opened 7 years ago

hvta commented 7 years ago

What steps will reproduce the problem?

ActiveQuery::one() doesn't seem to limit records in SQL.

What is the expected result?

Queries created with ActiveQuery::one() should have a LIMIT 1 clause in the prepared SQL.

What do you get instead?

The function selects all rows from a table resulting in excess memory consumption.

Additional info

Q A
Yii version 2.0.?
PHP version
Operating system

Funding

Fund with Polar

rob006 commented 7 years ago

See #9578

rob006 commented 7 years ago

@samdark Maybe it is worth add some shortcut for that, like ActiveQuery::first()? Just see how many duplicates have #4348 or #9578.

samdark commented 7 years ago

Yeah... maybe.

MysteryDragon commented 7 years ago

As I see there were some attempts: #12563

SamMousa commented 7 years ago

I looked into some of the old issues but couldnt find the answer to this question: What's the exact use case for selecting one record without a limit?

bizley commented 7 years ago

I think this is the potential problem - https://github.com/yiisoft/yii2/issues/4348#issuecomment-49405392

SamMousa commented 7 years ago

So can we create a specific test case to see if that's still true anno 2017?

Basically if that happens it is a bug in MySQL. The whole point of a dbms is to tell it what I want using SQL not how I want it...

If I only want one record I should tell the database and it is its responsibility to figure out the most efficient way to do that.

MysteryDragon commented 7 years ago

Pay attention on https://github.com/yiisoft/yii2/issues/4348#issuecomment-49419664 also. (And I'm not expert in MSSQL/Oracle at all.)

samdark commented 7 years ago

@SamMousa a test would be helpful.

SamMousa commented 7 years ago

Looking into that now. It could have been this bug: https://bugs.mysql.com/bug.php?id=78325

I'm unable to get mysql to pick the wrong index when using 2 tables with 50 million records, then joining them with a limit of 1 and different sorting (both on index and non-index columns).

I imagine that if your limit is high and you use a sort / group at some point mysql decides that it will be too big to store in memory and uses on disk storage. This is however not related to the limit clause but to the sorting.

@samdark It's impossible to prove a negative so I won't try ;-)

samdark commented 7 years ago

MSSQL:

mssql_select

mssql_select_limit

samdark commented 7 years ago

So it seems it doesn't matter for MSSQL... @sergeymakinen you're good with MSSQL. Any idea?

sergeymakinen commented 7 years ago

Most of the time (99%) it’s not an issue for MSSQL anymore. Especially in the yii\db\mssql\QueryBuilder::newBuildOrderByAndLimit() case. I consider limit(1) safe to be used in one(). I can neither confirm nor deny the same for Oracle but in recent versions (10.2 or higher) it should not be an issue.

ghost commented 7 years ago

For mysql, you can have some performance issues by combining limit and order by: https://github.com/yiisoft/yii2/issues/10869

I suggest to put ->one(true|false) to add the limit or make a global parameter.

SamMousa commented 7 years ago

@juanmacias How is that related to limit? The linked issue seems to be related to index hinting..

ghost commented 7 years ago

When using limit and order by, MySQL some times don't use the appropiate index, so the query will kill the server.

For example Product::find()->orderBy('name')->limit(1)->one();

MySql will do in some cases a full scan.

So, if you want to introduce limit 1, you have to allow index hinting.

In my case, I moved to PostGreSQL tired of this....

SamMousa commented 7 years ago

Do you have a sample SQL file? I know this occurs but it's hard to recreate consistently... Therefore it's hard to check if it still happens in the latest versions

On Apr 3, 2017 8:44 PM, "Juan Macias" notifications@github.com wrote:

When using limit and order by, MySQL some times don't use the appropiate index, so the query will kill the server.

For example Product::find()->orderBy('name')->limit(1)->one();

MySql will do in some cases a full scan.

So, if you want to introduce limit 1, you have to allow index hinting.

In my case, I moved to PostGreSQL tired of this....

β€” You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/yiisoft/yii2/issues/13875#issuecomment-291235593, or mute the thread https://github.com/notifications/unsubscribe-auth/AAhYzSaHxyZ1ZWnpSpPrqWEMm9d8yUoWks5rsT37gaJpZM4MtAv_ .

ghost commented 7 years ago

It's a known issue with a lot of complains, stackoverflow is full of samples.

https://www.percona.com/blog/2006/09/01/mysql-order-by-limit-performance-optimization/

But this not only occurs with limit and order by, also with left join + where, and other querys.

If you have to deal with large tables and complex querys, force index is a must.

cebe commented 7 years ago

@juanmacias that article is from 2006. do you have more recent resources?

I did not find performance related things but there is this section from the mysql docs:

One manifestation of this behavior is that an ORDER BY query with and without LIMIT may return rows in different order, as described later in this section.

https://dev.mysql.com/doc/refman/5.7/en/limit-optimization.html

That means that automatically applying limit would result in behavior change in some cases.

ghost commented 7 years ago

Still present in MySQL 5.6 version (I have a lot of issues), more issues in stackoverflow:

http://stackoverflow.com/questions/9641463/mysql-not-using-index-for-order-by http://stackoverflow.com/questions/39954181/mysql-order-by-takes-a-very-long-time-even-if-i-have-indexes

As I told before, MySQL execution plan is not good in some stupid queries, that's the reason they implemented Force Index.

On Tue, Apr 4, 2017 at 11:04 AM, Carsten Brandt notifications@github.com wrote:

@juanmacias https://github.com/juanmacias that article is from 2006. do you have more recent resources?

I did not find performance related things but there is this section from the mysql docs:

One manifestation of this behavior is that an ORDER BY query with and without LIMIT may return rows in different order, as described later in this section.

https://dev.mysql.com/doc/refman/5.7/en/limit-optimization.html

That means that automatically applying limit would result in behavior change in some cases.

β€” You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/yiisoft/yii2/issues/13875#issuecomment-291438789, or mute the thread https://github.com/notifications/unsubscribe-auth/AEk1ZeOQsICs5BdYSQddrnLXAp77gVI0ks5rsge1gaJpZM4MtAv_ .

-- Juan Macias CEO QaShops.com 659203561

pvolyntsev commented 6 years ago

Still open?

ghost commented 6 years ago

I think so....

samdark commented 6 years ago

It is. It looks like a simple thing at the first sight but in reality it isn't. See comments above...

pvolyntsev commented 6 years ago

It's quite simple as I see

yiisoft\yii2\db\ActiveQuery.php

namespace yii\db;

class ActiveQuery
{
    /**
     * Executes query and returns a single row of result.
     * @param Connection|null $db the DB connection used to create the DB command.
     * If `null`, the DB connection returned by [[modelClass]] will be used.
     * @return ActiveRecord|array|null a single row of query result. Depending on the setting of [[asArray]],
     * the query result may be either an array or an ActiveRecord object. `null` will be returned
     * if the query results in nothing.
     */
    public function one($db = null)
    {
        $this->limit(1); // limit the query
        $row = parent::one($db);
        if ($row !== false) {
            $models = $this->populate([$row]);
            return reset($models) ?: null;
        }

        return null;
    }
}
MysteryDragon commented 6 years ago

@pvolyntsev it's not about code implementation, it's about pitfalls.

SamMousa commented 6 years ago

The question remains if this is really still an issue anno 2018; since this is not a yii bug but a supposed DBMS..

rugabarbo commented 6 years ago

I really like @rob006's proposal: https://github.com/yiisoft/yii2/issues/13875#issuecomment-290261535 Let's make two separate methods:

  1. ActiveQuery::first()
  2. ActiveQuery::one()

Does anyone mind? And if so, why?

SamMousa commented 6 years ago

I don't think the difference is clear from the method name. Note that this is all about clarity; currently the workaround is:

$query->limit(1)->one()

Which is not that much longer than $query->first(). The issue arises from the fact that people expect one() to be memory efficient.

Adding a new function will not remove any unclarity. If we really want to add another function i'd go with something like ActiveQuery::oneWithLimit(), that way you get to see it when using code completion and looking for one() and it conveys the meaning more clearly as well.

That said, I still think we should investigate if the underlying issue is still valid; I have not been able to reproduce it. Also DBMSes have improved over the years and their query optimizers I'd assume are less likely to make these kinds of stupid mistakes...

c4ys commented 6 years ago

In Mysql:

Don't use this:

// select * from order where user_id=:user_id
$order = Order::findOne(['user_id'=>$user_id])

Must use this:

// select * from order where user_id=:user_id limit 1
$order = Order::find(['user_id'=>$user_id])->limit(1)->one()
SamMousa commented 6 years ago

Note that in most cases you'll query using the primary key or another unique set when using findOne(), so then it doesn't really matter..

rob006 commented 6 years ago

We could remove AR::findOne() and go back to AR::findByPk(scalar $id) and AR::findAllByPk(scalar[] $ids). AR::findOne() seems to be to magic and people either expect more magic (https://github.com/yiisoft/yii2/pull/13816) or they do not realize how magical it is (https://www.yiiframework.com/news/168/releasing-yii-2-0-15-and-database-extensions-with-security-fixes#affected-code).

samdark commented 6 years ago

That said, I still think we should investigate if the underlying issue is still valid; I have not been able to reproduce it. Also DBMSes have improved over the years and their query optimizers I'd assume are less likely to make these kinds of stupid mistakes...

πŸ‘

beroso commented 6 years ago

I'm wondering if the method ActiveQuery::one could be changed to something like this:

    // in yii\db\ActiveQuery
    public function one($db = null)
    {
        // Conditionally add limit 1 clause
        if (!empty($this->where)) {
            $class = $this->modelClass;
            $pks = $class::primaryKey();
            // Check if all pk are used in where. If not, use LIMIT 1. (Maybe we can test for unique indexes too)
            if (!empty(array_diff(array_values($pks), array_keys($this->where)))) {
                $this->limit(1);
            }
        }
        else {
            $this->limit(1);
        }

        $row = parent::one($db);
        if ($row !== false) {
            $models = $this->populate([$row]);
            return reset($models) ?: null;
        }

        return null;
    }

But I'm not sure about the drawbacks.

uaoleg commented 1 month ago

Fixed https://github.com/yiisoft/yii2/pull/20266