laravel / ideas

Issues board used for Laravel internals discussions.
939 stars 28 forks source link

Attribute shadowing with join() encourages possible data leak bugs #347

Closed jonashaag closed 6 years ago

jonashaag commented 7 years ago

Please see https://github.com/laravel/framework/issues/16704, I believe the way duplicate column names (for example: users.id, groups.id when you join the both tables) are currently handled by Eloquent A) can be a surprise to many users and B) is a potential security issue.

Summary from the discussion linked above:

fernandobandeira commented 7 years ago

I think we should throw a duplicate select column exception for any duplicated column on the result...

For the implementation I think that the simpler would be to change the current FETCH_OBJ to FETCH_NAMED, if one of the keys has an array as the value (this implies that it has more than one value for the same column name), then throw the exception, it's better than magically doing something with the column, this is something the developer should take care anyway, I guess warning with an exception should be enough...

Also we could hide this exception when not working with strict mode on sql, instead we would select the first value of the array and use it, as it should probably be what you'll expect as the value...

Update: My exception idea was rejected so I think that we won't handle this situation at all, which isn't bad since it's something the developer should be taking care after all... I guess we can leave this open anyway so ppl can comment how other ORM's are handling this and propose alternatives... Also it keeps all discussion in one place...

jonashaag commented 7 years ago

which isn't bad since it's something the developer should be taking care after all

That's like saying it's unnecessary to create memory-safe programming languages because developers should not make mistakes with memory management in C, so why bother about memory safety in the first place? It's the programmer's fault if half of the world in vulnerable because of some memory management mistake in OpenSSL after all!

Or it's unnecessary to have static type checking; it's the programmer's mistake if the program has type errors!

I could name hundreds of other examples where smart people have decided to change how stuff works because other people tend to make too many silly mistakes. From a pragmatic point of view, if people make silly mistakes with your framework all the time, then you have simply made wrong decisions somewhere.

jonashaag commented 7 years ago

I would love to get some feedback on this issue.

Garbee commented 7 years ago

Can you please provide an actual example exploit that is possible because of this? All I'm seeing so far is the theory of a possible exploit, not a concrete example where one exists in a situation that would generally occur in an application.

jonashaag commented 7 years ago

There is no exploit that works for all applications. My point is that the current behavior encourages code that is prone to potential data leaks and bugs. One might argue that this is strictly speaking not a security issue but I think it's very similar to SQL injections though very limited in scope.

Garbee commented 7 years ago

As I said, can you please provide an example exploit? As it is, I don't quite understand what the exploit allows for an attacker to do. Or even how an attacker would exploit it if you aren't taking SQL commands from them. Helping get this looked at is difficult without a concrete "X report allows for Y attack to occur."

Garbee commented 7 years ago

Further, I didn't ask for an example that would affect "all", of course that isn't feasible. I simply asked for a "generally applicable" example, as in one that isn't caused by a developer explicitly opening an injection attack vector by taking user input and putting it directly into a SQL query.

jonashaag commented 7 years ago

There is no general applicable exploit. There is no exploit. This issue is about preventing BUGS that may or may not have security impact. I'd say it's very likely for bugs causes by this behavior to have security consequences (other people's data being shown to visitors).

Garbee commented 7 years ago

Can you provide a case in which this bug happens? I don't know how many times a code sample needs to be asked for.

jonashaag commented 7 years ago

Please see laravel/framework#16704

Very first sentence in this issue

Garbee commented 7 years ago

Yes, and that code doesn't make sense anyways. Your getting a collection of multiple users from each query. Then trying to scope orders by a single user. Which means that example is already failing in multiple places. It isn't clearly demonstrating the risk you're asserting exists.

jonashaag commented 7 years ago

I am sorry; here's an even more complete version of the code:

function getUsers() { ... }
foreach (getUsers()->get() as $someUser) {
    $someUserOrders = Order::where("user_id", $someUser->id);
    ...
}

Simpler example without a loop:

function getUserById($id) {
  return User::where("id", $id)->first();
}

A few months after coding this the project requirements change and we now have user groups as well, and we change the query to all users having a primary group :

function getUserById($id) {
  return User::where("id", $id)->join("groups", "groups.id", "=", "users.primary_group_id")->first();
}

Also we always had some code that fetches users' orders which worked like this:

Order::where("user_id", getUserById(...)->id);

The ->id property is different when you use the second getUserById method -- it's the ID of the group.

Note that this is a problem that cannot happen with raw SQL; it's a problem introduced by Eloquent, see first post in this issue thread.

Garbee commented 7 years ago

What about using User::with("PrimaryGroup")->find($id); instead of the manual join? Assuming "PrimaryGroup" is an association with the user model. Would that method still contain the group id instead of the user id for the call?

I feel like doing a manual join in this kind of scenario is a quirky edge-case for most Laravel users.

jonashaag commented 7 years ago

From a quick test: No, it only happens with join().

Thing is we have quite some places where we can't use with() for performance reasons, to the point where we stopped using with() for queries. Also WHERE statements using with() don't really filter the result set, they only null the related object if none is found. For example the "get me all users that have a primary group that exists in the database" can't even be expressed with with(), if I understand things correctly.

I guess loads of people must be using join(), given the restricted usefulness of with().

Garbee commented 7 years ago

So, if it is only with join... Isn't this how raw SQL will work when you run the commands? It seems to me like given context, this is expected output as long as doing a manual join works with way in SQL.

barryvdh commented 7 years ago

For example the "get me all users that have a primary group that exists in the database" can't even be expressed with with(), if I understand things correctly.

$query->has('PrimaryGroup')
jonashaag commented 7 years ago

Isn't this how raw SQL will work when you run the commands?

NO. Eloquent confuses the object's IDs. Let me make a very very concrete example.

var_dump(User::where('login', 'foo@bar.com')->first()->id);
int(269)

var_dump(User::where('login', 'foo@bar.com')->join('groups', 'primary_group_id', '=', 'groups.id')->first()->id);
int(261) // <-- wrong id!

Queries made by Eloquent are correct:

MariaDB [xxx]> select ... from `users` where `login` = 'foo@bar.com' limit 1;
+-----+------------------+
| id  | primary_group_id |
+-----+------------------+
| 269 |              261 |
+-----+------------------+
1 row in set (0.00 sec)

MariaDB [xxx]> select ... from `users` inner join `groups` on `primary_group_id` = `groups`.`id` where `login` = 'foo@bar.com' limit 1;
+-----+------------------+
| id  |               id |
+-----+------------------+
| 269 |              261 |
+-----+------------------+
1 row in set (0.00 sec)

But when constructing the model objects, Eloquent chooses THE WRONG ID COLUMN. (EDIT: Or, rather, it's ambiguous, and it choses one of them.)

jonashaag commented 7 years ago

Eloquent should be using aliases for all tables for all queries so that these ambiguities go away. The simplest way would be to something like select users.*, groups.* ... but that fails as soon as you have multiple joins with the same table. Numbered alias would work in that case: select t1.*, t2.*, t3.* from users as t1 join groups as t2 ... join groups as t3 ...

Garbee commented 7 years ago

So, this isn't really a security problem. It is a possible data leak issue. There isn't anything a user can really do here to force this and abuse it. It's like, if all the stars align just right and the developer implements a feature X way then the wrong data could be retrieved.

I'll try pinging Taylor to peek at this later today. But fixing this seems like it may be a slow churn and possibly a big 6.0 type fix since changing this could break existing applications quite heavily.

Thank you for the full examples, they helped a ton in understanding the exact problem!

jonashaag commented 7 years ago

🎉

Thanks for being patient enough and asking all the questions. I'll definitely have to work on my "how to file a good bug report" skills :-)

jonashaag commented 7 years ago

Also, I might be interested in coming up with the first patch for this issue.

Garbee commented 7 years ago

As far as bug reporting you got one part really well which is describing the technical side. However, that can be confusing which is where also providing a direct, workable code sample comes in. Beyond that, have a section that clearly describes a recommended solution and try to provide some information on foreseen implementation problems, (BC breaks, developer confusion, etc.) so there is some meat to be discussed on that end. It also works well when you use headings to show the sections of the issue and <details> elements to hide the code stuff away (or big images if you're showing a UX problem to maintainers.) That way the primary content as far as description is up front, and the code samples/auxiliary understanding bits are hidden away until the reviewer asks to see it.

Garbee commented 7 years ago

Also, this does cause an active exploit against sites if they use the query builder to join based directly on content sent from a client. If a user catches on and the proper checks aren't in place for the requested data, they can cause a force dump of targeted data out of the system. This is once again, "assuming all the stars align" in probability. Therefore, not a critical priority to get addressed. Still something we should help guard against if possible.

jonashaag commented 7 years ago

I just had a look at the Eloquent/query internals and oh boy, using unique tables aliases as I initially suggested for sure doesn't look like an easy undertaking.

IIUC Eloquent constructs model instances simply by taking SQL result row arrays and setting them as model attributes. For example if your SQL query returns an array ["a" => 1, "b" => 2] then your model will have these attributes, without even checking if these attributes are part of the model's table in the first place. (Anything else would be crazy from an implementation perspective.) But also there doesn't seem to be internal housekeeping of what select/join/... statements are part of the query other than the naïve list of clauses that are in "almost final SQL" form.

It's also horrible from a deprecation path point of view, as many have suggested in the discussion already. I guess lots of people rely on the fact that model instances may "receive" attributes from other tables, aggregates, raw queries, ..., and changing that could be painful even with a long deprecation window. Maybe something along the following?

Step 1 should be reasonable, maybe even step 2. Step 3 and 4 sound a bit crazy though...

Danonk commented 7 years ago

How am I to walkaround this bug? I want to retrieve a model, with a condition about it joined table. I want only those Movies (table "movies") that have a Director (table "directors") that are from USA.

$movie = Movie::query()
  ->where('name', 'Fightclub')
  ->join('directors', 'director.id', '=', 'movies.director_id')
  ->where('directors.country', 'USA')
  ->first();

Now $movie->id has an id of director! not the real id

sisve commented 7 years ago

I want only those Movies (table "movies") that have a Director (table "directors") that are from USA.

I'm not an Eloquent user, but what you explain is whereHas, not a join. The original issue only occurs if you write queries with joins manually, instead of letting Eloquent build the queries automatically. Eloquent never issues joins, afaik.

$movie = Movie::query()
    ->with('director')
    ->where('name', '=', 'Fight Club')
    ->whereHas('director', function($query) {
        $query->where('country', '=', 'USA');
    })
    ->first();
Danonk commented 7 years ago

Will it also eager load director record? Because of the with('director')?

sisve commented 7 years ago

Yes, it will eager-load all directors for any movie that has at least one director with country = 'USA'. In your case it looks like a movie only has one director. You can constraint the eager-loading further if required, like only loading directors with a certain shoe size. This is all in the documentation for Eloquent: Relationships, under Querying Relations and Eager Loading.

Danonk commented 7 years ago

I don't want eager loading of anything. Came up with this:

$movie = Movie::query()
  ->select('movies.*')  // this prevents attributes shadowing 
  ->where('name', 'Fightclub')
  ->join('directors', 'director.id', '=', 'movies.director_id')
  ->where('directors.country', 'USA')
  ->get();
cdarken commented 7 years ago

You can use whereHas() even without eager loading.

Danonk commented 7 years ago

I guess i'm gonna go with:

Movie::whereIn('id', function ($query) {
    $query->select('movies.id')
        ->from('movies')
        ->join('directions', /** ... */)
});
falcon-git commented 6 years ago

I was surprised by this behavior too. Left joined in a user table with my content table. Suddenly the updated_at was NULL since the content table's updated_at property were overwritten by the non existing user table entry. I would expect that the properties of the main table were kept for left joins and an exception or table prefix for join.

eoghan27 commented 6 years ago

yep - I would expect the original table columns to take priority. It makes no sense for a joined table to overtake original columns or for those additional columns to not have a prefix.

At least, lets get it in the docs for joins that we need to add a select on the join to define the column alias ourselves.

taylorotwell commented 6 years ago

This is how SQL works.

sisve commented 6 years ago

@taylorotwell

No, it isn't. This is perhaps how the Eloquent handles it, or how PDO handles it. But it's absolutely not how SQL works. This issue/idea/suggestion is about fixing/supporting it. A database query can easily produce a result with columns with identical names; SELECT id AS example, email AS example, ... and somewhere the last column returned overwrites the previous ones. This is done somewhere in Eloquent or PDO, not by the database.

Throwing an exception telling the developer that there are duplicate column names (and they are expected to be unique) is a better solution that just overwriting some values as is currently done.

jonashaag commented 6 years ago

@taylorotwell No, this is not "how SQL works."

The closest you'd get to something a query like

User::join('groups', 'primary_group_id', '=', 'groups.id')->first()->id

is the SQL statement

SELECT id from (SELECT * FROM users JOIN groups ON users.primary_group_id = groups.id)

which fails in MySQL and PostgreSQL with an error that the id column is ambiguous (MySQL: ERROR 1060 (42S21): Duplicate column name 'id').

Besides, what kind of argument is this? You've got a well-defined, real-world problem that multiple users have experienced, with possibly catastrophic outcomes. While the root cause may not be easy to fix, there's an obvious mitigation for the problems that can arise from it: Detect duplicate columns when doing an SQL query and throw a runtime error.

None of the ORMs I have ever used have had this kind of problem. It came as a nasty surprise to me: I consider protecting your users from shooting themselves into their own feet is a large part of an ORM's responsibility. (Unless users explicitly request that are protections be lifted, i.e. going to down raw SQL.)

How many more Laravel users are affected by this issue remains unclear; I expect only a very small subset of them to find this GitHub bug report.

eoghan27 commented 6 years ago

Totally agree.

At a minimum, put it in the docs to warn devs that this situation can happen. Thanks all.

tdondich commented 6 years ago

Why are we not being explicit in the table columns names when using the query builder? This would avoid this problem completely. And then on result in Eloquent, you map the column names from the table name and put them into the properties appropriately.

Specifying table names in the column selects would, ultimately, rest this case and still have SQL work the way it's designed.

sisve commented 6 years ago

@tdondich Because no one knows what columns exists in the database. Most queries are a simple SELECT * FROM model which means we cannot automatically alias them. And there's no metadata available to Laravel to build a list of all columns that the model uses.

jonashaag commented 6 years ago

FYI Ruby on Rails uses inspection to retrieve the list of columns. IIRC it does this during application boot and caches it (google the term Schema Cache).

sisve commented 6 years ago

Something similar is also available in Doctrine where you declare your properties on your model, and map them accordingly to database columns. This means that Doctrine knows of every database column and can alias them accordingly.

Laravel, however, does not have this functionality.

jonashaag commented 6 years ago

Of course it doesn't have it at this point, but that's a bad argument for not adding another feature that depends on it. The solution is simple: one could add the feature.

sisve commented 6 years ago

Sure. I misinterpreted the situation. I believe the current heading of this issue was to introduce exceptions when there's duplicate column names in the result.

There are also other issues that touches the subject of introducing schemas/model metadata for Laravel. https://github.com/laravel/ideas/issues/980 (Would a schema file make sense?) https://github.com/laravel/ideas/issues/944 (Database migration declarative approach per table, run migrate depending on the difference)

But in the end, if we want something that large, why not switch to Doctrine which has that feature already?

hulkur commented 6 years ago

similar issue as https://github.com/laravel/framework/issues/23548 and definitely not sql problem

to avoid duplicate column conflict relation() is defined as $this->hasManyThrough(...)->select('related.*')

$this->relation()->toBase()->get() returns id = 1 $this->relation->first() has id = 2

There is only one related model with id of 1 as returned by base sql, but somewhere while creating a related model the id is overwritten by intermediate model id

hulkur commented 6 years ago

Problem starts in HasManyThrough::shouldSelect where return array_merge($columns, [$this->getQualifiedFirstKeyName()]); adds another id column into select. If I alias it and then use the alias in buildDictionary then it works correctly

mfn commented 6 years ago

I've stopped counting how often I've been bitten by this.

Every join between tables of same-names columns without explicit selection of columns is prone to this problem.

This is one of the things CakePHP2 "got right" [1]:

That's also why, even in only a single "Model" (if you can even call it so) result, is always "nested":

$post = $Post->find('first');
// Equals this structure
$post = [
  'Post' => [
    'id',
     …
  ]
]

Under the hood:

This approach however is at odds with so many things how Eloquent works so pretty much unfeasible I think.

Fun fact: I recently had a performance issues with complex results and joins and there wrote exactly such a converter:

It felt wrong, but I couldn't find another way except doing everything manual.

There was also mentioned FETCH_OBJ vs. FETCH_NAMED: unless I'm wrong, FETCH_OBJ is essential to how Eloquent achieves great performance to hydrate the actual Models.

No solution to the problem, just providing some perspective.

[1] "got right" as in: does not suffer from this particular problem, doens't mean it's a saint with other things

HenkPoley commented 4 years ago

I guess you can "resolve" this by adding a global query scope. Untested [Doesn't work]:

class MyModel extends Model
{
    /**
     * A side-effect that is loaded when the model is 'booted', to only query the model its own columns.
     *
     * @return void We only did things in secret
     */
    protected static function booted()
    {
        static::addGlobalScope('only_my_columns', function (Builder $builder) {
            $builder->select($this->table . '.*'); // Edit: TODO: Wait a minute, you don't have access to $this in a static function.
        });
    }
}

For sure this means you need to remember to add extra columns of other tables, if you need them in one your models.

Edit: Also it doesn't work since there is no $this in a static function. I guess you could hack around this by making a public static $REAL_TABLE_NAME in the class, and loading that into $this->table during construction. And figure out Laravel Eloquent default table name if ..::$REAL_TABLE_NAME is not set.

"Hey Siri, what's an ORM?" 😅

jonashaag commented 4 years ago

I have written a workaround for this when I initially reported the issue that forces you to always explicitly select all the columns when doing joins. Can't find the code right now though.

Honestly, at this point I would just recommend to everyone to not use Eloquent in the first place. This issue has been largely ignored by the developers for over 4 years now, with lots of people suffering from exactly the symptoms I have proclaimed, and also the Eloquent developers themselves having to plug workarounds at other places in the code because of the issue.

Eloquent developers continue to claim that "this is how SQL works" here and in other places as well, while clearly proved wrong.

HenkPoley commented 4 years ago

Yeah, naming all required columns is a best practice in SQL (relational algebra): https://stackoverflow.com/a/3639964/273668

Sadly Eloquent didn't start that explicit way. Maybe it could be added as a kind of 'Eloquent2 is the future' to the side.

Preference to the columns of your Model its own table seems pretty sane default though 🤔.


Maybe someday someone will create one of these fancy SQL to typesafe interface/classes tooling for PHP 🤷‍♂️. So you can actually know what comes out.

zmilan commented 4 years ago

Hi,

I'm new to the Laravel community and didn't know about this thread so I have opened another issue where I have described something that surprised me totally and that is actually the topic of this thread.

This is actually a possible security issue and I have explained it in my comment. For a short explanation, if use users and tenant_users tables to authenticate the user and check is it belongs to the tenant, you can end up with a properly authenticated user which id is overwritten with the id from tenant_users which by accident could belong to some other record from the users table which has higher privileges. When you after that use that user model to load relations it would load them by id from that other user... (there is an example of code in issue)