Open themsaid opened 6 years ago
Thanks for moving this to the appropriate location, @themsaid !
I should have a PR cooked-up pretty shortly. It looks like the nature of the required changes will be very minor in scope.
I will make every effort to maintain backward-compatibility, but that may come at the cost of a function whose name is a misnomer; that is, configureSchema()
when it really should be configureSearchPath()
. We could add a new function (two, actually) with a more apt name, but we would still have to change the calling reference to it (or implement some unsightly conditional logic).
In any case, I'll ensure that the PR is editable, so any prudent changes can be made.
Thanks again!
It looks like I'll be able to implement this in a way that is mostly backward-compatible (see Caveats, below), and without any specific dependencies between https://github.com/laravel/laravel and https://github.com/laravel/framework .
As such, it seems to make sense to submit this PR first, and once (and if) it's merged-in, adjust the connection configuration property name from schema
to searchpath
.
Basically, if the user's framework version includes this change and the key name is still schema
, the value will be ignored completely, in which case the effective search_path
will fall-back to PostgreSQL's chosen value (likely the default, which is "$user", public
).
Given that nobody has ever raised the concern that is the subject of this Issue, the vast majority of Laravel users who are using PostgreSQL are using the default (public
) schema, in which case this change will have no unintended consequences.
Potential Impact: Minor
If the database does in fact have a schema named "$user"
, which evaluates to forge
, for example, in a default Laravel installation. In such cases, it would be possible for object name ambiguity to arise (most likely a table name), in which case PostgreSQL will throw an appropriate exception. The stop-gap solution would be to prefix the table name(s) with the schema name(s), whether in the Model definition or elsewhere, thus preventing the ambiguity.
Potential Impact: Major
If the user is relying on one or more extensions registered on a schema other than those defined in the default search_path
('"$user", public'
), and specifies that schema as the connection's schema
value.
Upon receiving this update, the schema
value will be ignored (unless the user changes its name to searchpath
), thereby causing the extension(s) to become ineffective for that connection.
As I described exhaustively in my OP, this could have grave consequences for a given use-case that relies on one or more extensions.
Accordingly, I suggest weighing the BC-factor carefully before tagging/releasing, or instead merging this into a release that has other BC-breaks, and including accompanying documentation regarding the specific nature of the change (i.e., a link to this Issue).
1.) My intention is to implement support for variable references in the search_path
, such as the style that is used in PostgreSQL's default, "$user"
. Notice that the value must be quoted, whereas quoting is optional for static references. I don't see any immediate risk arising from this capability, given that these are core configuration values, and to modify them requires access to the effective user's environment, .env
, or config/database.php
. That said, I will make every effort to prevent unintended variable expansion, given that $
is, of course, of significance in PHP.
2.) In their native syntax, search_path
values are comma-separated, with spaces between items. It seems to make sense to retain the same format (with the spaces being optional) when specifying their values in Laravel. Using a string (instead of an array) to define the values also is conducive to specifying the values within the .env
file (which does not support array-style notation). If the intention is to allow for spaces (or any amount of whitespace, for that matter) within the declared values, at the Laravel layer, then some form of basic parsing/validation seems appropriate. To be explicit, my intention is not to change the existing if (is_array($searchPath)) { ...
logic, so that it is, in effect, the user's responsibility to call implode()
on the searchpath
string, e.g., in config/database.php
:
'searchpath' => !empty(env('DB_SEARCH_PATH')) ? explode(',', env('DB_SEARCH_PATH')) : 'public',
If that doesn't seem ideal for any reason, please let me know.
Initial PR soon to follow.
It appears that PostgreSQL accepts the search_path
string happily, with or without spaces, and then adds spaces to its own internal representation.
Also, the existing formatting method quotes the entire string (if the input is a string), which is not valid. For example
'schema' => 'public, mine',
yields
>>> DB::select('show search_path');
=> [
{#735
+"search_path": ""public,mine"",
},
]
which is not valid, and so will never be used (despite the fact that PostgreSQL does not complain). In other words, passing more than one schema in the string never worked correctly to begin with.
It seems likely that the intention was indeed to quote the entire string, to prevent any malicious or unintended SQL-related problems, but unless the user is aware that he must supply an array to use more than one schema in the search_path
, it's easy to "do it wrong" and use a string, in which case PostgreSQL basically ignores the value silently. Personally, I would rather experience a SQL syntax violation than a silent failure.
Passing an array into the same function, for example
'searchpath' => ['$user', 'public', 'mine'],
works as it should, because PostgreSQL handles the parsing, and interprets the value as such:
>>> DB::select('show search_path');
=> [
{#735
+"search_path": ""$user", public, mine",
},
]
So, at a minimum, it is necessary to remove the outer quotes when the supplied value is a string.
At a maximum, if the intention is to support "$user"
-style notation, in strings, too, further work is required (otherwise, a SQL syntax violation occurs).
My opinion regarding the matter is that the user should simply change the searchpath
value to an array if he needs to use "$user"
-style notation. I hesitate to get into the ugly business of string-parsing and manipulation in such a sensitive context; it feels as though there is too much go awry.
In fact, I'm inclined to err on the side of caution and leave the outer quotes intact when the value is supplied as a string. Maybe it is better to have a silent failure than any risk of a SQL-injection. I'll leave that bit alone. ;)
I apologize for the lengthy monologue here; more than anything, I needed to talk myself through the reasons for which I became so tripped-up.
In essence, my frustration and confusion stem directly from the fact that the existing code seems to conflate PostgreSQL's "schema" and "search_path" concepts. As tempting as it may be to write-off the conflation as a "mundane detail", it feels worthwhile to correct.
@themsaid I think it makes the most sense to close this Issue (at your convenience, of course), and I will open a new one with a more accurate subject and more relevant details.
Should I open it here (on laravel/internals
), too?
Thanks in advance!
Sorry @cbj4074 for bringing up old task, but it seems similar question.
We are currently researching different PHP frameworks to migrate to, but we have issue with "not mainstream" database usage:
So, the question is:
Thanks!
Unrelated, @cbj4074 it seems to be that this search_path issue will be fixed in 9.x via (your) https://github.com/laravel/framework/pull/35530 ?
@Talkless No apology needed. :) I'm not a proponent of closing or archiving threads (on any platform), nor do I find reviving them to constitute "poor etiquette", as long as the "bump" is relevant, and in your case, it most certainly is.
To answer your questions:
search_path
on the connection, in which case PostgreSQL's underlying value will be used. However, there are some small number of methods in which the search_path
is inferred if not set on the connection and not specified explicitly when calling the method, specifically because those cases require a value for it. As a specific example, Laravel's PostgreSQL Builder implementation defines a hasTable()
method, which queries PostgreSQL's information_schema
table to determine whether a specific table exists. Such a query is infeasible without knowing which schema
value to use in the WHERE
clause. Accordingly, sensible assumptions are made in these cases. While I'm not sure this will affect your use-case, and even if it does, you should be able to specify the required values explicitly at call-time, you may examine the relevant logic in https://github.com/laravel/framework/blob/ce57929931f877f03a8428498654591e890f55f8/src/Illuminate/Database/Schema/PostgresBuilder.php#L161-L223 .search_path
before executing a query, either by setting the value on the connection to be used before connecting, or you can set it on an established connection using a raw statement (which enables you to execute any arbitrary SQL statement against PostgreSQL, including SET search_path ...
).Ultimately, as of Laravel 9.x, I don't see any deal-breakers, given the use-case you describe.
@mfn Yes, the search_path handling is fixed in 9.x, but only partly via https://github.com/laravel/framework/pull/35530 . There are two additional PRs that fix lingering issues that arose subsequently: https://github.com/laravel/framework/pull/35567 and https://github.com/laravel/framework/pull/35588 .
@themsaid You can feel free to close this, given that it has been implemented/fixed in Laravel 9.x.
Ultimately, as of Laravel 9.x, I don't see any deal-breakers, given the use-case you describe.
Looks great, thanks @cbj4074 for detailed response!
Original post by @cbj4074
https://github.com/laravel/framework/issues/22333
Description
The author is of the opinion that Laravel manipulates PostgreSQL's
search_path
recklessly (though, with the best of intentions, I'm sure), given that it is possible to reference objects (e.g., tables) in other schemas in PostgreSQL.For a primer regarding schemas, which are essentially "namespaces" (a concept that MySQL, for example, lacks), see:
https://www.postgresql.org/docs/current/static/ddl-schemas.html
The manipulation in question occurs at the following location in the Laravel Framework (Illuminate) source:
https://github.com/laravel/framework/blob/5.5/src/Illuminate/Database/Connectors/PostgresConnector.php#L95
The specific manner in which Extensions are implemented in PostgreSQL compounds the problem even further.
The manner in which Laravel manipulates the
search_path
has far-reaching implications for every aspect of the DBAL and ORM; anyone who understands Extensions in PostgreSQL can imagine what happens when Laravel paves-over the effectivesearch_path
any time it opens a new database Connection.Imagine the following scenario:
The Laravel application uses a PostgreSQL database, in which
TEXT
data types are compared case-sensitively (this is PostgreSQL's default behavior).A user registers with the application, using the email address
JDoe@example.com
.To prevent a user's inability to authenticate using a variation in case, with respect to his email address, e.g.,
jdoe@example.com
, the application developer makes use of PostgreSQL's case-insensitive text comparison extension,citext
.For implementation details regarding extensions in PostgreSQL, see:
https://www.postgresql.org/docs/current/static/sql-createextension.html
The case for using
citext
instead ofLOWER()
on both sides of the comparison operator, for example, is articulated clearly in the following post:https://hashrocket.com/blog/posts/working-with-email-addresses-in-postgresql
Now, imagine that the application developer is using more than one schema in the application (after all, this capability is one of PostgreSQL's strengths over MySQL, for example). Imagine further that the developer defines Eloquent Models whose relationships extend across schema boundaries; that is, relationships are defined between models whose respective connections specify different schemas.
In this scenario, Eloquent (and Query Builder, for that matter) break-down completely with respect to how they interact with any PostgreSQL extensions that may be in use on the database in question.
More specifically, in many cases, extensions are enabled on the default schema (usually
public
), and as long aspublic
is in thesearch_path
, the database connection is able to take advantage of those extensions, regardless of the source and target objects' respective schema(s) (see Key Concepts, below, for a more detailed explanation). This is important because if Laravel paves-over PostgreSQL's effectivesearch_path
when establishing any connection (which is the current behavior), the database will perform completely differently where extensions are concerned, e.g., case-insensitive text comparisons can become case-sensitive in crucial scenarios. Imagine the havoc this could wreak in certain use-cases.To continue with the aforementioned email address example, imagine further that a second (and malicious) user registers with the application as
jdoe@example.com
. Because thecitext
extension is not effective, PostgreSQL will now treat the first and second users' email addresses as distinct. While the users would have distinctid
values, there are a number of junctures at which this behavior could pose non-trivial problems (e.g., password resets, SSO implementations, etc.).Yes, there are other fail-safes that should be employed to prevent this type of scenario (e.g., unique constraint), but not everybody is so shrewd. More than anything, my intention is to demonstrate that Laravel's current behavior can cause legitimate problems that are extremely difficult to debug.
Key Concepts
1.) Crucially, just because the extension is enabled on a particular schema, it will not be used unless that schema is also in the effective
search_path
.In other words, if the effective
search path
is, for example,"$user", public
, the extension will not be used when processing this query:2.) Equally crucial is that an extension is "accessible" by virtue of existing ("being created") in the effective
search_path
. This means that an extension can be enabled on thepublic
schema, but be used to effect change on a different schema. Consider the following scenario.We did not specify a schema in which to create the extension, explicitly, so the default schema,
public
, is used (I believe this default schema to be specified intemplate1
, which ships with PostgreSQL).And even though the
platform
schema is not in the effectivesearch_path
, we are able to use the extension against it (of course, this requires specifying the schema prefix when addressing the table):Likewise, we are able to execute queries that leverage the extension against schemas in which the extension does not exist:
Possible Solutions
1.) Do less; don't manipulate the
search_path
at all, and make it the developer's responsibility to set the value appropriately for his specific use-case, whether he decides to do that in PostgreSQL, or within his Laravel application.2.) At the very minimum, Laravel should never overwrite the
search_path
completely, primarily because thesearch_path
can be configured in the PostgreSQL configuration in a way that Laravel cannot replicate easily (if at all). If anything, Laravel should only add schemas to thesearch_path
, as needed; it should never remove schemas (and again, that includes overwriting them). A more elegant approach might be to have two separate connection properties forschema
andsearch_path
. I haven't experimented with that approach as yet, but it seems to have merit, and may solve this problem entirely.3.) While not ideal, it did occur to me to add every schema in my database to the
schema
value in the PostgreSQL connection configuration, e.g.but this is not a feasible solution for at least two reasons:
Laravel seems to treat the "schema" and the "search_path" interchangeably, the implications of which are far-reaching and not obvious in most cases; these are two distinctly different concepts.
Laravel employs parameter binding when it sets the
search_path
to the schema name, as part of the connection process:$connection->prepare("set search_path to {$schema}")->execute();
This causes the entire string to be quoted, which renders the
search_path
effectively invalid. For example, thephp artisan tinker
output demonstrates the problem:Furthermore, the binding precludes the use of dynamic values that PostgreSQL understands; a textbook example is the default
search_path
value:"$user", public
, which PostgreSQL evaluates ashomestead, public
, for example.Conclusion
I'm fully aware that this problem affects a tiny fraction of Laravel users. I did ask myself:
A back-of-the-napkin calculation yields about 1 in 1,000,000, which explains why nobody else seems to have documented nor addressed this limitation in Laravel's PostgreSQL implementation.
Nonetheless, it seems that if Laravel is to be regarded as "Enterprise-capable", this limitation must be addressed.
Thank you for your time and consideration.