Open elharo opened 7 months ago
Would this page be where to document this? http://prestodb.io/docs/current/language/reserved.html
Possibly. I need to think this through in a little more detail. @rschlussel and I were chatting about this over coffee this morning and I wanted to capture a quick note so we don't lose track of it. First we need to figure out what, if anything, the grammar currently says about these names. Reserving $names for system purposes is a common pattern, Java does this for example, but I don't know if it's standard in SQL.
@steveburnett @elharo @tdcmeehan
sqlbase.g4 its defined as non-reserved.
To add name as a reserved keyword in SQLBASE.G4 file, we should remove it from the nonReserved list and add a note mentioning that $name is reserved. Here's how you can do it:
nonReserved // IMPORTANT: this rule must only contain tokens. Nested rules are not supported. See SqlParser.exitNonReserved : ADD | ADMIN | ALL | ANALYZE | ANY | ARRAY | ASC | AT | BERNOULLI | CALL | CALLED | CASCADE | CATALOGS | COLUMN | COLUMNS | COMMENT | COMMIT | COMMITTED | CURRENT | CURRENT_ROLE | DATA | DATE | DAY | DEFINER | DESC | DETERMINISTIC | DISABLED | DISTRIBUTED | ENABLED | ENFORCED | EXCLUDING | EXPLAIN | EXTERNAL | FETCH | FILTER | FIRST | FOLLOWING | FORMAT | FUNCTION | FUNCTIONS | GRANT | GRANTED | GRANTS | GRAPHVIZ | GROUPS | HOUR | IF | IGNORE | INCLUDING | INPUT | INTERVAL | INVOKER | IO | ISOLATION | JSON | KEY | LANGUAGE | LAST | LATERAL | LEVEL | LIMIT | LOGICAL | MAP | MATERIALIZED | MINUTE | MONTH | NFC | NFD | NFKC | NFKD | NO | NONE | NULLIF | NULLS // 'NAME' removed from here | OF | OFFSET | ONLY | OPTION | ORDINALITY | OUTPUT | OVER | PARTITION | PARTITIONS | POSITION | PRECEDING | PRIMARY | PRIVILEGES | PROPERTIES | RANGE | READ | REFRESH | RELY | RENAME | REPEATABLE | REPLACE | RESET | RESPECT | RESTRICT | RETURN | RETURNS | REVOKE | ROLE | ROLES | ROLLBACK | ROW | ROWS | SCHEMA | SCHEMAS | SECOND | SECURITY | SERIALIZABLE | SESSION | SET | SETS | SQL | SHOW | SOME | START | STATS | SUBSTRING | SYSTEM | SYSTEM_TIME | SYSTEM_VERSION | TABLES | TABLESAMPLE | TEMPORARY | TEXT | TIME | TIMESTAMP | TO | TRANSACTION | TRUNCATE | TRY_CAST | TYPE | UNBOUNDED | UNCOMMITTED | UNIQUE | UPDATE | USE | USER | VALIDATE | VERBOSE | VERSION | VIEW | WORK | WRITE | YEAR | ZONE ; And add a note:
// NOTE: $name is reserved
this is wat i understand do coorect me if wrong
Not quite what I meant. I'm not trying to change the grammar. Just warn devs that if they name a column $Foo then they might have trouble in the future if presto uses $Foo for some synthetic column or other purpose.
That is, "Yes, it is technically possible to name a column $foo but only presto dev team should do this."
This is analogous to how Java and XML treat $name. Allowed but reserved. The way the XML spec phrases this is "Names beginning with the string "xml", or with any string which would match (('X'|'x') ('M'|'m') ('L'|'l')), are reserved for standardization in this or future versions of this specification."
The Java Language spec says, "The $ sign should be used only in mechanically generated source code or, rarely, to access pre-existing names on legacy systems."
Obviously these aren't exactly analogous, but this is the spirit I'm trying to capture.
@elharo is there any reference to this limitation in the SQL spec? If not, then I'd note columns of the form $name
are just a per-connector convention, and connectors may actually specify a form any way they want. E.g. it could be @name
or !name
as well.
@elharo is there any reference to this limitation in the SQL spec? If not, then I'd note columns of the form
$name
are just a per-connector convention, and connectors may actually specify a form any way they want. E.g. it could be@name
or!name
as well.
Do we already have other connectors that use different conventions. If not, I think it would make sense to be consistent across connectors so people don't need to think about it (it's not the kind of thing where it matters tremendously what convention you use, as long as there is some convention so you don't break people's tables when you add a new kind of synthetic column). We could specify this convention just for hive (or maybe hive and iceberg, since I know iceberg uses this convention at least sometimes), but if there's no reason to limit, I think it's better not to.
@rschlussel the trouble is there's nothing to enforce any particular convention, so it would need to be added to the engine somehow. We can enumerate all open source connectors and look at what convention they use, which is most likely $name
, but we don't know who has built a custom connector over the past 10+ years that specifies it differently. I think the best we can do is just mention it as a guideline in our developer docs for connectors.
yeah, i don't think we can make any claims about custom connectors. My goal for this guideline is for the things we control so that going forward a) users know they use our reserved-ish names at their own risk and b) developers follow these guidelines so that in the future they don't introduce reserved column names that users didn't know to avoid.
Right. I don't want to enforce anything, or add anything to the engine. Just mention it as a guideline so developers get a heads up. I don't expect there are a lot of folks naming columns with dollar signs so this isn't a huge deal.
Sounds good--documeting that, by convention, $
prefixed names are often reserved for connector makes sense.
in docs