prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0
16.06k stars 5.38k forks source link

Add note that $name is reserved #22348

Open elharo opened 7 months ago

elharo commented 7 months ago

in docs

steveburnett commented 7 months ago

Would this page be where to document this? http://prestodb.io/docs/current/language/reserved.html

elharo commented 7 months ago

Possibly. I need to think this through in a little more detail. @rschlussel and I were chatting about this over coffee this morning and I wanted to capture a quick note so we don't lose track of it. First we need to figure out what, if anything, the grammar currently says about these names. Reserving $names for system purposes is a common pattern, Java does this for example, but I don't know if it's standard in SQL.

Akanksha-kedia commented 7 months ago

@steveburnett @elharo @tdcmeehan

Screenshot 2024-03-30 at 3 51 05 PM

sqlbase.g4 its defined as non-reserved.

To add name as a reserved keyword in SQLBASE.G4 file, we should remove it from the nonReserved list and add a note mentioning that $name is reserved. Here's how you can do it:

nonReserved // IMPORTANT: this rule must only contain tokens. Nested rules are not supported. See SqlParser.exitNonReserved : ADD | ADMIN | ALL | ANALYZE | ANY | ARRAY | ASC | AT | BERNOULLI | CALL | CALLED | CASCADE | CATALOGS | COLUMN | COLUMNS | COMMENT | COMMIT | COMMITTED | CURRENT | CURRENT_ROLE | DATA | DATE | DAY | DEFINER | DESC | DETERMINISTIC | DISABLED | DISTRIBUTED | ENABLED | ENFORCED | EXCLUDING | EXPLAIN | EXTERNAL | FETCH | FILTER | FIRST | FOLLOWING | FORMAT | FUNCTION | FUNCTIONS | GRANT | GRANTED | GRANTS | GRAPHVIZ | GROUPS | HOUR | IF | IGNORE | INCLUDING | INPUT | INTERVAL | INVOKER | IO | ISOLATION | JSON | KEY | LANGUAGE | LAST | LATERAL | LEVEL | LIMIT | LOGICAL | MAP | MATERIALIZED | MINUTE | MONTH | NFC | NFD | NFKC | NFKD | NO | NONE | NULLIF | NULLS // 'NAME' removed from here | OF | OFFSET | ONLY | OPTION | ORDINALITY | OUTPUT | OVER | PARTITION | PARTITIONS | POSITION | PRECEDING | PRIMARY | PRIVILEGES | PROPERTIES | RANGE | READ | REFRESH | RELY | RENAME | REPEATABLE | REPLACE | RESET | RESPECT | RESTRICT | RETURN | RETURNS | REVOKE | ROLE | ROLES | ROLLBACK | ROW | ROWS | SCHEMA | SCHEMAS | SECOND | SECURITY | SERIALIZABLE | SESSION | SET | SETS | SQL | SHOW | SOME | START | STATS | SUBSTRING | SYSTEM | SYSTEM_TIME | SYSTEM_VERSION | TABLES | TABLESAMPLE | TEMPORARY | TEXT | TIME | TIMESTAMP | TO | TRANSACTION | TRUNCATE | TRY_CAST | TYPE | UNBOUNDED | UNCOMMITTED | UNIQUE | UPDATE | USE | USER | VALIDATE | VERBOSE | VERSION | VIEW | WORK | WRITE | YEAR | ZONE ; And add a note:

// NOTE: $name is reserved

this is wat i understand do coorect me if wrong

elharo commented 7 months ago

Not quite what I meant. I'm not trying to change the grammar. Just warn devs that if they name a column $Foo then they might have trouble in the future if presto uses $Foo for some synthetic column or other purpose.

That is, "Yes, it is technically possible to name a column $foo but only presto dev team should do this."

This is analogous to how Java and XML treat $name. Allowed but reserved. The way the XML spec phrases this is "Names beginning with the string "xml", or with any string which would match (('X'|'x') ('M'|'m') ('L'|'l')), are reserved for standardization in this or future versions of this specification."

The Java Language spec says, "The $ sign should be used only in mechanically generated source code or, rarely, to access pre-existing names on legacy systems."

Obviously these aren't exactly analogous, but this is the spirit I'm trying to capture.

tdcmeehan commented 7 months ago

@elharo is there any reference to this limitation in the SQL spec? If not, then I'd note columns of the form $name are just a per-connector convention, and connectors may actually specify a form any way they want. E.g. it could be @name or !name as well.

rschlussel commented 7 months ago

@elharo is there any reference to this limitation in the SQL spec? If not, then I'd note columns of the form $name are just a per-connector convention, and connectors may actually specify a form any way they want. E.g. it could be @name or !name as well.

Do we already have other connectors that use different conventions. If not, I think it would make sense to be consistent across connectors so people don't need to think about it (it's not the kind of thing where it matters tremendously what convention you use, as long as there is some convention so you don't break people's tables when you add a new kind of synthetic column). We could specify this convention just for hive (or maybe hive and iceberg, since I know iceberg uses this convention at least sometimes), but if there's no reason to limit, I think it's better not to.

tdcmeehan commented 7 months ago

@rschlussel the trouble is there's nothing to enforce any particular convention, so it would need to be added to the engine somehow. We can enumerate all open source connectors and look at what convention they use, which is most likely $name, but we don't know who has built a custom connector over the past 10+ years that specifies it differently. I think the best we can do is just mention it as a guideline in our developer docs for connectors.

rschlussel commented 7 months ago

yeah, i don't think we can make any claims about custom connectors. My goal for this guideline is for the things we control so that going forward a) users know they use our reserved-ish names at their own risk and b) developers follow these guidelines so that in the future they don't introduce reserved column names that users didn't know to avoid.

elharo commented 7 months ago

Right. I don't want to enforce anything, or add anything to the engine. Just mention it as a guideline so developers get a heads up. I don't expect there are a lot of folks naming columns with dollar signs so this isn't a huge deal.

tdcmeehan commented 7 months ago

Sounds good--documeting that, by convention, $ prefixed names are often reserved for connector makes sense.