partiql / partiql-lang-kotlin

PartiQL libraries and tools in Kotlin.
https://partiql.org/
Apache License 2.0
539 stars 62 forks source link

Revisit case sensitivity of identifiers #1122

Open vgapeyev opened 1 year ago

vgapeyev commented 1 year ago

In short: What is currently implemented for identifiers in partiql-lang-kotlin:

Ideally, we implement what SQL prescribes, while taking care of properly abstracting it. This will surely be backward-incompatible, but everyone's sanity can be worth it.

SQL

In SQL, there are two syntactic variants of identifiers: regular identifiers like foo and delimited/quoted identifiers like "fOO".

partiql-lang-kotlin

In contrast, in the current implementation:

[Disclaimer: the above is an attempt at a rational reconstruction of the design in place; no claim is made about its fidelity to the intent or to all intricacies of the implementation.]

Known history

Lower case was chosen as the canonical case in order to conform with PostgreSQL: https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS

Conclusion

While the SQL's design is not an exhibit of simplicity, the approach in p-l-k is even more complicated. While the two seem to coincide in common usage (say, only non-delimited identifiers and no malicious introduction of ambiguous binders), the differences are significant and break SQL conformance claims.

johnedquinn commented 1 year ago

Adding some background from SQL-1999, section 20.1:

The representation of an \ in the base tables and views of the Information Schema is by a character string corresponding to its \ (in the case of a \) or its \ (in the case of a \). Within this character string, any lower-case letter appearing in a \ is replaced by the equivalent upper-case letter, and any \ appearing in a \ is replaced by a \. Where an \ has multiple forms that are equal according to the rules of Subclause 8.2, ‘‘\’’, the form stored is that encountered at definition time.

The specification opts for using uppercase

RCHowell commented 1 year ago

Related, we will need to reify the case-sensitivity of binders in the AST so that when we pretty-print we will know to quote or not to quote (in the case of keywords). However, the quotes should not matter for binders as we use that literal value regardless of quotes.

Today's behavior is what one would expect, that is SELECT x as a FROM T to output << { 'a': .. } ... >> rather than << { 'A': ... } ... >>

dlurton commented 1 year ago

TBH, I'm not sure where this is coming from. The SQL-92 specification AFAICT is clear about it in section 5.2:

14) Two <delimited identifier>s are equivalent if their <delimitedidentifier body>s (with all occurrences of <quote> replaced by and all occurrences of <doublequote symbol> replaced by <doublequote>), considered as the repetition of acharacter string literal> that specifies a <character set specification> of SQL_TEXT and an implementation-defined collation that is sensitive to case, compare equally according to the comparison rules in Subclause 8.2, "<comparison predicate>

RCHowell commented 2 months ago

@yliuuuu could you please provide an update here as to why #1519 was closed?