Open vgapeyev opened 1 year ago
Adding some background from SQL-1999, section 20.1:
The representation of an \
in the base tables and views of the Information Schema is by a character string corresponding to its \ (in the case of a \ ) or its \ (in the case of a \ ). Within this character string, any lower-case letter appearing in a \ is replaced by the equivalent upper-case letter, and any \ appearing in a \ is replaced by a \ . Where an \ has multiple forms that are equal according to the rules of Subclause 8.2, ‘‘\ ’’, the form stored is that encountered at definition time.
The specification opts for using uppercase
Related, we will need to reify the case-sensitivity of binders in the AST so that when we pretty-print we will know to quote or not to quote (in the case of keywords). However, the quotes should not matter for binders as we use that literal value regardless of quotes.
Today's behavior is what one would expect, that is SELECT x as a FROM T
to output << { 'a': .. } ... >>
rather than << { 'A': ... } ... >>
TBH, I'm not sure where this is coming from. The SQL-92 specification AFAICT is clear about it in section 5.2:
14) Two
<delimited identifier>
s are equivalent if their<delimitedidentifier body>
s (with all occurrences of<quote>
replaced byand all occurrences of<doublequote symbol>
replaced by<doublequote>
), considered as the repetition of acharacter string literal> that specifies a<character set specification>
ofSQL_TEXT
and an implementation-defined collation that is sensitive to case, compare equally according to the comparison rules in Subclause 8.2, "<comparison predicate>
@yliuuuu could you please provide an update here as to why #1519 was closed?
In short: What is currently implemented for identifiers in partiql-lang-kotlin:
Ideally, we implement what SQL prescribes, while taking care of properly abstracting it. This will surely be backward-incompatible, but everyone's sanity can be worth it.
SQL
In SQL, there are two syntactic variants of identifiers: regular identifiers like
foo
and delimited/quoted identifiers like"fOO"
.foo
,Foo
,fOo
,FOO
and, notably,"FOO"
are all the same identifier, canonically beingFOO
. This identifier is distinct from"foo"
,"Foo"
,"fOo"
, all of which are also distinct from each other.Importantly, it is ok to name an object
"FOO"
and then refer to it asFOO
orfoo
and, conversely, name an objectgOO
and then refer to it as"GOO"
.partiql-lang-kotlin
In contrast, in the current implementation:
Foo
matching bindersFOO
,Foo
,foo
) -- apparently, identifiers that are equivalent in case-insensitive sense can bind to different entities, without shadowing.[Disclaimer: the above is an attempt at a rational reconstruction of the design in place; no claim is made about its fidelity to the intent or to all intricacies of the implementation.]
Known history
Lower case was chosen as the canonical case in order to conform with PostgreSQL: https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
Conclusion
While the SQL's design is not an exhibit of simplicity, the approach in p-l-k is even more complicated. While the two seem to coincide in common usage (say, only non-delimited identifiers and no malicious introduction of ambiguous binders), the differences are significant and break SQL conformance claims.