barnettben closed this issue 2 weeks ago.
@barnettben Thanks for the suggestion! I am certainly happy to add support for case-insensitive keywords. I would prefer your third option: adding a new property. I think it is worth keeping the array of keywords as strings, as they may be useful for other purposes as well.
I have two small suggestions regarding the interface:
1. `caseInsensitiveReservedIdentifiers: Bool` (without "supports") sounds better in this case, as the array of keywords is called `reservedIdentifiers`. (We also have reserved symbols.)
2. Use a default value (`false`) for the new property in the initialiser. That way, old code will continue to work.

There is also a complication in implementing this feature. If you look at https://github.com/mchakravarty/CodeEditorView/blob/31c87f788bf948f158e9e22f0e537796b0566dd9/Sources/LanguageSupport/LanguageConfiguration.swift#L508, you will see that keywords (reserved identifiers) use the `singleLexeme` argument to `TokenDescription`. This is important, as it allows for a regular expression with far fewer capture groups and speeds up the tokeniser. I think we can continue to use this optimisation in the special case where the lexemes differ only in case. However, this will require some changes in the tokeniser:
- Single-lexeme tokens must be compared case-insensitively, but only when the `caseInsensitiveReservedIdentifiers` option is `true`, to avoid causing trouble for languages where case in keywords matters.

Given that this is somewhat subtle, the `singleLexeme` property of `TokenDescription` should get an appropriate comment, and `TokenTests.swift` should get tests for it.

BTW, if you are also willing to contribute your SQL language configuration, I would gladly add it to the project as well.
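The following is a minimal, self-contained sketch of the two interface suggestions and the tokeniser change. It uses a simplified stand-in type: `KeywordConfiguration` and `isReservedIdentifier(_:)` are hypothetical and are not the CodeEditorView API. The new flag defaults to `false`, so existing call sites keep compiling, and keywords stay an array of strings. The sketch only models the keyword lookup that follows a match by the identifier regex; how the real tokeniser consumes `singleLexeme` may differ.

```swift
/// Hypothetical stand-in for the keyword-related part of `LanguageConfiguration`.
struct KeywordConfiguration {
  let reservedIdentifiers: [String]
  let caseInsensitiveReservedIdentifiers: Bool

  /// Lookup table for the tokeniser; keys are lowercased when keyword case is ignored.
  private let lookup: Set<String>

  // Defaulting the new flag to `false` keeps existing call sites source compatible.
  init(reservedIdentifiers: [String], caseInsensitiveReservedIdentifiers: Bool = false) {
    self.reservedIdentifiers = reservedIdentifiers
    self.caseInsensitiveReservedIdentifiers = caseInsensitiveReservedIdentifiers
    self.lookup = Set(caseInsensitiveReservedIdentifiers
                        ? reservedIdentifiers.map { $0.lowercased() }
                        : reservedIdentifiers)
  }

  /// Called after the identifier regex matched `lexeme`: is it a keyword?
  /// Case is normalised only if the language asks for it.
  func isReservedIdentifier(_ lexeme: Substring) -> Bool {
    caseInsensitiveReservedIdentifiers
      ? lookup.contains(lexeme.lowercased())
      : lookup.contains(String(lexeme))
  }
}

let swiftLike = KeywordConfiguration(reservedIdentifiers: ["let", "var"])
let sqlLike = KeywordConfiguration(reservedIdentifiers: ["SELECT", "FROM"],
                                   caseInsensitiveReservedIdentifiers: true)
print(swiftLike.isReservedIdentifier("LET"))   // false
print(sqlLike.isReservedIdentifier("select"))  // true
```

Lowercasing both the stored keys and the matched lexeme keeps each check a single set lookup. This is in the spirit of the single-lexeme optimisation: no per-keyword regex alternatives are needed.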
Thank you for the positive and detailed response!
I will aim to have a go at this over the weekend or early next week.
Hey @barnettben, I'm interested in SQL support too, so I'm glad I found this. Is there anything I can help you with?
Hi @ericzakariasson, thanks for the offer. At the moment my efforts consist of just a list of keywords, so it's not like there's a lot there!
I'm going to focus specifically on the SQLite dialect, rather than any wider standard, so it might be that our interests don't overlap. When I get a bit of time to look closer, I will post here so that you can see any progress.
@barnettben and @ericzakariasson How about opening a new issue for SQL(ite) syntax support?
After all, the goal of this issue (namely support for case-insensitive reserved identifiers) has been met with @barnettben's PR.
BTW, if @ericzakariasson is interested in full SQL (are you?), would it be possible to have a common set of definitions for the overlap between the two, and then two separate language configurations for SQLite and SQL that build on that common core?
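Below is one way the shared-core idea could look, as a sketch only. The names `commonSQLKeywords`, `sqliteExtraKeywords`, `postgresExtraKeywords` and `dialectKeywords(_:)` are hypothetical. The short word lists are illustrative samples, not complete dialect definitions. Each dialect's keyword array is the shared set plus its own additions, and could then be passed as `reservedIdentifiers` to a per-dialect configuration.

```swift
/// Hypothetical shared core: keywords reserved in both dialects (illustrative subset).
let commonSQLKeywords = ["SELECT", "FROM", "WHERE", "JOIN", "GROUP", "ORDER"]

/// Hypothetical dialect-specific additions (illustrative subsets).
let sqliteExtraKeywords = ["AUTOINCREMENT", "GLOB", "PRAGMA"]
let postgresExtraKeywords = ["CONCURRENTLY", "ILIKE", "VARIADIC"]

/// Combines the core with one dialect's additions, dropping duplicates and
/// sorting so the resulting array is deterministic.
func dialectKeywords(_ extras: [String]) -> [String] {
  Set(commonSQLKeywords).union(extras).sorted()
}

let sqliteKeywords = dialectKeywords(sqliteExtraKeywords)
let postgresKeywords = dialectKeywords(postgresExtraKeywords)
print(sqliteKeywords.count, postgresKeywords.count)  // 9 9
```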
I had never worked with Swift until last Friday 😅 so I don't feel comfortable enough to start working on this code.
Here's the Postgres code I've written so far.
```swift
import LanguageSupport
import Foundation
import RegexBuilder

private let postgresReservedIds = [
  "ALL", "ANALYSE", "ANALYZE", "AND", "ANY", "ARRAY", "AS", "ASC", "ASYMMETRIC",
  "AUTHORIZATION", "BINARY", "BOTH", "CASE", "CAST", "CHECK", "COLLATE", "COLLATION",
  "COLUMN", "CONCURRENTLY", "CONSTRAINT", "CREATE", "CROSS", "CURRENT_CATALOG",
  "CURRENT_DATE", "CURRENT_ROLE", "CURRENT_SCHEMA", "CURRENT_TIME", "CURRENT_TIMESTAMP",
  "CURRENT_USER", "DEFAULT", "DEFERRABLE", "DESC", "DISTINCT", "DO", "ELSE", "END",
  "EXCEPT", "FALSE", "FETCH", "FOR", "FOREIGN", "FREEZE", "FROM", "FULL", "GRANT",
  "GROUP", "HAVING", "ILIKE", "IN", "INITIALLY", "INNER", "INTERSECT", "INTO", "IS",
  "ISNULL", "JOIN", "LATERAL", "LEADING", "LEFT", "LIKE", "LIMIT", "LOCALTIME",
  "LOCALTIMESTAMP", "NATURAL", "NOT", "NOTNULL", "NULL", "OFFSET", "ON", "ONLY",
  "OR", "ORDER", "OUTER", "OVERLAPS", "PLACING", "PRIMARY", "REFERENCES", "RETURNING",
  "RIGHT", "SELECT", "SESSION_USER", "SIMILAR", "SOME", "SYMMETRIC", "TABLE", "THEN",
  "TO", "TRAILING", "TRUE", "UNION", "UNIQUE", "USER", "USING", "VARIADIC", "VERBOSE",
  "WHEN", "WHERE", "WINDOW", "WITH"
]

private let postgresReservedOperators = [
  "+", "-", "*", "/", "%", "=", "<>", "!=", "<", ">", "<=", ">=", "||",
  "<<", ">>", "&<", "&>", "<<|", "|>>", "&<|", "|&>",
  "->", "->>", "#>", "#>>", "@>", "<@", "?", "?|", "?&",
  "&&", "-|-", "~~", "~~*", "!~~", "!~~*", "@@@", "::", "."
]

extension LanguageConfiguration {

  /// Language configuration for PostgreSQL
  public static func postgres(_ languageService: LanguageService? = nil) -> LanguageConfiguration {
    // Numeric literals: integers, decimals and exponents
    let numberRegex = /[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?/
    // Identifiers: unquoted, or double-quoted ("..." is a quoted identifier in PostgreSQL, not a string)
    let identifierRegex = /[a-zA-Z_][a-zA-Z0-9_$]*|"[^"]+"/
    // Operators
    let operatorRegex = /[+\-*\/<>=!|&%^~?#@:.]+/
    // Single-quoted strings ('' escapes a quote) and untagged dollar quoting ($$...$$).
    // The previous `(?:\$[^$]*\$).*?` only matched the opening tag. Tagged dollar quotes
    // ($tag$...$tag$) would need a backreference, and hence a capture group that changes
    // the regex's output type, so they are not handled here.
    let stringRegex = /'(?:[^']|'')*'|\$\$[\s\S]*?\$\$/

    return LanguageConfiguration(
      name: "PostgreSQL",
      supportsSquareBrackets: true,
      supportsCurlyBrackets: false,
      // Keywords are listed in upper case, but SQL accepts any case. Assumes the
      // initialiser's `caseInsensitiveReservedIdentifiers` parameter (added for this
      // issue) comes directly after `supportsCurlyBrackets`.
      caseInsensitiveReservedIdentifiers: true,
      stringRegex: stringRegex,
      characterRegex: nil,
      numberRegex: numberRegex,
      singleLineComment: "--",
      nestedComment: (open: "/*", close: "*/"),
      identifierRegex: identifierRegex,
      operatorRegex: operatorRegex,
      reservedIdentifiers: postgresReservedIds,
      reservedOperators: postgresReservedOperators,
      languageService: languageService
    )
  }
}
```
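The token regular expressions can be sanity-checked on their own before they are wired into `LanguageConfiguration`. The standalone sketch below assumes Swift 5.7+ with `Regex` available (e.g. macOS 13+). It uses `#/.../#` literals so that no bare-slash compiler flag is needed. The helper `fullyMatches` is hypothetical. Its string pattern covers only single-quoted strings (with `''` escapes) and untagged `$$...$$` dollar quoting.

```swift
// PostgreSQL-style token patterns, checked in isolation
let stringRegex = #/'(?:[^']|'')*'|\$\$[\s\S]*?\$\$/#
let identifierRegex = #/[a-zA-Z_][a-zA-Z0-9_$]*|"[^"]+"/#
let numberRegex = #/[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?/#

/// Hypothetical helper: true if `regex` matches the entire input.
func fullyMatches(_ regex: Regex<Substring>, _ input: String) -> Bool {
  input.wholeMatch(of: regex) != nil
}

print(fullyMatches(stringRegex, "'it''s'"))             // true
print(fullyMatches(stringRegex, "$$ SELECT 1; $$"))     // true
print(fullyMatches(stringRegex, "\"col\""))             // false: a quoted identifier
print(fullyMatches(identifierRegex, "\"Mixed Case\""))  // true
print(fullyMatches(numberRegex, "1.5e-3"))              // true
```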
As suggested, I've opened #116 to track the language configuration so that this issue can be closed.
I was experimenting with adding a language configuration for SQL, and found that the reserved identifiers are case-sensitive. I would like to be able to specify them as case-insensitive.
I am happy to submit a PR for this if it's something that would be useful, but am not sure of the preferred approach.
Possible options:
1. Use `.ignoresCase()` when building a regex. Not good: other languages don't need or want this.
2. Change `reservedIdentifiers` from `[String]` to `[Regex]`. This is how the property is used in `tokenDictionary`, but it changes the public API and would also be less useful if other places wanted to use the array contents in the future.
3. Add `supportsCaseInsensitiveKeywords: Bool` to match `supportsSquareBrackets`. Fine, but it does mean increasing the number of properties.

Is this something you would be open to changing, and if so, do you have a preferred method?
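To make the trade-off between options 1 and 3 concrete, here is a sketch that applies `.ignoresCase()` only when a per-language flag asks for it. Languages with case-sensitive keywords are then unaffected, and the keywords remain plain strings. `keywordRegex(for:caseInsensitive:)` is a hypothetical helper, not part of CodeEditorView. It assumes the keywords contain only letters, digits and underscores, so they need no escaping inside the pattern. Swift 5.7+ `Regex` is required.

```swift
/// Hypothetical helper: builds one regex matching any of the given keywords.
/// Assumes keywords contain only word characters (no regex metacharacters).
func keywordRegex(for keywords: [String], caseInsensitive: Bool) throws -> Regex<AnyRegexOutput> {
  let pattern = "(?:" + keywords.joined(separator: "|") + ")"
  // Option 1 would apply `.ignoresCase()` unconditionally; gating it on a
  // per-language flag (option 3) leaves case-sensitive languages untouched.
  return try Regex(pattern).ignoresCase(caseInsensitive)
}

let sqlKeywords = try keywordRegex(for: ["SELECT", "FROM", "WHERE"], caseInsensitive: true)
let swiftKeywords = try keywordRegex(for: ["let", "var"], caseInsensitive: false)

print("select".wholeMatch(of: sqlKeywords) != nil)  // true
print("LET".wholeMatch(of: swiftKeywords) != nil)   // false
```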