tomasfabian / ksqlDB.RestApi.Client-DotNet

ksqlDb.RestApi.Client is a C# LINQ-enabled client API for issuing and consuming ksqlDB push and pull queries and executing statements.
MIT License
94 stars 25 forks source link

Column identifiers are not masked when they are reserved keywords #57

Closed mrt181 closed 7 months ago

mrt181 commented 7 months ago

When I try to create or replace a table that contains a reserved keyword in a column identifier that identifier is not masked.

ksql> CREATE OR REPLACE TABLE Test (
>  id BIGINT PRIMARY KEY,
>  end TIMESTAMP
>)  WITH ( KAFKA_TOPIC='Test', VALUE_FORMAT='Json', PARTITIONS='1', REPLICAS='1' );
line 3:3: Syntax Error
"end" is a reserved keyword and it can't be used as an identifier. You can use it as an identifier by escaping it as 'end'
Statement:   end TIMESTAMP
Caused by: line 3:3: Syntax error at line 3:3

ksql> CREATE OR REPLACE TABLE Test (
>  id BIGINT PRIMARY KEY,
>  `end` TIMESTAMP
>)  WITH ( KAFKA_TOPIC='Test', VALUE_FORMAT='Json', PARTITIONS='1', REPLICAS='1' );

 Message
---------------
 Table created
---------------

On the second attempt I use backticks to mask the end identifier.

The rest api client does not mask identifiers that contain keywords.

One option might be to always mask the column names like this:

diff --git a/ksqlDb.RestApi.Client/KSql/RestApi/Generators/TypeGenerator.cs b/ksqlDb.RestApi.Client/KSql/RestApi/Generators/TypeGenerator.cs
index a8adf85..1fc70e8 100644
--- a/ksqlDb.RestApi.Client/KSql/RestApi/Generators/TypeGenerator.cs
+++ b/ksqlDb.RestApi.Client/KSql/RestApi/Generators/TypeGenerator.cs
@@ -37,7 +37,7 @@ internal class TypeGenerator : CreateEntityStatement

       var ksqlType = CreateEntity.KSqlTypeTranslator(type);

-      string columnDefinition = $"{memberInfo.Name} {ksqlType}{CreateEntity.ExploreAttributes(memberInfo, type)}";
+      string columnDefinition = $"`{memberInfo.Name}` {ksqlType}{CreateEntity.ExploreAttributes(memberInfo, type)}";

       ksqlProperties.Add(columnDefinition);
     }
diff --git a/ksqlDb.RestApi.Client/KSql/RestApi/Statements/CreateEntity.cs b/ksqlDb.RestApi.Client/KSql/RestApi/Statements/CreateEntity.cs
index 77512c4..fac945d 100644
--- a/ksqlDb.RestApi.Client/KSql/RestApi/Statements/CreateEntity.cs
+++ b/ksqlDb.RestApi.Client/KSql/RestApi/Statements/CreateEntity.cs
@@ -13,7 +13,7 @@ internal sealed class CreateEntity : CreateEntityStatement
   internal static string KSqlTypeTranslator(Type type)
   {
     var ksqlType = string.Empty;
-
+
     if (type == typeof(byte[]))
       ksqlType = KSqlTypes.Bytes;
     else if (type.IsArray)
@@ -156,7 +156,7 @@ internal sealed class CreateEntity : CreateEntityStatement

       var columnName = memberInfo.GetMemberName();

-      string columnDefinition = $"\t{columnName} {ksqlType}{ExploreAttributes(memberInfo, type)}";
+      string columnDefinition = $"\t`{columnName}` {ksqlType}{ExploreAttributes(memberInfo, type)}";

       columnDefinition += TryAttachKey(statementContext.KSqlEntityType, memberInfo);

@@ -176,7 +176,7 @@ internal sealed class CreateEntity : CreateEntityStatement

       var ksqlType = KSqlTypeTranslator(memberType);

-      string columnDefinition = $"{memberInfo.Name} {ksqlType}{ExploreAttributes(memberInfo, memberType)}";
+      string columnDefinition = $"`{memberInfo.Name}` {ksqlType}{ExploreAttributes(memberInfo, memberType)}";

       ksqlProperties.Add(columnDefinition);
     }

Another option would be to introduce a check that escapes when necessary: keywords.patch

tomasfabian commented 7 months ago

Hello @mrt181, wouldn't the former approach constitute a breaking change for existing users? It alters the casing.
We could also consider introducing an additional configuration named EntityCreationMetadata.UseBacktics, or we could implement an attribute that would decorate classes and properties:

[UseBackticks]
public class Foo
{
  [UseBackticks]
  public string Bar { get; set; }
}

I think the attribute-based approach could also be utilized for data retrieval. What are your thoughts on this?

mrt181 commented 7 months ago

Yes just backticking everything isn't good enough.

Maybe the second approach from the patch is more viable as it only escapes when the column identifier is a keyword and the statement would fail without the escaped identifiers anyway.

The attribute approach is explicit but also exposes the user to the ksqlDB internals.

This can be annoying when the fields are not under the control of the user, for example they are provided via code Gen from a given schema or they come from a shared library for data exchange.

tomasfabian commented 7 months ago

Alright, we can include backticks only when the column identifier matches a keyword. However, in such cases, it's crucial to keep the list of existing keywords you've already provided updated whenever new ones are introduced. This method of escaping should also be implemented for insert and select statements.

Would you prefer to draft a pull request for this, or shall I take care of it?

mrt181 commented 7 months ago

I do it tomorrow

tomasfabian commented 7 months ago

Awesome, I am looking forward to it! Thank you in advance.

mrt181 commented 7 months ago

@tomasfabian I have created the necessary column identifier check by utilizing the SqlBase.g4 grammer from the ksql project (that project does the same).

I would like to make this an opt-in feature, that is keep the current functionality and provide a way to escape if desired/necessary.

This is my current idea.

// in public record EntityCreationMetadata
  public IdentifierEscaping IdentifierFormat { get; set; } = IdentifierEscaping.None;
  public enum IdentifierEscaping
  {
    None = 0,
    Keywords = 1,
    Always = 2,
  }

metadata is send to the StatementGenerator where the column identifieres are created.

The first option keeps the current behavior. The second would escape if an identifer is a keyword. The third option would always escape (see: https://docs.ksqldb.io/en/latest/reference/sql/syntax/lexical-structure/#identifiers)

What's your opinion on that?

tomasfabian commented 7 months ago

Sure why not? The same enum could be also applied to the KSqlDBContextOptions:

You must select from it by backticking the stream name and column name and using the original casing:

SELECT `@MY-identifier-stream-column!` FROM `s1` EMIT CHANGES;

Is there an existing .NET package for the SqlBase.g4 grammar? I'm not sure if it is possible to take it from ksql project as it is due to copyrights etc.

mrt181 commented 7 months ago

The confluenct license forbids competition as a SaaS. This library doesn't do that so should be good to go.

tomasfabian commented 7 months ago

I released a new version: dotnet add package ksqlDb.RestApi.Client --version 3.6.0

Thank you, @mrt181, on behalf of the entire community for your contribution to this nice enhancement! :)