risinglightdb / risinglight

An educational OLAP database system.
Apache License 2.0
1.61k stars 214 forks source link

binder: encapsulate lower-case conversion for identifiers #285

Open a9QrX3Lu opened 2 years ago

a9QrX3Lu commented 2 years ago

According to this comment, we should encapsulate the lower-case conversion for better extensibility. However, we can not do this by modifying the Catalog, because we have other states owning the identifiers, e.g., every field in BindContext.

I think the best way to do this is to visit the AST generated by sqlparser, and convert all the names into lower-case.

wangrunji0408 commented 2 years ago

Just convert the input string before parsing🙈

skyzh commented 2 years ago

Just convert the input string before parsing🙈

But this won’t retain users’ preference when creating tables. For example, they might create column V1, but will get v1 instead.

a9QrX3Lu commented 2 years ago

Just convert the input string before parsing🙈

But this won’t retain users’ preference when creating tables. For example, they might create column V1, but will get v1 instead.

This is exactly what Postgres does.🙈

a9QrX3Lu commented 2 years ago

According to this comment, we should not convert input string into lower case. So, converting all identifiers in the AST seems to be the acceptable choice.

yyin-dev commented 2 years ago

Conceptually, traversing the AST sounds easy and intuitive. But since the AST is implemented as enum's in sqlparser, how do you traverse enums? Manual matches?

skyzh commented 2 years ago

Yes, that's true 🤣 btw I just realized that Postgres parser will convert everything to lowercase, so we might follow its convention and just convert everything to lowercase.

yyin-dev commented 2 years ago

so we might follow its convention and just convert everything to lowercase.

Do you mean after converting names in the AST? Converting the query string into lowercase cannot handle the following correctly:

image

skyzh commented 2 years ago

Oh, you're right. I mean all identifiers (like table name, column name) are lowercase. Probably we only need to make all table / column case-insensitive in binder.