substrait-io / substrait

A cross platform way to express data transformation, relational algebra, standardized record expression and plans.
https://substrait.io
Apache License 2.0
1.2k stars 156 forks source link

Clarify name matching for functions, datatypes, and options #512

Open gforsyth opened 1 year ago

gforsyth commented 1 year ago

504 led to a discussion in one of the fortnightly calls about how to perform matching:

Snippet from the agenda notes from the Substrait fortnightly meeting on May 24th, 2023:

[Jacques] Let’s start with “does anyone support or want ADD to match add?”
<silence>
[Jacques] Ok, how about we just stick with “plain binary matching?”
[Gil] That seems the best approach, given we aren’t experts in languages
[Gil] We might also want to update the rules for option matching so we can have consistent matching everywhere.
[Jacques] That sounds reasonable, Victor, do you want to put together a PR?
[Victor] Sounds reasonable
[David] Do we want to recommend (not require) that everything be lowercase?
[Jacques] I think that’s fine
[Victor] What about also recommending snake_case?
[Jacques] Maybe lets move this to the ticket now

the general consensus was that we should preserve case in matching function names, e.g. ADD ~= add.

We also (relatively) recently codified that option names perform case insensitive matching. It was also discussed in the meeting that having inconsistent matching rules is not great for anyone.

Does anyone have any feelings about whether datatypes should also be matched exactly?

Should i8 be equivalent to I8? Should we require them to be lowercase?

Items to do for this:

mbrobbel commented 1 year ago

I think we should try to avoid footguns. Given that we don't define a character encoding, case conversions are not well-defined.

Should i8 be equivalent to I8?

No. It would be confusing to me if they were.

Should we require them to be lowercase?

No. I would be confused by a restriction here, if the matching is defined to be exact ("plain binary matching").