tarantool / tarantool

Get your data in RAM. Get compute close to data. Enjoy the performance.
https://www.tarantool.io

Expose identifier.c into Lua (identifier or printable symbol class) #3405

Open Totktonada opened 6 years ago

Totktonada commented 6 years ago

I propose exposing it via the new utf8 module.

There are two ways to do this: add isident to check a single symbol (which keeps the is* API consistent), or add a function that checks an entire string. Either is fine for us.

We need to forbid some symbols (like the period) in our identifiers, so there are two ways to handle that: add a forbidden-symbols parameter to the identifier_check function (or, more likely, add a separate function), or perform that check outside, in Lua, using utf8.next (in which case no extra changes are needed within the scope of this issue).
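To make the two variants concrete, here is a rough sketch in Python rather than Lua or C. It is not Tarantool's implementation: the names `is_ident_char` and `identifier_check` and the `forbidden` parameter are hypothetical, Python's `str.isprintable()` only approximates the ICU character classes, and rejecting the empty string is an assumption of the sketch.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, rejected explicitly


def is_ident_char(ch: str) -> bool:
    """Per-symbol check, in the spirit of the proposed isident variant."""
    if ch == REPLACEMENT_CHAR:
        return False
    # Approximation: Python treats Unicode "Other" and "Separator"
    # categories (except ASCII space) as non-printable.
    return ch.isprintable()


def identifier_check(name: str, forbidden: str = "") -> bool:
    """Whole-string check with an optional set of forbidden symbols.

    `forbidden` is a hypothetical parameter illustrating the
    "forbidden symbols" option; the empty-name rule is an assumption.
    """
    if not name:
        return False
    return all(is_ident_char(ch) and ch not in forbidden for ch in name)


if __name__ == "__main__":
    print(identifier_check("space_name"))           # plain identifier
    print(identifier_check("a.b"))                  # period allowed by default
    print(identifier_check("a.b", forbidden="."))   # period forbidden by caller
```

With the per-symbol variant alone, the caller would write the loop and the forbidden-symbol test itself, which corresponds to doing the check outside with utf8.next.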

There is a concern (@Khatskevich) that we should expose the identifier symbol class as Tarantool defines it and should not tie it to avro-schema identifiers. We could expose a printable-characters class instead (it is just a question of terminology). We should decide whether we want to support the term 'valid identifier' for use in Tarantool applications / modules.

kyukhin commented 6 years ago

I'd avoid adding any Tarantool specifics to the utf8 module. Let's expose more classes instead. @Totktonada @Khatskevich, could you pls specify which additional classes of ICU symbols we should expose to make you happy?

Totktonada commented 6 years ago

These ones: https://github.com/tarantool/tarantool/blob/0fd1f537b9989ee1ecf330ebbc5e70d9a4e4d367/src/box/identifier.c#L60 and 0xfffd.

Totktonada commented 5 years ago

@kostja initially asked for this feature in avro-schema, but it seems there is not much need for it now. So I'll unassign myself and Roman.