Closed by m-mohr 1 year ago
Column names correspond to field identifiers in the Thrift IDL. See "Identifier" here: https://github.com/apache/thrift/blob/v0.17.0/doc/specs/idl.md#identifier
So we could require some pattern like `^([A-Z]|[a-z]|_)([A-Z]|[a-z]|[0-9]|\.|_)*$`, but I'm not sure that is a good idea. There may be implementations that accept more than these characters (I know there are implementations that accept fewer). And maybe there will be some future version that accepts a wider range of identifiers.
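For reference, here is a minimal sketch of what the pattern quoted above would accept and reject. The helper name `is_thrift_identifier` and the sample column names are made up for illustration. It only shows the behavior of the regular expression, not what any particular Parquet implementation actually allows.

```python
import re

# Pattern quoted in the discussion above, mirroring the Thrift IDL
# "Identifier" rule: a letter or underscore, followed by letters,
# digits, dots, or underscores.
THRIFT_IDENTIFIER = re.compile(r"^([A-Z]|[a-z]|_)([A-Z]|[a-z]|[0-9]|\.|_)*$")


def is_thrift_identifier(name: str) -> bool:
    """Return True if `name` matches the pattern (hypothetical helper)."""
    # fullmatch avoids accepting a name with a trailing newline,
    # which `$` alone would tolerate.
    return THRIFT_IDENTIFIER.fullmatch(name) is not None


# Hypothetical column names, chosen only to exercise the pattern.
candidates = ["geometry", "_geom", "geom.wkb", "1geom", "my geometry", "géométrie", ""]
for name in candidates:
    print(f"{name!r}: {is_thrift_identifier(name)}")
```

Names with spaces or non-ASCII letters are rejected by this pattern, which is one reason a strict check may be too narrow if some implementations accept them.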
If anything, I think a good validator would assert that the geometry column names match an existing top-level field name, but I think it might be more trouble than value to add JSON schema validation around the identifiers.
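The check suggested above could look roughly like the following sketch. It assumes the geometry metadata has already been parsed into a dict with a `columns` mapping and an optional `primary_column` key, and that the file's top-level field names are available as a list of strings (for example from whatever Parquet reader is in use). The function name `check_geometry_columns` is hypothetical.

```python
from typing import Dict, List


def check_geometry_columns(geo_metadata: Dict, top_level_fields: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the check passed.

    Assumes `geo_metadata` has a "columns" mapping keyed by column name
    and, optionally, a "primary_column" entry.
    """
    problems = []
    fields = set(top_level_fields)
    columns = geo_metadata.get("columns", {})

    # Every geometry column named in the metadata must exist in the schema.
    for name in columns:
        if name not in fields:
            problems.append(f"geometry column {name!r} is not a top-level field")

    # The primary column, if given, must be one of the declared columns.
    primary = geo_metadata.get("primary_column")
    if primary is not None and primary not in columns:
        problems.append(f"primary column {primary!r} is not listed under 'columns'")

    return problems


# Hypothetical example data.
metadata = {"primary_column": "geometry", "columns": {"geometry": {}, "centroid": {}}}
print(check_geometry_columns(metadata, ["id", "geometry"]))
```

This compares names against the actual schema rather than against a character pattern, so it does not depend on which identifiers a given implementation accepts.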
Are there any restrictions on column names in Parquet/Arrow that we could check for in the schema? Are these, for example, all valid?