Closed kokosing closed 7 years ago
I hit this as well when working on tpch queries. I'd like to test a property of Presto for all of those queries, and I can't do that without fixing this. I'll be glad to provide a fix. @martint please let me know if there are any reasons not to add the prefixes and/or any other things that need consideration.
It's nice to have names without the prefixes for convenience (when writing queries by hand, etc.), but it also makes sense for them to be as defined by the spec. Maybe we can tag each field with its original name and the user-friendly name. In Presto, we could add an option in the connector to switch between the two modes (e.g. strict vs non-strict).
@dain, any thoughts?
@martint To do that we'd have to define each column in the tpch generator twice, right?
Personally, I think having the names consistent with the tests and the spec pays off more than having the (I admit) nice, noise-free names. I imagine that after writing a query to test something ad hoc, devs might be discouraged from reusing it as a test because of the needed prefix amendments.
Not necessarily. We could do it in one of two ways:
1. Extend the enum class to take the original name and the user-friendly name:
   NATION_KEY("n_nationkey", "nationkey", TpchColumnTypes.IDENTIFIER)
2. Add a constant prefix to each TpchColumn implementation and then have a getter for the unprefixed name and one for the prefixed name.
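A minimal sketch of the second option, combined with the strict vs non-strict switch suggested earlier in the thread. Everything here is hypothetical and only for illustration: `ColumnNaming`, `NationColumn`, and the getter names are not taken from the airlift tpch code, and the real `TpchColumn` interface also carries a column type, which is omitted. The prefixed names follow the TPC-H spec for the nation table.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical naming modes: STANDARD follows the TPC-H spec (prefixed),
// SIMPLIFIED drops the table prefix.
enum ColumnNaming
{
    SIMPLIFIED,
    STANDARD
}

// Illustrative stand-in for a TpchColumn implementation (not the real class).
enum NationColumn
{
    NATION_KEY("nationkey"),
    NAME("name"),
    REGION_KEY("regionkey"),
    COMMENT("comment");

    // Constant prefix shared by every column of this table.
    private static final String PREFIX = "n_";

    private final String simplifiedName;

    NationColumn(String simplifiedName)
    {
        this.simplifiedName = simplifiedName;
    }

    // Unprefixed, user-friendly name.
    String getSimplifiedColumnName()
    {
        return simplifiedName;
    }

    // Prefixed name as defined by the spec.
    String getColumnName()
    {
        return PREFIX + simplifiedName;
    }

    // Picks one of the two names depending on the requested mode.
    String getColumnName(ColumnNaming naming)
    {
        return naming == ColumnNaming.STANDARD ? getColumnName() : getSimplifiedColumnName();
    }

    // All column names of the table in the requested mode, in declaration order.
    static List<String> columnNames(ColumnNaming naming)
    {
        return Arrays.stream(values())
                .map(column -> column.getColumnName(naming))
                .collect(Collectors.toList());
    }
}
```

With this shape each column is declared only once, and a connector could choose the mode from a configuration option: `NationColumn.columnNames(ColumnNaming.STANDARD)` yields the spec names, while `ColumnNaming.SIMPLIFIED` keeps the current unprefixed ones.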
As for how to handle this in Presto, you could add hidden columns to alias the names. (BTW, I'm fine with a strict mode in the connector proposal.)
I'd avoid using hidden columns. They can mess up a bunch of things like physical properties, describe, etc.
The TPC-H specification uses a prefix for each column name. For example, nation columns are named:
airlift tpch defines nation columns as:
Note the missing n_ prefix in the column names. As a result, TPC-H queries cannot simply be generated and then executed in Presto; all the column names have to be modified first.
Specification file: http://cs.fit.edu/~pbernhar/teaching/databases/tpch.pdf