prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0
16k stars 5.36k forks source link

have not support for the view of hive? #7338

Open lichengfei1988 opened 7 years ago

lichengfei1988 commented 7 years ago

have we ever consider adding support for the view of hive or have this plan but not start?

han-wang commented 7 years ago

I also want to know it

electrum commented 7 years ago

Hive views using HQL syntax and semantics, so supporting them would require a way to translate HQL to Presto SQL, including all functions, null handling, etc.

sajjoseph commented 7 years ago

While I agree that the full featured Hive views could take time, can we still support the easier ones - if the view HQL is simple enough, can we try to support it? In our organization, there are 100s of hive views already created and so it will be great if presto can give the Hive view a try before failing the request.

electrum commented 7 years ago

It depends on what "simple" means. Even an operation like a + b has different semantics in Presto and Hive due to overflow behavior. Do you have some examples?

haitaoyao commented 7 years ago

I think simple queries are like create view test_view as select * from target_table where col1 = XXX and col2=b

amoghmargoor commented 7 years ago

This is the PR I created for supporting basic Hive views: https://github.com/prestodb/presto/pull/9031

This change would add support for those Hive Views which is described using HiveQL compatible (or parseable) with Presto. Following will be the behaviour of compatible and incompatible Hive Views which has been captured in UTs in PR above:

  1. show tables will show both compatible and Incompatible Views.
  2. Executing against compatible view will succeed provided there are right authorization, where as Executing against incompatible view will fail with Table not Found exception.
  3. Authorization is supported.
  4. Queries on Information schema will not list columns for incompatible view i.e., queries like show columns for will give empty results.

If there are more people interested then this can be pushed and I am ready to do necessary changes for the same. Thanks

purejade commented 6 years ago

@amoghmargoor hi, i am interested the pr #9031. are you still doing this work? are there some questions to use this patch? thank you.

sajjoseph commented 6 years ago

During the recent presto summit, one of the presentations had information about using apache calcite to parse the hive view and generating presto queries.

On Wed, Jul 18, 2018 at 9:55 PM, purejade notifications@github.com wrote:

@amoghmargoor https://github.com/amoghmargoor hi, i am interested the pr #9031 https://github.com/prestodb/presto/pull/9031. are you still doing this work? are there some questions to use this patch? thank you.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/prestodb/presto/issues/7338#issuecomment-406146741, or mute the thread https://github.com/notifications/unsubscribe-auth/AQhEuFKVzvs9sfWKsc8WE5t63nwToH_aks5uIAMcgaJpZM4L71F6 .

-- Sajumon Joseph 303-378-0593 sajjoseph@gmail.com

dain commented 6 years ago

The presentation was from LinkedIn and it only works, because they have an internal UDF library that is ported to both Hive and Presto. Also, they don't try to deal with semantic differences like Hive not failing on arithmetic overflow. As @electrum mentioned, doing a correct and safe conversion from Hive to Presto for anything other then super simple queries is extremely difficult.

thoralf-gutierrez commented 6 years ago

@dain do you consider @amoghmargoor's approach a no-go or could you see it as a practical incremental improvement (maybe with a feature toggle)? It would definitely be useful for everybody using views to hide table versioning (i.e. where your view is a simple UNION of versioned tables with some small changes for some fields).

How do you guys manage this at Facebook? Do you duplicate all your views with a Hive view (for Hive/Spark) and a Presto view (for Presto)?

dain commented 6 years ago

For Presto to support this, there must be an exact guaranteed mapping of the query (we have a rule about always correct results). I'm not sure about the general language differences, but as we mentioned functions and operators are a big problem. Someone would need to write extensive tests for any function or operator that would be translated between the systems. Of specific concern is Hive's has interesting implicit coercion semantics and null handling, so stuff as simple as equals will be challenging.

I think it might be possible to support SELECT columns FROM table with no where clause. Adding I would expect that adding support for UNION ALL between tables wouldn't be too difficult (assuming no coercion is needed).

At FB, we don't share views between Hive and Presto. We talked about implementing the super simple no WHERE, no projections, above, but decided it wasn't impactful enough to justify the work.

tooptoop4 commented 5 years ago

'select * from hive.information_schema.columns' gives error about views not supported if you do have hive views

thaibui commented 4 years ago

This is still an issue 2 years later so I would like to reiterate the important of this compatibility.

Providing compatibility at the basic level should really help users to port their existing Hive views to Presto views for one main reason:

This is a starting point. Perhaps most of the views from Hive would work and some don't, those that work could give a different results (with regards to nullness, etc.) however, this feature allows the users to see the differences.

That should give us a gateway to ensure more interoperability amongst systems. Perhaps, there's a way in Hive to write views such that they are 100% compatible with Presto and vice versa.

Enabling this via a feature flag would be a harm-free addition so I would love to have someone from the Presto team give more thought into this.

Thanks, Thai

nikitajaiswal9299 commented 4 years ago

@thaibui can you please explain more about that feature flag you are talking about ?? Please mention that property too.