prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

[docs] document workaround for CSV files with Hive connector #23973

Open steveburnett opened 2 weeks ago

steveburnett commented 2 weeks ago

Creating a table with an int column in CSV format in a Hive catalog returns the following error:

presto> create table  hive_data.hive_schema.intcsv ( type int ) with ( format = 'CSV' ) ;
Query failed: Hive CSV storage format only supports VARCHAR (unbounded). Unsupported columns: type integer

From @imjalpreet:

The limitation is due to the serde that Hive/Presto use for CSV, which is OpenCSVSerde. This serde deserializes columns from a CSV file into strings only. Therefore, when creating a CSV table in Presto, the columns can only have an unbounded varchar datatype.

The Trino documentation likewise states that its Hive connector uses OpenCSVSerde for CSV:

Trino / Hive : https://trino.io/docs/current/connector/hive.html

CSV - using org.apache.hadoop.hive.serde2.OpenCSVSerde.

Expected Behavior or Use Case

Add documentation of the limitation and the workaround to the Hive Connector doc.

Workaround
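
The issue does not spell out the workaround. One approach that follows from the limitation above is to declare every CSV column as unbounded varchar and convert to the intended type at query time. The sketch below assumes that approach; the table name intcsv_raw and the view are hypothetical names chosen for illustration, not part of the original report.

```sql
-- Declare the column as unbounded VARCHAR, the only type the CSV format accepts.
-- intcsv_raw is a hypothetical name for the raw, string-typed table.
CREATE TABLE hive_data.hive_schema.intcsv_raw (
    type varchar
)
WITH (format = 'CSV');

-- Hypothetical view that exposes the column with the intended type.
-- CAST fails the query on a value that is not a valid integer;
-- TRY_CAST would return NULL for such values instead.
CREATE VIEW hive_data.hive_schema.intcsv AS
SELECT CAST(type AS integer) AS type
FROM hive_data.hive_schema.intcsv_raw;

-- Queries against the view then see an integer column.
SELECT type
FROM hive_data.hive_schema.intcsv
WHERE type > 10;
```

The cast can also be written inline in each query instead of in a view. A view keeps the conversion in one place, but whether it suits a given deployment is a judgment call.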

Presto Component, Service, or Connector

Documentation.

Possible Implementation

Context

Help users understand how to use CSV files with the Hive Connector, given the limitation of the serde that Hive/Presto uses for CSV.

steveburnett commented 2 weeks ago

@imjalpreet, please review, comment, and edit this issue to correct any errors that I made writing it.

imjalpreet commented 2 weeks ago

@steveburnett The description looks good.

AsfarHorani commented 1 week ago

Hi @steveburnett I am interested to work on it

steveburnett commented 1 week ago

Hi @steveburnett I am interested to work on it

Sure, thanks!