quintoandar / hive-metastore-client

A client for connecting to and running DDLs on the Hive metastore.
Apache License 2.0

Method add_partitions doesn't respect storage descriptor of Partition #70

Open mmatrich opened 2 years ago

mmatrich commented 2 years ago

I want to add a Hive partition with a custom location to a standalone metastore, using the Python HiveMetastoreClient. In other words, I want to reproduce the Hive command:

alter table table_name add partition(dt='2022051705') location '2022/05/17/05';

I use the following code, but it creates the partition at the default path 'bucket_name/table_name/dt=2022051704' (creating a new folder) instead of 'bucket_name/table_name/2022/05/17/04', where the files are actually stored:

from hive_metastore_client import HiveMetastoreClient
from hive_metastore_client.builders import (
    ColumnBuilder,
    StorageDescriptorBuilder,
    SerDeInfoBuilder,
    PartitionBuilder,
)

HIVE_HOST = "xx.xx.xx.xx"
HIVE_PORT = 9083
DATABASE_NAME = 'default'
TABLE_NAME = 'table_name'

# hypothetical example columns; list the table's real (non-partition) columns here
columns = [
    ColumnBuilder("col1", "string").build(),
]

serde_info = SerDeInfoBuilder(
    serialization_lib="org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
).build()

partition_storage_descriptor = StorageDescriptorBuilder(
    columns=columns,
    # note: the leading "/" makes this an absolute path, not one relative to the table directory
    location="/2022/05/17/04",
    input_format="org.apache.hadoop.mapred.TextInputFormat",
    output_format="org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    serde_info=serde_info,
).build()

partition_list = [
    PartitionBuilder(
        values=["2022051704"], db_name=DATABASE_NAME, table_name=TABLE_NAME,
        sd=partition_storage_descriptor
    ).build()
]

with HiveMetastoreClient(HIVE_HOST, HIVE_PORT) as hive_client:
    hive_client.add_partitions_if_not_exists(DATABASE_NAME, TABLE_NAME, partition_list)
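A possible workaround, sketched under assumptions: if `add_partitions_if_not_exists` rebuilds each partition's location from the table location and the partition key values (which would explain the `dt=2022051704` folder above, but is not confirmed here), the partitions can instead be sent through the raw Thrift `add_partitions` call that `HiveMetastoreClient` inherits, so the storage descriptor goes to the metastore unchanged. The sketch also builds a full location under the table directory, the way the Hive DDL treats a relative `LOCATION`. The helper names `custom_partition_location` and `add_partitions_keeping_location` are hypothetical, and the `YYYY/MM/DD/HH` layout is taken from the example paths.

```python
from typing import Any, Sequence


def custom_partition_location(table_location: str, dt: str) -> str:
    """Map dt='2022051704' to '<table_location>/2022/05/17/04'.

    Hypothetical helper: assumes dt is always YYYYMMDDHH, as in the example paths.
    """
    if len(dt) != 10 or not dt.isdigit():
        raise ValueError(f"expected dt in YYYYMMDDHH form, got {dt!r}")
    relative = "/".join([dt[0:4], dt[4:6], dt[6:8], dt[8:10]])
    # strip a trailing slash so we never produce '//' in the joined path
    return table_location.rstrip("/") + "/" + relative


def add_partitions_keeping_location(hive_client: Any, partitions: Sequence[Any]) -> int:
    """Register partitions with the Thrift add_partitions call directly.

    This bypasses add_partitions_if_not_exists (assumed to overwrite sd.location).
    Unlike that wrapper, the raw call is expected to fail if a partition already exists.
    """
    for partition in partitions:
        # refuse to send a partition without an explicit location
        if partition.sd is None or not partition.sd.location:
            raise ValueError(f"partition {partition.values} has no explicit location")
    # Thrift signature: i32 add_partitions(list<Partition> new_parts)
    return hive_client.add_partitions(list(partitions))
```

In the code above, the table location could be read with `hive_client.get_table(dbname=DATABASE_NAME, tbl_name=TABLE_NAME).sd.location`, passed through `custom_partition_location(...)` as the `location` of `StorageDescriptorBuilder`, and the resulting `partition_list` handed to `add_partitions_keeping_location(hive_client, partition_list)` inside the same `with` block.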

An additional question: why do I have to pass the column list to StorageDescriptorBuilder when the columns were already defined when the table was created?
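On the columns question: in the metastore's Thrift model each `Partition` carries its own `StorageDescriptor`, and that descriptor has its own `cols` list, which is how Hive lets a partition's schema differ from its table's. One way to avoid retyping the columns, sketched here as an assumption rather than the library's documented approach, is to fetch the table with `hive_client.get_table(dbname=..., tbl_name=...)`, deep-copy its `sd` (columns, input/output formats and SerDe included) and change only the location. `partition_sd_from_table` is a hypothetical helper.

```python
import copy
from typing import Any


def partition_sd_from_table(table_sd: Any, location: str) -> Any:
    """Return a copy of the table's StorageDescriptor with only the location changed.

    Hypothetical helper: table_sd is expected to be hive_client.get_table(...).sd,
    so cols, formats and serdeInfo are inherited instead of being rebuilt by hand.
    """
    partition_sd = copy.deepcopy(table_sd)  # deep copy so the table object is not mutated
    partition_sd.location = location
    return partition_sd
```

The result would be passed as `sd=` to `PartitionBuilder` in place of the hand-built `partition_storage_descriptor`.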