Feature: Implicitly generate null value column when using `COPY`

API

Other

Description

I have this scenario where I have a CSV/Parquet file with just two columns:

product   price
Laptop 1100.0
Mouse 150.0
Headphones 250.0

In my DDL, I want to add a nullable column historical_price as follows:

CREATE NODE TABLE IF NOT EXISTS Product(name STRING, price DOUBLE, historical_sales INT32, PRIMARY KEY (name));

I want to initialize the node table with all values in the historical_sales columns as nulls, and add them via MERGE at a later time when they become available.

Issue

Because my input file has just 2 columns, and my DDL specifies 3 columns, I cannot use COPY Product FROM 'product.parquet' directly.

I instead have to do this:

COPY Product(name, price) FROM 'data/product.parquet'

Feature request

In the COPY pipeline, because we already know the number of columns in the input prior to importing the data, can we implicitly infer the column names so that we can use the much simpler DDL command below? It would reduce the mental burden on the user, as it's expected that the columns that are absent in the input file would have to be filled with nulls.

COPY Product FROM 'data/product.parquet'

@ray6080 I think this seems like a reasonable feature, but if you think it's infeasible, feel free to close.

kuzudb / kuzu