Closed · sjrusso8 closed 1 month ago
Implement data types based on the Spark Data Types
Usage

As a schema option
```rust
let schema = StructType::new(vec![
    StructField {
        name: "name",
        data_type: DataType::String,
        nullable: false,
        metadata: None,
    },
    StructField {
        name: "age",
        data_type: DataType::Short,
        nullable: true,
        metadata: None,
    },
]);

let df = spark
    .clone()
    .read()
    .format("json")
    .schema(schema)
    .load(path)?;
```
As a column cast
```rust
let df = df.select([
    F::col("name").cast(DataType::String).alias("name_str"),
    F::col("age").cast(DataType::Integer).alias("age_int"),
]);
```
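Since `cast(...)` accepts either a `DataType` or a `&str`, the underlying `CastToDataType` trait is presumably implemented for both. A minimal, self-contained sketch of that trait pattern (the names, variants, and string mappings here are illustrative assumptions, not the crate's actual definitions):

```rust
// Hypothetical sketch of the CastToDataType trait pattern: one trait
// implemented for both `DataType` and `&str`, so a generic `cast` can
// take either. Simplified stand-ins, not spark-connect-rs internals.

#[derive(Debug, PartialEq)]
enum DataType {
    String,
    Integer,
}

// The trait both `DataType` and `&str` implement.
trait CastToDataType {
    fn to_data_type(self) -> DataType;
}

impl CastToDataType for DataType {
    fn to_data_type(self) -> DataType {
        self
    }
}

impl CastToDataType for &str {
    fn to_data_type(self) -> DataType {
        // Type-name spellings here are assumed for illustration.
        match self {
            "string" => DataType::String,
            "int" | "integer" => DataType::Integer,
            other => panic!("unknown type name: {other}"),
        }
    }
}

// A `cast` function generic over anything implementing the trait.
fn cast<T: CastToDataType>(t: T) -> DataType {
    t.to_data_type()
}

fn main() {
    // Both spellings resolve to the same data type.
    assert_eq!(cast(DataType::Integer), cast("int"));
    assert_eq!(cast("string"), DataType::String);
}
```

This kind of impl-for-both-types trait is what lets the column API stay ergonomic without overloading.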
Description

Implement data types based on the Spark Data Types:

- `CastToDataType` to allow a user to leverage `DataType` or `&str` when using `col("name").cast(...)`
- `schema` method on the `DataFrameReader` to specify the read schema
- `ToSchema` to allow users to specify a read schema using a `StructType` schema or a `&str` representing a DDL string
- `reader` example to show a type cast with a `DataType`
- Replaced `ToProtoType` and implemented `From<T> for spark::DataType` instead; it's more idiomatic

Not Implemented

- The `fromJson` method that `StructType` and `StructField` have on the current API.
- `schema()` still returns the types from the protobuf. Will implement this change later.
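The `From<T> for spark::DataType` change can be sketched in plain Rust to show why it is more idiomatic than a bespoke conversion trait. `ProtoDataType` below is a simplified stand-in for the protobuf-generated type, and the field layout is assumed for illustration:

```rust
// Sketch: implementing the std `From` trait instead of a custom
// `ToProtoType` trait. `ProtoDataType` is a stand-in, not the crate's
// real protobuf type.

#[derive(Debug, PartialEq)]
enum DataType {
    String,
    Short,
}

// Stand-in for the protobuf-generated data type message.
#[derive(Debug, PartialEq)]
struct ProtoDataType {
    kind: String,
}

// Implementing std's `From` gives callers `.into()` and
// `ProtoDataType::from(...)` for free, plus blanket `Into` support,
// rather than requiring a crate-specific trait import.
impl From<DataType> for ProtoDataType {
    fn from(dt: DataType) -> Self {
        let kind = match dt {
            DataType::String => "string",
            DataType::Short => "short",
        };
        ProtoDataType { kind: kind.to_string() }
    }
}

fn main() {
    let proto: ProtoDataType = DataType::Short.into();
    assert_eq!(proto.kind, "short");
    assert_eq!(ProtoDataType::from(DataType::String).kind, "string");
}
```

Using the standard conversion traits also lets generic code accept `impl Into<ProtoDataType>` without knowing about the concrete `DataType` enum.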