svenkreiss / pysparkling

A pure Python implementation of Apache Spark's RDD and DStream interfaces.
https://pysparkling.readthedocs.io

AttributeError: 'StructType' object has no attribute 'treeString' #140

Open svaningelgem opened 3 years ago

svaningelgem commented 3 years ago

The exception in the subject line is raised by the following code:

from datetime import date
from pysparkling.sql.session import SparkSession
from pysparkling.sql.functions import collect_set

spark = SparkSession.Builder().getOrCreate()

dataset_usage = [
    ('steven', 'UUID1', date(2019, 7, 22)),
]
dataset_usage_schema = 'id: string, datauid: string, access_date: date'

df = spark.sparkContext.parallelize(dataset_usage).toDF(dataset_usage_schema)
df.printSchema()

tools4origins commented 3 years ago

There is currently no "good first issue" label on this repo, but this issue could be one: implementing the missing method involves no project-specific complexity, although nested fields need some attention.
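To illustrate the nested-field point, here is a minimal standalone sketch of how such a method could build its output. It does not use pysparkling's own classes: `Field`, `Struct`, and `tree_string` are hypothetical stand-ins for `StructField`, `StructType`, and `treeString`, and only plain and struct-typed fields are handled (arrays and maps are left out). The layout is meant to follow what Spark's `printSchema()` prints.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Field:
    # Hypothetical stand-in for StructField: name, type, nullability.
    name: str
    data_type: Union[str, "Struct"]
    nullable: bool = True


@dataclass
class Struct:
    # Hypothetical stand-in for StructType: an ordered list of fields.
    fields: List[Field] = field(default_factory=list)


def tree_string(schema: Struct) -> str:
    """Render a schema as a tree, one line per field, starting at 'root'."""
    lines = ["root"]
    _append_fields(schema, " |", lines)
    return "\n".join(lines) + "\n"


def _append_fields(struct: Struct, prefix: str, lines: List[str]) -> None:
    for f in struct.fields:
        nested = isinstance(f.data_type, Struct)
        type_name = "struct" if nested else f.data_type
        nullable = str(f.nullable).lower()
        lines.append(f"{prefix}-- {f.name}: {type_name} (nullable = {nullable})")
        if nested:
            # Each nesting level extends the prefix by four spaces and a bar.
            _append_fields(f.data_type, prefix + "    |", lines)


if __name__ == "__main__":
    schema = Struct([
        Field("id", "string"),
        Field("datauid", "string"),
        Field("access_date", "date"),
    ])
    print(tree_string(schema), end="")
```

For the schema in the report, this prints `root` followed by one ` |-- name: type (nullable = true)` line per column. The recursion carries the prefix down, so a struct inside a struct is indented one further level without any special casing.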

svaningelgem commented 3 years ago

OK, I'll pick that up once the PRs are through :-) (the code base will be more stable by then)