Closed by mrshanth 9 years ago
It looks like Spark cannot zip the two RDDs. Did you try train_text.zip(train_y)
without creating a DictRDD? This is exactly what splearn does under the hood, so you will probably get the same exception.
Also, DictRDD accepts an RDD of tuples, so please try the following:
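from splearn.rdd import DictRDD

# Each record is an (x, y) pair where y looks like "text~label".
# Indexing instead of `lambda (x, y):` keeps this valid on Python 3 as well.
train_text_y = train_rdd.map(
    lambda kv: (kv[1].split("~")[0], int(kv[1].split("~")[1])))
Z_train = DictRDD(train_text_y, columns=('X', 'y'), bsize=50)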
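If you want to check the parsing step without Spark, here is a minimal plain-Python sketch of what the map above does to each record. The "text~label" layout is inferred from the split("~") calls; the sample records and the helper name to_text_label are made up for illustration.

```python
# Hypothetical (x, y) records; only y is used, and it is assumed to look
# like "text~label" (inferred from the split("~") calls in the snippet).
records = [(0, "good movie~1"), (1, "bad plot~0")]


def to_text_label(record):
    """Turn an (x, "text~label") pair into a (text, label) tuple."""
    _, value = record
    parts = value.split("~")
    # Same fields the original lambda picks: part 0 is the text,
    # part 1 is the integer label.
    return parts[0], int(parts[1])


pairs = [to_text_label(r) for r in records]
```

Because the text and the label come out of the same record in one pass, there is only one RDD and nothing has to be zipped.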
Thanks. We did exactly the same thing after posting, and it worked.
I am trying to create a DictRDD as follows:
But I get the following error when I do:
I tried the following as well, but with no success: