harsha2010 / magellan

Geo Spatial Data Analytics on Spark
Apache License 2.0

From spark dataframe to Polygon ? #215

Closed · laurikoobas closed this issue 6 years ago

laurikoobas commented 6 years ago

How is that supposed to work? For Points there's the point($"x", $"y") way, but how to do the same thing with Polygons?

I have a spark dataframe that has an array of Points in one column and I'd like to turn that into a dataframe that has Polygons in a column that are based on those arrays of Points.

harsha2010 commented 6 years ago

@laurikoobas you can create a user-defined function (UDF) to do this: simply invoke Polygon(Array(0), points), where points is the array of points representing the polygon. We expect this array to be a closed loop, i.e. the first and last points in the array should be the same. The UDF will look something like:

val toPolygon = udf { (points: Array[Point]) => Polygon(Array(0), points) }
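Since the ring must be closed (first point equal to last), a small helper can close an open ring before it is handed to Polygon. This is a minimal sketch in plain Scala: closeRing is a hypothetical helper name, and coordinate tuples stand in for magellan.Point so the snippet runs without Spark or Magellan on the classpath.

```scala
// Hypothetical helper: Magellan expects a polygon ring to be closed,
// i.e. the first and last points must be equal. If the ring is open,
// append the first point to the end; otherwise return it unchanged.
def closeRing[A](points: Seq[A]): Seq[A] =
  if (points.nonEmpty && points.head != points.last) points :+ points.head
  else points

// Coordinate pairs stand in for magellan.Point here.
val open   = Seq((0.0, 0.0), (1.0, 0.0), (1.0, 1.0))
val closed = closeRing(open)   // last element is now (0.0, 0.0)
```

Inside a real UDF this would become something like Polygon(Array(0), closeRing(points).toArray).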

laurikoobas commented 6 years ago

Right, that makes sense. And I apologize for continuing to post in this issue, but I don't know of a better avenue to ask for help on this.

I have this:

scala> p.printSchema
root
 |-- area_id: string (nullable = true)
 |-- line_index: integer (nullable = false)
 |-- points: array (nullable = true)
 |    |-- element: point (containsNull = true)

And do this:

val toPolygon = udf{(points: Array[Point]) => Polygon(Array(0), points)}
var a = p.select($"area_id", $"line_index", toPolygon($"points"))
a.show

And the result was this (after a few pages of stack trace):

scala.collection.mutable.WrappedArray$ofRef cannot be cast to [Lmagellan.Point

laurikoobas commented 6 years ago

I figured it out after a while. Spark passes an ArrayType column to a UDF as a Seq (a WrappedArray), not a JVM Array, so the UDF should be this:

val toPolygon = udf { (points: Seq[Point]) => Polygon(Array(0), points.toArray) }
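The root cause can be shown without Spark at all: an Array wrapped as a Seq is no longer a JVM Array at runtime, so a UDF parameter typed Array[Point] fails with a ClassCastException, while .toArray rebuilds a real array. A minimal sketch in plain Scala, with Ints standing in for magellan.Point:

```scala
// An Array wrapped as a Seq is not itself a JVM Array at runtime,
// which is why a UDF typed Array[Point] throws a ClassCastException.
val arr = Array(1, 2, 3)
val seq: Seq[Int] = arr.toSeq   // roughly what Spark hands the UDF

// The wrapper is a Seq, not an Array (checked via Any to avoid a
// static "fruitless type test" warning).
val isRealArray = (seq: Any).isInstanceOf[Array[_]]   // false

// The fix from the comment above: accept a Seq, then materialize
// a real Array before calling Polygon's constructor.
val rebuilt: Array[Int] = seq.toArray
```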