gadgetman4u opened this issue 5 years ago
I have already saved the neighborhoods.geojson file to Azure Data Lake Store and passed its path to `dbutils.fs.mount`. How do I extract the neighborhoods and trips using the code below?
```scala
val trips = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("comment", "V")
  .option("mode", "DROPMALFORMED")
  .schema(schema)
  .load("/mnt/nyctaxicabanalysis/trips/*")
  .withColumn("point", point($"pickup_longitude", $"pickup_latitude"))
  .cache()

val neighborhoods = sqlContext.read
  .format("magellan")
  .option("type", "geojson")
  .load("/mnt/nyctaxicabanalysis/neighborhoods/")
  .select($"polygon", $"metadata"("neighborhood").as("neighborhood"))
  .cache()
```
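For the code above to find the data under `/mnt/nyctaxicabanalysis/`, the Data Lake Store has to be mounted first. A minimal sketch of such a mount, assuming an ADLS Gen1 store accessed via an Azure AD service principal (the angle-bracketed names are placeholders you must fill in, not values from this thread):

```scala
// Hypothetical credentials for a service principal with access to the store
val configs = Map(
  "dfs.adls.oauth2.access.token.provider.type" -> "ClientCredential",
  "dfs.adls.oauth2.client.id" -> "<application-id>",
  "dfs.adls.oauth2.credential" -> "<service-principal-secret>",
  "dfs.adls.oauth2.refresh.url" ->
    "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

// Mount the store so the notebook can read it via the /mnt/... path
dbutils.fs.mount(
  source = "adl://<your-datalake-store>.azuredatalakestore.net/",
  mountPoint = "/mnt/nyctaxicabanalysis",
  extraConfigs = configs)
```

Once mounted, the `load("/mnt/nyctaxicabanalysis/...")` calls should resolve as written, provided the `trips/` and `neighborhoods/` folders exist at the store root.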
Thanks.
Does anybody know how I can upload the data to Azure so I can extract the neighborhoods and trips?
I found this today, which may be interesting for you (I'm not the author): https://lamastex.github.io/scalable-data-science/sds/2/2/db/032_NYtaxisInMagellan.html It only works for me on the Databricks runtime with Spark 2.1.1.
I would like to run the NYC Taxicab analysis notebook on Azure Databricks, but the data is in S3. How do I get the data into Azure? Would I save it to Azure Data Lake Store and then mount that in Databricks?
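One way to do this without leaving Databricks is to read the S3 bucket directly and copy the files into the ADLS mount. A rough sketch, assuming the Data Lake Store is already mounted at `/mnt/nyctaxicabanalysis` and that the bucket name and AWS keys below are placeholders, not values from this thread:

```scala
// Hypothetical AWS credentials for the source bucket
spark.conf.set("fs.s3a.access.key", "<aws-access-key-id>")
spark.conf.set("fs.s3a.secret.key", "<aws-secret-access-key>")

// Copy the raw files across unchanged; recurse = true walks subdirectories
dbutils.fs.cp("s3a://<source-bucket>/trips/",
              "/mnt/nyctaxicabanalysis/trips/", recurse = true)
dbutils.fs.cp("s3a://<source-bucket>/neighborhoods/",
              "/mnt/nyctaxicabanalysis/neighborhoods/", recurse = true)
```

Copying the raw CSV/GeoJSON files rather than re-serializing them keeps the notebook's original `load` paths and schema handling intact.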
Thanks.