spark.sparkContext.addPyFile() doesn't find file in ADS when using pyspark kernel

Azure Data Studio
Version: 1.9.0
Commit: 78a42e1d112ae3231777722b51eaf4483ddbe55
Date: 2019-07-10

Steps to Reproduce:

1). Upload the XGBoost package jar files to an Aris big data cluster. You can download them from here:
https://repo1.maven.org/maven2/ml/dmlc/xgboost4j/0.72/xgboost4j-0.72.jar
https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-spark/0.72/xgboost4j-spark-0.72.jar

Upload both jar files to the 'jar' directory on the cluster.
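Both URLs follow the standard Maven repository layout: the group ID with dots turned into slashes, then the artifact ID, the version, and `artifact-version.jar`. A minimal sketch that rebuilds them, assuming Maven Central at `https://repo1.maven.org/maven2`; the helper name `maven_jar_url` is hypothetical:

```python
# Hypothetical helper: builds a Maven Central jar URL from Maven coordinates,
# following the standard repository layout (groupId dots -> path segments).
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def maven_jar_url(group_id: str, artifact_id: str, version: str,
                  repo: str = MAVEN_CENTRAL) -> str:
    """Return the download URL of <artifact_id>-<version>.jar."""
    group_path = group_id.replace(".", "/")
    return f"{repo}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.jar"


# The two jars needed for step 1 (XGBoost 0.72).
XGBOOST_JARS = [
    maven_jar_url("ml.dmlc", "xgboost4j", "0.72"),
    maven_jar_url("ml.dmlc", "xgboost4j-spark", "0.72"),
]

if __name__ == "__main__":
    for url in XGBOOST_JARS:
        print(url)
```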

2). Upload the Spark Python files for XGBoost to the '/user/root' folder on the cluster. You can download them from here:
https://github.com/dmlc/xgboost/files/2161553/sparkxgb.zip
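PySpark's `addPyFile` puts a `.zip` archive on the Python path of the driver and the executors, so the package directory has to sit at the root of the archive for `import sparkxgb` to work. A minimal sketch that checks this before uploading, assuming the archive is expected to contain a top-level `sparkxgb/__init__.py`; the helper name `has_top_level_package` is hypothetical:

```python
import io
import zipfile


def has_top_level_package(zip_source, package: str) -> bool:
    """Return True if the archive contains <package>/__init__.py at its root.

    zip_source may be a filesystem path or a binary file-like object.
    Zip entry names always use forward slashes.
    """
    with zipfile.ZipFile(zip_source) as zf:
        return f"{package}/__init__.py" in zf.namelist()


if __name__ == "__main__":
    # Build a tiny in-memory archive with the package at its root.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("sparkxgb/__init__.py", "")
    buf.seek(0)
    print(has_top_level_package(buf, "sparkxgb"))  # True
```

Against the real download this would be called as `has_top_level_package("sparkxgb.zip", "sparkxgb")`.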

3). Open a new 'pyspark' notebook and paste the following code into the first set of cells:

    from pyspark import SparkContext
    from pyspark.sql import SparkSession
    import os
    os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars xgboost4j-spark-0.72.jar,xgboost4j-0.72.jar pyspark-shell'

    spark = SparkSession\
        .builder\
        .appName("PySpark XGBOOST Titanic")\
        .master("local[*]")\
        .getOrCreate()
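PySpark reads `PYSPARK_SUBMIT_ARGS` only when it launches the JVM gateway, and the value must end with `pyspark-shell`; if a Spark session already exists in the kernel, `getOrCreate()` returns that session and the variable has no effect. A minimal sketch that builds the value from a list of jar paths, assuming the jars sit in the notebook's working directory as in step 3; the helper name `pyspark_submit_args` is hypothetical:

```python
import os


def pyspark_submit_args(jars, extra=()):
    """Build a PYSPARK_SUBMIT_ARGS value.

    jars: jar paths joined with commas for --jars (paths must not contain
    commas or spaces). extra: additional spark-submit arguments placed first.
    The value must end with 'pyspark-shell'.
    """
    parts = list(extra)
    if jars:
        parts += ["--jars", ",".join(jars)]
    parts.append("pyspark-shell")
    return " ".join(parts)


if __name__ == "__main__":
    # Only takes effect if set before the first SparkContext is created.
    os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args(
        ["xgboost4j-spark-0.72.jar", "xgboost4j-0.72.jar"]
    )
    print(os.environ["PYSPARK_SUBMIT_ARGS"])
```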

4). Run this code. Note that this part of the code runs fine.

5). Add another cell with the following line:

    spark.sparkContext.addPyFile("/user/root/sparkxgb.zip")

6). Run this cell. You will get the following error message:

    An error occurred while calling o106.addFile.
    : java.io.FileNotFoundException: File file:/user/root/sparkxgb.zip does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)

The file does exist at this location on the cluster, but it is not found.
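The `file:` prefix in the error suggests the scheme-less path was resolved against the driver's local file system (`RawLocalFileSystem`) rather than the cluster's HDFS. `addPyFile` also accepts paths on HDFS or other Hadoop-supported file systems, so one possible workaround, not verified on this cluster, is to pass a fully qualified URI. A minimal sketch, assuming the zip lives in HDFS and that `hdfs:///` (empty authority) resolves to the cluster's default file system; `qualify_hdfs_path` is a hypothetical helper:

```python
from urllib.parse import urlparse


def qualify_hdfs_path(path: str, namenode: str = "") -> str:
    """Prefix an absolute, scheme-less path with hdfs://.

    With an empty namenode the result is hdfs:///<path>, which Hadoop
    resolves against the configured default file system (fs.defaultFS).
    Paths that already carry a scheme (hdfs:, file:, http:, ...) are
    returned unchanged.
    """
    if urlparse(path).scheme:
        return path
    if not path.startswith("/"):
        raise ValueError(f"expected an absolute path, got {path!r}")
    return f"hdfs://{namenode}{path}"


if __name__ == "__main__":
    print(qualify_hdfs_path("/user/root/sparkxgb.zip"))
```

In the notebook this would be used as `spark.sparkContext.addPyFile(qualify_hdfs_path("/user/root/sparkxgb.zip"))`.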

The code is taken from here: https://towardsdatascience.com/pyspark-and-xgboost-integration-tested-on-the-kaggle-titanic-dataset-4e75a568bdb

Issue #6784 in microsoft/azuredatastudio