ibm-watson-data-lab / ibmos2spark

Facilitates Data I/O between Spark and IBM Object Storage services.

Run it from local spark instance on eclipse #49

Open santooudnur opened 6 years ago

santooudnur commented 6 years ago

Is it possible to access IBM Cloud Object Storage from outside an Apache Spark instance in Bluemix?

Basically, I am trying to use this library to access COS objects from a Scala program running on a local Apache Spark instance. I am trying to connect to the Cloud Object Storage instance in my Bluemix account and access the temperatureUS.csv object in the tests bucket from Scala code.

The test code can be found here: SparkCosS.txt. I always get the following error:

    18/01/15 19:29:50 DEBUG request: Received error response: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: null; Status Code: 403; Error Code: 403 Forbidden; Request ID: 8cee1d0b-c4d8-4800-a75f-06ff49e76a5b), S3 Extended Request ID: null
    18/01/15 19:29:50 DEBUG COSAPIClient: Not found cos://tests.myCos/temperatureUS.csv
    18/01/15 19:29:50 WARN COSAPIClient: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 8cee1d0b-c4d8-4800-a75f-06ff49e76a5b), S3 Extended Request ID: null
    Exception in thread "main" org.apache.spark.sql.AnalysisException: Path does not exist: cos://tests.myCos/temperatureUS.csv;
        at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:626)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:350)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:350)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
        at scala.collection.immutable.List.flatMap(List.scala:355)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:349)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:156)
        at test.SparkCosFinalSL$.main(SparkCosSL.scala:86)
        at test.SparkCosFinalSL.main(SparkCosSL.scala)
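For context on how I understand the configuration: if I read the Stocator docs correctly, the cos://tests.myCos/... URI requires IAM credentials to be registered in the Hadoop configuration under property names derived from the service name (myCos here), and a 403 can mean those properties do not line up with the URI. A minimal sketch of how I believe those property names are derived; the placeholder values are illustrative, not real settings:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CosConfigSketch {
    // Builds Stocator-style Hadoop property names for a COS service.
    // Assumes IAM-based credentials; the service name ("myCos") must match
    // the one used in the cos://<bucket>.<service>/<object> URI.
    static Map<String, String> cosProperties(String service, String apiKey,
                                             String serviceInstanceId, String endpoint) {
        Map<String, String> props = new LinkedHashMap<>();
        String prefix = "fs.cos." + service + ".";
        props.put(prefix + "iam.api.key", apiKey);
        props.put(prefix + "iam.service.id", serviceInstanceId);
        props.put(prefix + "endpoint", endpoint);
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> props = cosProperties(
                "myCos", "<apiKey>", "<resourceInstanceId>",
                "https://s3-api.us-geo.objectstorage.softlayer.net");
        // Each entry would be set on the SparkContext's hadoopConfiguration.
        props.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Is this the set of properties the connector expects, or is something else missing for a local (non-Bluemix) Spark instance?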

However, I am able to connect to the service through the Java API:

    SDKGlobalConfiguration.IAM_ENDPOINT = "https://iam.bluemix.net/oidc/token";

    String bucketName = "testb5e78bd1988d453f81ec11cbfced949a";//"<bucketName>";
    String api_key = "L_-uMLV9AU-ZBWr0BE6JmiHMYFqsORXndMmfrpaqJIgG";//"<apiKey>";
    String service_instance_id = "crn:v1:bluemix:public:cloud-object-storage:global:a/647b189897a37a7ac4dbf0a3ef43fc42:866ec777-5c98-4e1c-b2bf-e5d0b1d13694::";//"<resourceInstanceId>";
    String endpoint_url = "https://s3-api.us-geo.objectstorage.softlayer.net";
    String location =  "us-geo"; //"us";

    System.out.println("Current time: " + new Timestamp(System.currentTimeMillis()).toString());
    _s3Client = createClient(api_key, service_instance_id, endpoint_url, location);

    listObjects(bucketName, _s3Client);
    listBuckets(_s3Client);

Let me know if I have missed anything.

The only other observation I have when running Spark from Eclipse is that the native Hadoop library is not loaded:

    18/01/15 19:42:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

I would appreciate your quick response.