samelamin / spark-bigquery

Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
Apache License 2.0

java.lang.ClassNotFoundException: java.lang.ProcessEnvironment$Variable #44

Closed 87sanchavan closed 6 years ago

87sanchavan commented 7 years ago

Hi Sam, I'm facing this issue. I think it could be a versioning issue. I am using the same PySpark code you gave in the other issue I raised. Have you ever faced this, or do you have any idea about it?

java.lang.ClassNotFoundException: java.lang.ProcessEnvironment$Variable
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:260)
    at com.samelamin.spark.bigquery.utils.EnvHacker$.setEnv(EnvHacker.scala:15)
    at com.samelamin.spark.bigquery.BigQueryClient$.setGoogleBQEnvVariable(BigQueryClient.scala:51)
    at com.samelamin.spark.bigquery.BigQueryClient$.getInstance(BigQueryClient.scala:34)
    at com.samelamin.spark.bigquery.BigQuerySQLContext.bq$lzycompute(BigQuerySQLContext.scala:19)
    at com.samelamin.spark.bigquery.BigQuerySQLContext.bq(BigQuerySQLContext.scala:19)
    at com.samelamin.spark.bigquery.BigQuerySQLContext.bigQuerySelect(BigQuerySQLContext.scala:86)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)

Traceback (most recent call last):
  File "C:/Users/IT000493/PycharmProjects/sc-ln/pyspark-examples/hello_pyspark.py", line 30, in <module>
  File "D:\software\spark-2.2.0-bin-hadoop2.6\python\lib\py4j-0.10.4-src.zip\py4j\java_gateway.py", line 1133, in __call__
  File "D:\software\spark-2.2.0-bin-hadoop2.6\python\pyspark\sql\utils.py", line 63, in deco
    return f(*a, **kw)
  File "D:\software\spark-2.2.0-bin-hadoop2.6\python\lib\py4j-0.10.4-src.zip\py4j\protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o34.bigQuerySelect.
: java.lang.NoSuchMethodError: com.google.common.base.Splitter.splitToList(Ljava/lang/CharSequence;)Ljava/util/List;
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase$ParentTimestampUpdateIncludePredicate.create(GoogleHadoopFileSystemBase.java:655)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.createOptionsBuilderFromConfig(GoogleHadoopFileSystemBase.java:2005)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.configure(GoogleHadoopFileSystemBase.java:1697)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:878)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:841)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2598)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration.getTemporaryPathRoot(BigQueryConfiguration.java:283)
    at com.google.cloud.hadoop.io.bigquery.AbstractBigQueryInputFormat.getSplits(AbstractBigQueryInputFormat.java:114)
    at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:125)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1333)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.RDD.take(RDD.scala:1327)
    at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1368)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.RDD.first(RDD.scala:1367)
    at com.samelamin.spark.bigquery.BigQuerySQLContext.bigQuerySelect(BigQuerySQLContext.scala:92)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)
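
Note that the log above shows two distinct failures: the ClassNotFoundException thrown from EnvHacker, and a NoSuchMethodError on com.google.common.base.Splitter.splitToList, which is the classic symptom of an old Guava on the classpath (Hadoop 2.6 ships Guava 11, while splitToList first appeared in Guava 15). For projects built with sbt, a hedged sketch of pinning a newer Guava; the exact version here is an assumption, so pick whatever your dependency tree tolerates:

    // build.sbt: force a Guava new enough to provide Splitter.splitToList
    // (added in Guava 15.0; Hadoop 2.6 ships 11.0.2)
    dependencyOverrides += "com.google.guava" % "guava" % "16.0.1"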

darylerwin commented 6 years ago

Same error.. I was just coming to post this same problem. The code works fine on Google Cloud, but when run from my desktop using credentials it fails with the above.

java.lang.ClassNotFoundException: java.lang.ProcessEnvironment$Variable
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at com.samelamin.spark.bigquery.utils.EnvHacker$.setEnv(EnvHacker.scala:15)
    at com.samelamin.spark.bigquery.BigQueryClient$.setGoogleBQEnvVariable(BigQueryClient.scala:51)
    at com.samelamin.spark.bigquery.BigQueryClient$.getInstance(BigQueryClient.scala:34)
    at com.samelamin.spark.bigquery.BigQuerySQLContext.bq$lzycompute(BigQuerySQLContext.scala:19)
    at com.samelamin.spark.bigquery.BigQuerySQLContext.bq(BigQuerySQLContext.scala:19)
    at com.samelamin.spark.bigquery.BigQuerySQLContext.setBigQueryDatasetLocation(BigQuerySQLContext.scala:65)
    at com.bbmtek.external.DfpSdSupplyDemandTest$$anonfun$1.apply$mcV$sp(DfpSdSupplyDemandTest.scala:35)
    at com.bbmtek.external.DfpSdSupplyDemandTest$$anonfun$1.apply(DfpSdSupplyDemandTest.scala:22)
    at com.bbmtek.external.DfpSdSupplyDemandTest$$anonfun$1.apply(DfpSdSupplyDemandTest.scala:22)
    at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
    at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
    at org.scalatest.Transformer.apply(Transformer.scala:22)
    at org.scalatest.Transformer.apply(Transformer.scala:20)
    at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
    at org.scalatest.TestSuite$class.withFixture(TestSuite.scala:196)
    at org.scalatest.FunSuite.withFixture(FunSuite.scala:1560)

samelamin commented 6 years ago

Hi folks

Can you please post the code snippet you are trying to save?

It seems like perhaps the environment variable hasn't been set up right.

Is this PySpark or Scala?

darylerwin commented 6 years ago

Scala. It must be something in my desktop config; it works on the Google Cloud VM. On the GCP box I run:

    spark-shell --packages com.github.samelamin:spark-bigquery_2.11:0.2.2,com.github.databricks:spark-avro:204864b6cf < test.scala

If I run via IntelliJ on my desktop, it fails on the last line. Note that I don't have the JSON key line in the code when I run on GCP.

    import com.samelamin.spark.bigquery._
    spark.sqlContext.setGcpJsonKeyFile(s"$resources/BIGDATA-CREDENTIALS.json")
    spark.sqlContext.setBigQueryProjectId("bbm-production-bigdata")
    spark.sqlContext.setBigQueryGcsBucket("bbm-production-bigdata")
    spark.sqlContext.setBigQueryDatasetLocation("US")

samelamin commented 6 years ago

Looks like it's an issue with setting the env variable. What OS are you using, and what's your Java version?

Perhaps install the latest JDK?

darylerwin commented 6 years ago

Windows. I tried setting GOOGLE_APPLICATION_CREDENTIALS. The first time it said it couldn't find the file; then I renamed it and it was found, but I got the above error. So I removed it, thinking I could just use the API call instead, but I still get the error.

Using jdk1.8.0_144 and jre1.8.0_111.

Are you able to share any IntelliJ settings that might be relevant? I have your bigquery_2.11-0.2.2.jar along with the spark-2.2.0-bin-hadoop2.7 jars.
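
A quick way to check whether the JVM ever sees the variable, as a minimal sketch (run from the same IntelliJ run configuration):

    // Prints null if GOOGLE_APPLICATION_CREDENTIALS is not visible to this
    // JVM, regardless of what the Windows environment dialog shows.
    println(System.getenv("GOOGLE_APPLICATION_CREDENTIALS"))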

darylerwin commented 6 years ago

Is the JSON creds file something similar to this?

{ "type": "service_account", "project_id": "xxxx", "private_key_id": "xxxxxxa", "private_key": "-----BEGIN PRIVATE KEY-----\n blah blah blah

samelamin commented 6 years ago

Hi Daryl,

That is the JSON file, yes.

It is certainly a problem with env variables. I have not tested the library against Windows, so it's hard for me to debug it for you without actually being there.

IntelliJ shouldn't matter, because the issue seems to be separate from the IDE.

I am assuming you are using a fat/uber jar. If you spark-submit via the console/terminal, do you get the same error?

darylerwin commented 6 years ago

Did you say you didn't use an environment variable, or that you do? Did you do anything special? In the Windows environment I set VARIABLE="/path/to/.json".

samelamin commented 6 years ago

Well, ideally you should not need to; that is what EnvHacker does. But for some reason the JRE in your environment dies when it tries to set the environment variable.

I'd investigate why this line is failing on your OS:

    java.lang.ClassNotFoundException: java.lang.ProcessEnvironment

I believe it has to do with the JRE.

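For reference, here is a minimal sketch of the kind of reflection trick EnvHacker attempts, with a Windows branch added. This is an assumption about the approach, not the library's actual code; the key point is that on Unix JDKs the environment map is keyed by the private inner classes java.lang.ProcessEnvironment$Variable and $Value, which simply do not exist on Windows JDKs, hence the ClassNotFoundException above.

    import java.util.{Map => JMap}

    // Hedged sketch of an in-JVM env-var hack that also handles Windows.
    // JDK 8 era only; newer JDKs need --add-opens for this kind of reflection.
    object EnvHackSketch {
      def setEnv(key: String, value: String): Unit = {
        val pe = Class.forName("java.lang.ProcessEnvironment")
        try {
          // Windows JDKs keep the environment in two plain String maps;
          // this field lookup throws NoSuchFieldException on Unix JDKs.
          for (name <- Seq("theCaseInsensitiveEnvironment", "theEnvironment")) {
            val field = pe.getDeclaredField(name)
            field.setAccessible(true)
            field.get(null).asInstanceOf[JMap[String, String]].put(key, value)
          }
        } catch {
          case _: NoSuchFieldException =>
            // Unix JDKs: mutate the map backing System.getenv() instead.
            val m = Class.forName("java.util.Collections$UnmodifiableMap").getDeclaredField("m")
            m.setAccessible(true)
            m.get(System.getenv()).asInstanceOf[JMap[String, String]].put(key, value)
        }
      }
    }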

samelamin commented 6 years ago

I am closing this issue, as I have not heard anything back and I cannot replicate it.

darylerwin commented 6 years ago

A member of our team here found the issue: it is the hack for reading/writing environment variables that doesn't work on Windows. He commented out the code, as it apparently wasn't required. Is it setting a Hadoop environment variable?

com.samelamin.spark.bigquery.utils.EnvHacker
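
For anyone else hitting this on Windows, a minimal sketch of the workaround, under the assumption that the connectors only need the credentials reachable through Hadoop configuration rather than a real env variable. The property names are the standard Google Cloud Storage connector keys, not something verified against this library, and the path and project id are placeholders:

    import org.apache.spark.sql.SparkSession

    // Hedged sketch: hand the credentials to the GCS/BigQuery Hadoop
    // connectors via Hadoop configuration instead of an env variable.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("bq-windows-workaround")
      .getOrCreate()
    val conf = spark.sparkContext.hadoopConfiguration
    conf.set("google.cloud.auth.service.account.enable", "true")
    conf.set("google.cloud.auth.service.account.json.keyfile", "C:/path/to/credentials.json")
    conf.set("fs.gs.project.id", "your-project-id")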

samelamin commented 6 years ago

Ah yes; ironically, I added that to make the connector simpler to use. It uses the Google Hadoop connector to move data in and out of Google Cloud Storage, and that needs the Google credentials file in an env variable.

Thanks for raising it. If you send a PR I'll review and merge it. I would do it myself, but I do not have a Windows machine to test with.
