microsoft / SynapseVSCode

This is the repo of the Synapse VS Code extension for Microsoft Fabric.
MIT License

Can't read using spark.sql("query") #20

Open · jhomiak opened this issue 1 year ago

jhomiak commented 1 year ago

I'm new to running all of this, but I can't read data into a dataframe with spark.sql("query"). It does work when I run spark.read.format("delta").load("abfss://path/Tables/TableName").

Since I'm new, let me know what I can provide to help with this error. The output in the notebook is too long and gets truncated even with scrolling enabled.
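For reference, a minimal sketch of both calls (the table name comes from the traceback below; the ABFSS path is a placeholder, not my real path):

```python
# Fails in the VS Code Synapse kernel with the Py4JJavaError shown below:
df = spark.sql("SELECT * FROM myLH.MyTable LIMIT 10")
display(df.limit(10))

# Works: loading the same Delta table directly by its ABFSS path
# (placeholder path; substitute your own lakehouse Tables path):
df = spark.read.format("delta").load("abfss://path/Tables/TableName")
display(df.limit(10))
```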

Py4JJavaError                             Traceback (most recent call last)
c:\Users\alias\repos\0c996569-86a5-424d-bdb9-5f82a39983cc\SynapseNotebook\b106a477-de0d-4602-8177-15edaf98b762\Test\Test.ipynb Cell 1 line 4
----> 4 df = spark.sql("SELECT * FROM myLH.MyTable LIMIT 10")
      6 display(df.limit(10))

File c:\ProgramData\miniconda3\envs\synapse-spark-kernel\lib\site-packages\pyspark\sql\session.py:1034, in SparkSession.sql(self, sqlQuery, **kwargs)
   1032     sqlQuery = formatter.format(sqlQuery, **kwargs)
   1033 try:
-> 1034     return DataFrame(self._jsparkSession.sql(sqlQuery), self)
   1035 finally:
   1036     if len(kwargs) > 0:

File c:\ProgramData\miniconda3\envs\synapse-spark-kernel\lib\site-packages\py4j\java_gateway.py:1321, in JavaMember.__call__(self, *args)
   1315 command = proto.CALL_COMMAND_NAME +\
   1316     self.command_header +\
   1317     args_command +\
   1318     proto.END_COMMAND_PART
   1320 answer = self.gateway_client.send_command(command)
-> 1321 return_value = get_return_value(
   1322     answer, self.gateway_client, self.target_id, self.name)
   1324 for temp_arg in temp_args:
   1325     temp_arg._detach()

File c:\ProgramData\miniconda3\envs\synapse-spark-kernel\lib\site-packages\pyspark\sql\utils.py:190, in capture_sql_exception.<locals>.deco(*a, **kw)
    188 def deco(*a: Any, **kw: Any) -> Any:
    189     try:
--> 190         return f(*a, **kw)
    191     except Py4JJavaError as e:
    192         converted = convert_exception(e.java_exception)

File c:\ProgramData\miniconda3\envs\synapse-spark-kernel\lib\site-packages\py4j\protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))

<------ I OMITTED THIS DUE TO LENGTH ---->

Py4JJavaError: An error occurred while calling o32.sql.
: java.io.InvalidClassException: failed to read class descriptor
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1979)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.QueryContext
    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:758)
    at org.apache.spark.sql.lighter.utils.LighterServerObjectInputStream.classForName(LighterServerObjectInputStream.scala:122)
    at org.apache.spark.sql.lighter.utils.LighterServerObjectInputStream.readResolveClassDescriptor(LighterServerObjectInputStream.scala:102)
    at org.apache.spark.sql.lighter.utils.LighterServerObjectInputStream.readClassDescriptor(LighterServerObjectInputStream.scala:97)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1977)
    ... 124 more
PotatoLu666 commented 10 months ago

Hi @jhomiak, thanks for your feedback!

This issue might be related to selecting the wrong runtime version. If you are still experiencing it, please make sure the runtime version selected in the extension matches the runtime of your Fabric workspace.
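
For context (an observation, not a confirmed diagnosis): the missing class in the trace, org.apache.spark.QueryContext, was introduced in Spark 3.4, which would be consistent with a version mismatch between the local synapse-spark-kernel environment and the remote runtime. A quick way to check what the local kernel is running:

```python
# Print the Spark version of the local client session. org.apache.spark.QueryContext
# exists only in Spark >= 3.4, so if the versions below don't match the runtime
# selected for the workspace, that mismatch could explain the InvalidClassException.
import pyspark

print(pyspark.__version__)  # version of the pyspark package in the kernel env
print(spark.version)        # version reported by the active SparkSession
```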

If you are still unable to run it, please attach the PySparkLighter.log and SparkLighter.log so we can investigate further. These files are located under the following path: workFolder\workspaceId\logs\artifactId.
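
If it helps, here is a small sketch for locating those logs (workFolder, workspaceId, and artifactId are the placeholders from the path above; substitute your own values):

```python
# List the two Lighter log files under the path described above (placeholders
# kept as-is; replace them with your actual work folder, workspace ID, and
# artifact ID before running).
from pathlib import Path

log_dir = Path(r"workFolder") / "workspaceId" / "logs" / "artifactId"
for name in ("PySparkLighter.log", "SparkLighter.log"):
    log_file = log_dir / name
    print(log_file, "(found)" if log_file.exists() else "(missing)")
```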