wey-gu / nebula-up

One-liner NebulaGraph playground with allllllllll-in-one toolchain integrated on single Linux Server
https://siwei.io/nebula-up
63 stars · 16 forks

Error while reading Pyspark dataframe #68

Closed raghavchalapathy closed 1 year ago

raghavchalapathy commented 1 year ago

```python
df = (
    spark.read.format("com.vesoft.nebula.connector.NebulaDataSource")
    .option("type", "vertex")
    .option("spaceName", "basketballplayer")
    .option("label", "cases")
    .option("returnCols", "*")
    .option("metaAddress", "metad0:9559")
    .option("partitionNumber", 1)
    .option("user", "root")
    .option("passwd", "nebula")
    .option("operateType", "read")
    .load()
)
```

Getting error `E_TAG_NOT_FOUND`. Running the read above fails with:

```
23/10/24 21:51:22 ERROR MetaClient: Get tag execute failed, errorCode: E_TAG_NOT_FOUND
Traceback (most recent call last):
  File "<stdin>", line 11, in <module>
  File "/spark/python/pyspark/sql/readwriter.py", line 172, in load
    return self._df(self._jreader.load())
  File "/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o281.load.
: com.vesoft.nebula.client.meta.exception.ExecuteFailedException: Execute failed: Get tag execute failed, errorCode: E_TAG_NOT_FOUND
	at com.vesoft.nebula.client.meta.MetaClient.getTag(MetaClient.java:331)
	at com.vesoft.nebula.connector.nebula.MetaProvider.getTag(MetaProvider.scala:79)
	at com.vesoft.nebula.connector.NebulaUtils$.getSchema(NebulaUtils.scala:171)
	at com.vesoft.nebula.connector.reader.NebulaSourceReader.readSchema(NebulaSourceReader.scala:28)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation$.create(DataSourceV2Relation.scala:175)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:204)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
```
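`E_TAG_NOT_FOUND` means the tag passed as `label` (here, `cases`) does not exist in the space; the tags actually defined can be listed from nebula-console with `USE basketballplayer; SHOW TAGS;` (the sample space ships with `player` and `team`). A minimal sketch of a pre-flight check, assuming you have already fetched the tag list (the `check_tag` helper and hard-coded tag set are hypothetical, purely for illustration):

```python
# Hypothetical pre-flight check: fail fast with a clear message before the
# Spark reader hits the meta service and surfaces E_TAG_NOT_FOUND.
# The tag set is hard-coded here for illustration; in practice it would come
# from `SHOW TAGS` (via nebula-console or the nebula-python client).
KNOWN_TAGS = {"player", "team"}  # tags in the basketballplayer sample space

def check_tag(label: str, known_tags=KNOWN_TAGS) -> str:
    """Return the label unchanged if it exists, else raise with the valid options."""
    if label not in known_tags:
        raise ValueError(
            f"tag '{label}' not found; available tags: {sorted(known_tags)}"
        )
    return label

check_tag("player")    # passes
# check_tag("cases")   # raises ValueError, mirroring E_TAG_NOT_FOUND
```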

raghavchalapathy commented 1 year ago

Solved!! I was using the wrong label — it should be `"label", "player"`, not `"label", "cases"`.

raghavchalapathy commented 1 year ago

The correct label is "player" instead of "cases", and the code which works is:

```python
df = (
    spark.read.format("com.vesoft.nebula.connector.NebulaDataSource")
    .option("type", "vertex")
    .option("spaceName", "basketballplayer")
    .option("label", "player")
    .option("returnCols", "name,age")
    .option("metaAddress", "metad0:9559")
    .option("partitionNumber", 1)
    .option("user", "root")
    .option("passwd", "nebula")
    .option("operateType", "read")
    .load()
)
```
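As a readability side note, the long `.option()` chain can be assembled from a plain dict and passed via PySpark's `DataFrameReader.options(**...)`. This is just a sketch; the `nebula_vertex_options` helper is our own name, while the option keys and values are exactly the ones used in the working code above:

```python
def nebula_vertex_options(space: str, tag: str, return_cols: str = "*",
                          meta_address: str = "metad0:9559",
                          user: str = "root", passwd: str = "nebula",
                          partitions: int = 1) -> dict:
    """Build the option map for com.vesoft.nebula.connector.NebulaDataSource
    vertex reads, so one dict replaces a long .option() chain."""
    return {
        "type": "vertex",
        "spaceName": space,
        "label": tag,
        "returnCols": return_cols,
        "metaAddress": meta_address,
        "partitionNumber": partitions,
        "user": user,
        "passwd": passwd,
        "operateType": "read",
    }

opts = nebula_vertex_options("basketballplayer", "player", "name,age")
# With a live SparkSession this would be:
# df = (spark.read.format("com.vesoft.nebula.connector.NebulaDataSource")
#       .options(**opts).load())
```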

wey-gu commented 1 year ago

Thanks @raghavchalapathy !