henridf / apache-spark-node

Node.js bindings for Apache Spark DataFrame APIs
https://henridf.github.io/apache-spark-node
Apache License 2.0

Querying a cassandra DB via spark #38

Open Enilia opened 8 years ago

Enilia commented 8 years ago

Hey there,

As the title says, I am trying to query an existing Cassandra DB from Node.js using your library. I am using a Spark cluster on a LAN.

Here's what I have done so far.

From the root of my project:

ASSEMBLY_JAR=/usr/share/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar node_modules/apache-spark-node/bin/spark-node \
--master spark://192.168.1.101:7077 --conf spark.cores.max=4 \
--jars /root/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.6.0-M1-36-g220aa37.jar

Once I had access to the spark-node prompt, I tried:

spark-node> sqlContext.sql("Select count(*) from mykeyspace.mytable")

but of course I get:

Error: Error creating class
org.apache.spark.sql.AnalysisException: Table not found: `mykeyspace`.`mytable`;

I then tried to adapt a Scala snippet I had seen in a Stack Overflow post:

var df = sqlContext
  .read()
  .format("org.apache.spark.sql.cassandra")
  .option("table", "mytable")
  .option("keyspace", "mykeyspace")
  .load(null, function(err, res) { console.log(err); console.log(res) }) 

but all I get is:

Error: Error running instance method
java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.sql.cassandra. Please find packages at http://spark-packages.org

The problem surely comes from the fact that I don't understand half of how everything is linked together, which is why I'm here asking for help. All I need is a way to run basic SQL queries (with only WHERE clauses) against one Cassandra table.

I reckon this project is no longer maintained, but as far as I can see it is the simplest solution I have found so far (solutions like EclairJS have far more functionality than I need, at the cost of increased complexity and maybe lower performance), and it would cover my needs.

tobilg commented 8 years ago

You should post your complete code. According to the docs you need to set up the SparkContext with the right configuration properties.
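As a rough illustration of the configuration point above, here is a minimal sketch that assembles the spark-node arguments from the original post and adds the connector's `spark.cassandra.connection.host` property. The helper name `buildSparkNodeArgs` and the Cassandra address `192.168.1.102` are hypothetical (the thread gives neither), and nothing is launched; the script only prints the resulting command line.

```javascript
// Sketch only: build the argument list for bin/spark-node.
// `buildSparkNodeArgs` is a hypothetical helper, not part of apache-spark-node.
function buildSparkNodeArgs(opts) {
  const conf = {
    "spark.cores.max": String(opts.maxCores),
    // Property the spark-cassandra-connector reads to locate the cluster.
    "spark.cassandra.connection.host": opts.cassandraHost,
  };
  const args = ["--master", opts.master];
  for (const [key, value] of Object.entries(conf)) {
    args.push("--conf", `${key}=${value}`);
  }
  args.push("--jars", opts.connectorJar);
  return args;
}

const args = buildSparkNodeArgs({
  master: "spark://192.168.1.101:7077",
  maxCores: 4,
  // ASSUMPTION: placeholder address, replace with a real Cassandra node.
  cassandraHost: "192.168.1.102",
  connectorJar:
    "/root/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.6.0-M1-36-g220aa37.jar",
});

// Print the command line instead of running it.
console.log(["node_modules/apache-spark-node/bin/spark-node", ...args].join(" "));
```

The `ASSEMBLY_JAR` environment variable from the original command would still need to be set when the printed command is run.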

Furthermore, there is an example on how to use SparkSQL.
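For context on the "Table not found" error: a plain SQLContext only resolves table names that have been registered with it, so the usual Spark SQL pattern is to load a DataFrame, register it under a temporary name, and query that name. The sketch below shows this flow under two loud assumptions: that the DataFrame wrapper in this library exposes `registerTempTable()` as Spark 1.6's DataFrame does, and that `load()` can be called synchronously without arguments. The function name, the stand-in context, and the `id = 1` filter are all hypothetical. It does not by itself address the ClassNotFoundException, which indicates that the connector classes were not visible to the driver.

```javascript
// Sketch only: load a Cassandra table, register it, then query it with SQL.
// ASSUMPTION: df.registerTempTable() and a synchronous load() are available.
function queryCassandraTable(sqlContext, keyspace, table, whereClause) {
  const df = sqlContext
    .read()
    .format("org.apache.spark.sql.cassandra")
    .option("keyspace", keyspace)
    .option("table", table)
    .load();

  // Make the DataFrame visible to SQL under a temporary table name.
  df.registerTempTable(table);

  // Note the unqualified name: the temp table is not inside a keyspace.
  return sqlContext.sql(`SELECT * FROM ${table} WHERE ${whereClause}`);
}

// Stand-in for the real sqlContext so the sketch runs without a cluster.
// It only records the calls made on it; sql() echoes the query text.
function makeRecordingContext(calls) {
  const reader = {
    format(name) { calls.push(["format", name]); return reader; },
    option(key, value) { calls.push(["option", key, value]); return reader; },
    load() {
      calls.push(["load"]);
      return { registerTempTable(name) { calls.push(["registerTempTable", name]); } };
    },
  };
  return {
    read() { return reader; },
    sql(text) { calls.push(["sql", text]); return text; },
  };
}

const calls = [];
const result = queryCassandraTable(
  makeRecordingContext(calls), "mykeyspace", "mytable", "id = 1" // hypothetical filter
);
console.log(result);
```

At the spark-node prompt, the same function would be called with the real `sqlContext` in place of the recording stand-in.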

Basically, this is not an issue of apache-spark-node and should be closed accordingly.

henridf commented 8 years ago

Hi @Enilia - as @tobilg says, this doesn't appear to be an issue with the library, but if we're missing something, please post a more complete description and I'll do my best to help. (This project is still maintained, btw.)

Enilia commented 8 years ago

Hi and thanks for the quick reply,

I'm sorry I assumed the project was no longer maintained; I got that impression from the low activity in the repo over the last few months :s . Anyway, I'm glad you're still active on this project. I'll take a look at the links @tobilg gave here and post a more complete issue if something is still missing. I'm still new to the Cassandra/Spark/Java/Scala universe, so I'm a bit lost here tbh ^^

Best regards, Eni