sinjax / squall

An implementation of Rete for Jena Rules and SPARQL on the Storm distributed stream processing framework

installation problem #2

Open stmeissner opened 10 years ago

stmeissner commented 10 years ago

Hi guys, many thanks for your work, I find this very interesting.

I tried to follow your instructions to install the software, but I got a maven error at the second step: mvn install tools/ReteStormTool/retestorm

maven complains about an: Unknown lifecycle phase "tools/ReteStormTool/retestorm"

and thus all the remaining steps in the Reactor Build Order are skipped. Any idea what I'm doing wrong, please?

sinjax commented 10 years ago

Heya Stefan,

Right so the instructions were not mega clear and should read:

mvn install
./tools/ReteStormTool/retestorm

See? So after the mvn install you're literally running a command, which is the tool itself.

Just so you know: this version of the tool works and can be used to deploy local and storm stream processing for both Jena and SPARQL. However, our current work is on a new version of the tool which has a much more modular and easy-to-extend backend.
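
For anyone hitting the same error: Maven treats every bare argument as a lifecycle phase or goal, which is why the script path was rejected as an "Unknown lifecycle phase". A minimal sketch of the two steps as one helper follows. The function name and the MVN/TOOL override variables are made up for illustration and are not part of squall; it assumes you are in the repository root.

```shell
# Hypothetical helper: build first, then launch the tool as its own command.
# MVN and TOOL are illustrative overrides, handy for a dry run.
MVN="${MVN:-mvn}"
TOOL="${TOOL:-./tools/ReteStormTool/retestorm}"

build_and_run() {
  # "install" is the only lifecycle phase handed to Maven.
  "$MVN" install || return 1
  # The script is an executable in the repository, not a Maven argument.
  "$TOOL" "$@"
}
```

Calling build_and_run with no arguments should build and then print the tool's help message; any arguments are passed through to the tool untouched.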

Cheers

stmeissner commented 10 years ago

Hi sina,

thanks for your response. Now Maven went through almost everything successfully, there was only one error during compilation of RunLSBenchJenaQuery.java in the sandbox package.

I'm able to execute the retestorm script, but despite the help I still have no idea what to put in the arguments there. Is there any example you can point me to, just to check the thingy works as expected?

cheers Stefan

sinjax commented 10 years ago

Hi Stefan,

Sorry for this wall of text. I'm about to go through how to use this tool (ReteStormTool), explain why it is an old version and finally describe the new version (SquallTool). I hope you find this useful.

If you run the tool you should get a help message saying how it is used. But I accept the tool isn't as nice or as obvious as it could be, so I'll walk you through this mode of operation:

So you are somewhere you can run the retestorm bash file, and you run:

./tools/ReteStormTool/retestorm

Now the first time it runs, this will take a long time because the tool actually calls a maven command in order to build a dependency tree. This is not how things should happen in the future, but the tool is very much at the development stage. If it looks too unresponsive you can run that command manually; once it has run once, the tool should start quickly next time. The command it calls is:

cd tools/ReteStormTool
mvn dependency:build-classpath -Dmdep.outputFile=cp.txt -Dsilent=true 
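
A small sketch of that manual step, wrapped so maven is only invoked when the classpath file is missing. This rests on an assumption the thread does not confirm: that a non-empty cp.txt in the tool directory is what the script reuses on later runs. The function name is hypothetical.

```shell
# Hypothetical helper: skip the slow maven call when cp.txt already exists.
# Assumes a non-empty cp.txt is what makes later runs fast.
ensure_classpath() {
  tool_dir="$1"
  if [ -s "$tool_dir/cp.txt" ]; then
    echo "classpath file already present"
    return 0
  fi
  # Same command as above, run inside the tool directory in a subshell.
  ( cd "$tool_dir" && mvn dependency:build-classpath -Dmdep.outputFile=cp.txt -Dsilent=true )
}

# Usage, from the repository root:
#   ensure_classpath tools/ReteStormTool
```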

Ok great, so the tool is now running and you get a bunch of help commands, but this is mega confusing so let me just show you one demo and explain what it does:

./retestorm -tm LOCAL -st 20000 -rl SPARQL -i test.csparql -us file://pwd/test.ntriples

Alright! So first the easy bits. -tm says "Perform this in local cluster mode" as opposed to -tm STORM which plays with a running storm cluster. -st says "If you're in local cluster mode stay awake for this long". Obviously this isn't something you'd do on a normal storm deployment, as the tool would return immediately and you'd just shut it down from storm if you wanted to shut it down, but in a local mode the tool waits...and then dies...

Then we have -rl SPARQL which says... use sparql... cool

Now the query being performed is test.csparql, which you can download here: https://gist.github.com/sinjax/8705052

Finally we have -us, which is just a hack for testing the tool: it preloads a kestrel queue with some triples. Again, in normal operation you'd probably have some other process feeding your input queue, but for this test we're preloading our kestrel queue with some data; that data is here: https://gist.github.com/sinjax/8705111

If you do this, the expected output on the kestrel output queue looks like this: https://gist.github.com/sinjax/8705159
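
The demo invocation can also be written out flag by flag, which may make it easier to vary one piece at a time. This is only a sketch: the variable and function names are made up, while the flags and values are copied from the demo command (including the -us value, which is reproduced verbatim rather than guessed at).

```shell
# Sketch only: variable names are illustrative; flags and values come from the demo.
MODE=LOCAL                             # -tm: local cluster mode, as opposed to STORM
STAY_AWAKE=20000                       # -st: how long local mode stays up (demo value)
RULE_LANG=SPARQL                       # -rl: use SPARQL
QUERY_FILE=test.csparql                # -i: the query being performed
TRIPLES_URL="file://pwd/test.ntriples" # -us: triples to preload (verbatim from the demo)

demo_command() {
  # Print the assembled command instead of running it, so it can be inspected first.
  printf '%s\n' "./retestorm -tm $MODE -st $STAY_AWAKE -rl $RULE_LANG -i $QUERY_FILE -us $TRIPLES_URL"
}

demo_command
```

Printing rather than executing is deliberate here: the real run needs a kestrel server to be up, so it is worth checking the command line before launching it.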

Ok great. But let's scratch at some of the nuances. Firstly, you might notice that the only way this particular tool allows for input is using kestrel queues. In fact the command I've demoed uses the default kestrel queue server (127.0.0.1), expects inputs from inputQueue and throws outputs onto outputQueue.

To make this demo work you'll need to set up a kestrel server somewhere and point the tool at it.

Now as you can see this tool, though useful for testing the SPARQL functionality, is actually rather limited. In fact I only use it for testing whether basic stuff still works. With this in mind, if you want to use this version of squall, i.e. the version that supports SPARQL and rete, then I would suggest you program the pipeline yourself in java, rather than using a tool. If you just look at the flow the tool follows in the modes of this demo, it should be relatively easy to see how you could do that.

However, even then the old version has many problems.

... and it is all just a bit very v0.1, know what I mean? The hard coding of kestrel queues, the reliance on storm, all this stuff is being actively worked on right now.

There is indeed a new backend to squall being actively developed right now. It is far more modular and has no direct relation to kestrel or storm or anything. It is nicer in many ways and can be found here:

cd tools/SquallTool

HOWEVER. This version of the tool (though far more modular and nice both in the frontend and backend) has major drawbacks. Most importantly for you, SPARQL is not currently supported, only jena rules and RIF rules.

But it has been built so SPARQL can be added as a module very easily, and this is planned in the next few months.

As you can see, squall is very much in development and very experimental right now. But if I haven't scared you away, your input and your interest are greatly appreciated :)