neo4j-contrib / neo4j-mazerunner

Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark.
Apache License 2.0

Method of programmatically determining when values have been persisted back to Neo4J #37

Open gitsome opened 9 years ago

gitsome commented 9 years ago

I am using Node.js to trigger the PageRank algorithm. I understand that once I have a ton of nodes and edges the job could take some time, so I want an automated way of determining when it is complete. That way I can start additional logic that depends on the Neo4j nodes being updated with the values calculated by Mazerunner.

You had mentioned that I could check the log file. I think I can make that work using Node, to simply tail that file and scan the output.
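A minimal sketch of that log-tailing idea, using only Node's built-in `fs` module. The log path and the completion marker (`DONE_PATTERN`) are assumptions: nothing in this thread says what Mazerunner actually writes when it finishes, so both would need to be replaced with real values from your own setup. `splitLines` and `watchLog` are hypothetical helper names.

```javascript
var fs = require('fs');

// ASSUMPTION: placeholder marker. Replace it with whatever line your
// Mazerunner log really prints once results are written back to Neo4j.
var DONE_PATTERN = /job complete/i;

// Splits buffered text into complete lines plus the unfinished tail.
function splitLines(buffer) {
    var parts = buffer.split(/\r?\n/);
    var rest = parts.pop(); // text after the last newline (may be '')
    return { lines: parts, rest: rest };
}

// Checks `logPath` every `intervalMs`, reads only the bytes appended since
// the previous check, and calls `onDone` once a line matches `pattern`.
// Returns a function that stops the polling.
function watchLog(logPath, pattern, intervalMs, onDone) {
    // Start at the current end of the file so older runs are ignored.
    var offset = fs.existsSync(logPath) ? fs.statSync(logPath).size : 0;
    var rest = '';
    var busy = false;

    var timer = setInterval(function () {
        if (busy) { return; } // previous read still in flight
        busy = true;
        fs.stat(logPath, function (err, stats) {
            if (err) { busy = false; return; }
            if (stats.size < offset) { offset = 0; rest = ''; } // file truncated or rotated
            if (stats.size === offset) { busy = false; return; } // nothing new
            var chunk = '';
            fs.createReadStream(logPath, { start: offset, end: stats.size - 1, encoding: 'utf8' })
                .on('data', function (d) { chunk += d; })
                .on('error', function () { busy = false; })
                .on('end', function () {
                    offset = stats.size;
                    busy = false;
                    var split = splitLines(rest + chunk);
                    rest = split.rest;
                    var matched = split.lines.some(function (line) {
                        return pattern.test(line);
                    });
                    if (matched) {
                        clearInterval(timer);
                        onDone();
                    }
                });
        });
    }, intervalMs);

    return function stop() { clearInterval(timer); };
}

// Usage (hypothetical path):
// watchLog('/path/to/mazerunner.log', DONE_PATTERN, 2000, function () {
//     jobTaskInstance.complete({success: 'awesome'});
// });
```

Polling the file size rather than relying on `fs.watch` keeps the behaviour the same across platforms, at the cost of a delay of up to one interval.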

Is there another way? Perhaps the original HTTP call that triggers the job could return a jobId, and then I could poll another HTTP endpoint with that jobId for status/progress? I'm probably misusing the project anyway :-), but figured I would ask.

Here is my code:

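var http = require('http');

// CONF and jobTaskInstance come from the surrounding application.
http.get(CONF.neo4J.paths.pageRank, function(response) {

    var body = '';

    // Collect the response body as it streams in.
    response.on('data', function(d) {
        body += d;
    });

    response.on('end', function() {

        var parsed;
        try {
            parsed = JSON.parse(body);
        } catch (e) {
            // The body was not valid JSON, so report it instead of throwing.
            return jobTaskInstance.error('Could not parse PageRank response: ' + e.message);
        }

        if (parsed && parsed.result === "success") {
            jobTaskInstance.complete({success:'awesome'});
        } else {
            jobTaskInstance.error('Unknown error running PageRank');
        }
    });

}).on('error', function(e) {
    jobTaskInstance.error(e.message);
});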

So maybe the response body could look like this?

{"result":"success", "jobId":1234}
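If the trigger call did return a jobId like that, the client side of the polling could be sketched as follows. Everything about the status endpoint is hypothetical: Mazerunner does not expose one as far as this thread shows, so the URL and the `status` field with the value `"complete"` are assumptions made purely for illustration, as are the helper names `pollUntil` and `isJobDone`.

```javascript
var http = require('http');

// Calls `check(cb)` repeatedly until it reports done, an error occurs, or
// `maxAttempts` is reached. `callback(err, attempts)` fires exactly once.
function pollUntil(check, intervalMs, maxAttempts, callback) {
    var attempts = 0;
    function attempt() {
        attempts += 1;
        check(function (err, done) {
            if (err) { return callback(err); }
            if (done) { return callback(null, attempts); }
            if (attempts >= maxAttempts) {
                return callback(new Error('Gave up after ' + attempts + ' attempts'));
            }
            setTimeout(attempt, intervalMs);
        });
    }
    attempt();
}

// ASSUMPTION: a status endpoint that answers with JSON such as
// {"status": "complete"}. No such endpoint is confirmed to exist.
function isJobDone(statusUrl, cb) {
    http.get(statusUrl, function (response) {
        var body = '';
        response.setEncoding('utf8');
        response.on('data', function (d) { body += d; });
        response.on('end', function () {
            var parsed;
            try {
                parsed = JSON.parse(body);
            } catch (e) {
                return cb(e);
            }
            cb(null, Boolean(parsed) && parsed.status === 'complete');
        });
    }).on('error', cb);
}

// Usage (hypothetical URL built from the returned jobId):
// pollUntil(function (cb) {
//     isJobDone('http://localhost:7474/some/status/path/' + jobId, cb);
// }, 5000, 120, function (err) {
//     if (err) { return jobTaskInstance.error(err.message); }
//     jobTaskInstance.complete({success: 'awesome'});
// });
```

Capping the number of attempts keeps a stuck job from polling forever; the interval and cap here are arbitrary.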

Love this project!!!

kbastani commented 9 years ago

Hi @gitsome, thanks for the issue. Sorry it's taken this long to get back to you on this thread. I think it makes sense to do a resource-based REST API that allows you to do simple job management. Thanks for putting that example together, it makes a lot of sense.

I've been putting off job management because there are so many great tools out there that do things like job management and scheduling.

I guess the question for me at this point is whether a job scheduler should be the responsibility of a new microservice that Neo4j integrates with, or whether it should be a part of the Mazerunner extension. That leads to another question: should Mazerunner become a platform, or continue to be maintained as a simple tool dedicated to solving a well-defined problem?

Open source is hard. :)

Should we make this thing a full framework and potentially a cloud-native platform, or should we keep it a simple tool?

gitsome commented 9 years ago

No worries, thanks for taking a look at this. I have noticed you have been a busy man these days! Seems you are mixed up in some exciting things (all of which I'm trying to keep up with).

I wouldn't put too much emphasis on my opinions, since I have mostly front-end experience, but from my perspective it seems this should continue to be a simple tool. That is how I am using it. However, I do need to know when the tool has completed its task. So if I simply need to monitor logs, then no problem. Perhaps just highlighting documentation on how to do that would be sufficient?

Regardless thanks again for this great work! Providing these types of tools really has enhanced my own projects!