graphaware / neo4j-nlp

NLP Capabilities in Neo4j
https://hume.graphaware.com/

Need specific Getting Started documentation #104

Closed: colingoldberg closed this issue 3 years ago

colingoldberg commented 6 years ago

As a newcomer to both Neo4j and NLP, I need clearer documentation than what has been provided so far. A specific, step-by-step example that I could copy and paste to get it working would be very helpful. The current README is a little cryptic.

Regards

Colin Goldberg

fotisz commented 6 years ago

I found the following video really helpful when starting up: https://www.youtube.com/watch?v=xcpo7BrJIv8

Might be worth a look.

colingoldberg commented 6 years ago

Thanks - that's an interesting video.

I feel a bit stupid, but I can't seem to get past some syntax errors while following the example in the README.

Step 1: I created the constraints and a News node.

Step 2: I added the pipeline (copy/paste):

CALL ga.nlp.processor.addPipeline({textProcessor: 'com.graphaware.nlp.processor.stanford.StanfordTextProcessor', name: 'customStopWords', processingSteps: {tokenize: true, ner: true, dependency: false}, stopWords: '+,result, all, during', threadNumber: 20})

but I don't really know whether this is an appropriate pipeline (i.e. one named 'customStopWords').

Do I need to set this as the default pipeline?

A specific example, showing the syntax, would be helpful.
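For reference, the steps described above can be laid out as one copy-paste sequence. This is a sketch, not the project's official getting-started script: the constraint and index statements are assumed to mirror the ones the README asked for at the time (check the README for your plugin version), and the sample News text is made up. Paste it into cypher-shell, or run the statements one at a time in Neo4j Browser.

```cypher
// 1. Constraints/index for the nodes the NLP plugin creates
//    (assumed to match the README of this era; verify for your version)
CREATE CONSTRAINT ON (n:AnnotatedText) ASSERT n.id IS UNIQUE;
CREATE CONSTRAINT ON (n:Tag) ASSERT n.id IS UNIQUE;
CREATE CONSTRAINT ON (n:Sentence) ASSERT n.id IS UNIQUE;
CREATE INDEX ON :Tag(value);

// 2. One sample News node (hypothetical text, for illustration only)
CREATE (:News {text: 'Barack Obama visited Paris in June to meet French officials.'});

// 3. The pipeline from this thread, one parameter per line
CALL ga.nlp.processor.addPipeline({
  textProcessor: 'com.graphaware.nlp.processor.stanford.StanfordTextProcessor',
  name: 'customStopWords',                 // any name you choose
  processingSteps: {tokenize: true, ner: true, dependency: false},
  stopWords: '+,result, all, during',      // leading '+' assumed to mean "add to the default list"
  threadNumber: 20
});

// 4. Make it the default pipeline; the argument is a plain string
CALL ga.nlp.processor.pipeline.default('customStopWords');
```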

I tried:

CALL ga.nlp.processor.pipeline.default({"customStopWords"})

which gets the error:

Invalid input '"': expected whitespace, an identifier, UnsignedDecimalInteger, a property key name or '}' (line 1, column 41 (offset: 40))
"CALL ga.nlp.processor.pipeline.default({"customStopWords"})"

and

CALL ga.nlp.processor.pipeline.default({name: "customStopWords"})

gets the error:

Type mismatch: expected String but was Map (line 1, column 48 (offset: 47))
"call ga.nlp.processor.pipeline.default({name: "customStopWords"})"

And if I try the annotate call, without setting the pipeline, of course it lets me know:

MATCH (n:News)
CALL ga.nlp.annotate({text: n.text, id: id(n)})
YIELD result
MERGE (n)-[:HAS_ANNOTATED_TEXT]->(result)
RETURN result

Failed to invoke procedure ga.nlp.annotate: Caused by: java.lang.RuntimeException: A pipeline should be given or set as default.
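The error says a pipeline must be "given or set as default", which suggests that, instead of setting a default, the pipeline can also be named in the annotate call itself. A minimal sketch, assuming the map key is `pipeline` (inferred from that message and not confirmed in this thread; check the README for your version):

```cypher
// Name the pipeline explicitly instead of relying on a default.
// The `pipeline` key is an assumption inferred from the error message.
MATCH (n:News)
CALL ga.nlp.annotate({text: n.text, id: id(n), pipeline: 'customStopWords'})
YIELD result
MERGE (n)-[:HAS_ANNOTATED_TEXT]->(result)
RETURN result
```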

Perhaps I'm just tired; it's the end of the day. It's my fault for wanting to see a result before reading the full documentation (!!!)

Any help is appreciated. A blog post or tutorial with copy-pasteable lines would be helpful. GraphAware looks like a very strong tool.

Colin Goldberg

fotisz commented 6 years ago

Which versions of the plugins are you using?

colingoldberg commented 6 years ago

Plugins directory listing (Mac):

-rw-r--r--@ 1 colingoldberg staff 60642875 Jul 13 13:54 graphaware-nlp-3.4.0.52.12-SNAPSHOT.jar
-rw-r--r--@ 1 colingoldberg staff 13965997 Jun 21 10:34 graphaware-server-enterprise-all-3.4.0.52.jar
-rw-r--r--@ 1 colingoldberg staff 6534649 Jul 16 16:11 neosemantics-3.4.0.1.jar
-rw-r--r--@ 1 colingoldberg staff 378126190 Jun 21 15:23 nlp-stanfordnlp-3.4.0.52.11.jar

conf/neo4j.conf:

...
#*****************************************************************
# Other Neo4j system properties
#*****************************************************************
dbms.jvm.additional=-Dunsupported.dbms.udc.source=desktop
dbms.unmanaged_extension_classes=com.graphaware.server=/graphaware
com.graphaware.runtime.enabled=true
com.graphaware.module.NLP.1=com.graphaware.nlp.module.NLPBootstrapper
dbms.security.procedures.whitelist=ga.nlp.*,semantics.*
dbms.unmanaged_extension_classes=semantics.extension=/rdf
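One detail in this configuration: dbms.unmanaged_extension_classes is set twice. Neo4j documents this setting as a comma-separated list of classname=mountpoint pairs, so the two entries would normally be combined on one line (com.graphaware.server=/graphaware,semantics.extension=/rdf); with the key repeated, only one of the values is likely to take effect. To see what the running server actually loaded, a query along these lines should work, assuming dbms.listConfig() is available to your user (it is a built-in Neo4j 3.x procedure that may require admin rights):

```cypher
// Show the effective value(s) of the unmanaged extension setting.
// If only one mount point appears, the repeated key overrode the other.
CALL dbms.listConfig()
YIELD name, value
WHERE name CONTAINS 'unmanaged_extension_classes'
RETURN name, value;
```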

fotisz commented 6 years ago

Can you try:

CALL ga.nlp.processor.pipeline.default('customStopWords')

That is, pass the name of your pipeline (assumed to be customStopWords here) as a plain string in single quotes, with no curly brackets.
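Why the two earlier attempts failed: the procedure takes a single String argument, the pipeline name. Cypher accepts string literals in either single or double quotes, so the quote style was not the problem; the curly brackets were. {"customStopWords"} is not a valid map literal (a map needs key: value pairs), and {name: "customStopWords"} is a valid map, but it is passed where a String is expected. A short illustration:

```cypher
// Both of these pass the same String and should behave identically:
CALL ga.nlp.processor.pipeline.default('customStopWords');
CALL ga.nlp.processor.pipeline.default("customStopWords");

// Not valid here:
//   {"customStopWords"}        syntax error: a map literal needs key: value pairs
//   {name: "customStopWords"}  Type mismatch: expected String but was Map
```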

colingoldberg commented 6 years ago

Step 1: CALL ga.nlp.processor.pipeline.default('customStopWords') returned SUCCESS

I had added the pipeline the other day, although I still need to learn more about which pipelines are available, how to set one up correctly, etc.

Step 2:

MATCH (n:News)
CALL ga.nlp.annotate({text: n.text, id: id(n)})
YIELD result
MERGE (n)-[:HAS_ANNOTATED_TEXT]->(result)
RETURN result

After a minute or so it gave an out-of-memory error. I will try again later after restarting my Mac, and also on another server.

Can you point me to the documentation for this (or create it if it does not exist)?

Thanks.

fotisz commented 6 years ago

You can find everything in https://github.com/graphaware/neo4j-nlp/blob/master/README.md. You might need some time to get familiar with the Cypher syntax, though; it took me a while.

colingoldberg commented 6 years ago

I tried to run the following:

MATCH (n:News)
CALL ga.nlp.annotate({text: n.text, id: id(n)})
YIELD result
MERGE (n)-[:HAS_ANNOTATED_TEXT]->(result)
RETURN result

but after a minute or so it failed again, this time with:

Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure ga.nlp.annotate: Caused by: java.lang.OutOfMemoryError: Java heap space

Running on macOS (10.13.6) with 16 GB of memory. How much does it need?
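Independently of the heap size, the query above annotates every News node in a single transaction, so memory use grows with the number of nodes. A common Cypher pattern is to process a small batch per run; this is a sketch, with an arbitrary batch size of 10, and it cannot help if the heap is too small to load the Stanford models themselves:

```cypher
// Annotate at most 10 not-yet-annotated News nodes per run.
// Re-run until `annotated` comes back as 0.
MATCH (n:News)
WHERE NOT (n)-[:HAS_ANNOTATED_TEXT]->()
WITH n LIMIT 10
CALL ga.nlp.annotate({text: n.text, id: id(n)})
YIELD result
MERGE (n)-[:HAS_ANNOTATED_TEXT]->(result)
RETURN count(result) AS annotated
```

Because count() is an aggregation without grouping keys, the query returns a single row even when no nodes are left, which makes the "until 0" stopping condition easy to check.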

alenegro81 commented 6 years ago

You need to change these parameters in the configuration file (conf/neo4j.conf):

dbms.memory.heap.initial_size=5g
dbms.memory.heap.max_size=5g
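The heap settings only take effect after Neo4j is restarted. To confirm the running server picked them up, a query along these lines should work (dbms.listConfig() is a built-in Neo4j 3.x procedure that may need admin rights; 5g is the value suggested in this thread, not a measured requirement):

```cypher
// List the effective heap settings of the running server.
CALL dbms.listConfig()
YIELD name, value
WHERE name STARTS WITH 'dbms.memory.heap'
RETURN name, value;
```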

eayan commented 6 years ago

Hi, I got a similar error. How did you solve that problem?

Cypher:

CALL ga.nlp.processor.addPipeline({textProcessor: 'com.graphaware.nlp.processor.stanford.StanfordTextProcessor', name: 'customStopWords', processingSteps: {tokenize: true, ner: true, dependency: false}, stopWords: '+,result, all, during', threadNumber: 20})

Error:

Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure ga.nlp.processor.addPipeline: Caused by: java.lang.RuntimeException: Processor with name 'com.graphaware.nlp.processor.stanford.StanfordTextProcessor' does not exist
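This error suggests the NLP procedures are loaded but the Stanford processor is not registered. Judging by the jar names in the plugin listing earlier in this thread, the Stanford processor appears to ship as a separate jar (nlp-stanfordnlp-*.jar) next to graphaware-nlp-*.jar, so checking that it is present in plugins/ with a matching version, then restarting, is a reasonable first step (an assumption, not confirmed here). From Cypher, a sketch of how to check what the server sees; the getProcessors name is taken from memory of the README and should be treated as an assumption:

```cypher
// Built into Neo4j 3.x: list the NLP processor procedures actually loaded.
CALL dbms.procedures()
YIELD name
WHERE name STARTS WITH 'ga.nlp.processor'
RETURN name;

// Assumed procedure (run it only if it appears in the list above):
// lists registered text processors; StanfordTextProcessor should be
// among them before addPipeline can use it.
CALL ga.nlp.processor.getProcessors();
```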

colingoldberg commented 6 years ago

I changed the configuration parameters (dbms.memory.heap.initial_size and dbms.memory.heap.max_size) as indicated above, and the annotate procedure then ran.

eayan commented 6 years ago

@colingoldberg Thanks, it works now. Best,