snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0

Error: Could not find or load main class edu.stanford.nlp.pipeline.StanfordCoreNLPServer #553

Closed: Tsmith5151 closed this issue 7 years ago

Tsmith5151 commented 7 years ago

@henryre -- I'm encountering the error shown below; the failure to connect to CoreNLP appears to come from running Snorkel on a distributed cluster. For preprocessing/tokenizing/tagging a corpus, would NLTK be a reasonable workaround here? A sketch of the kind of preprocessing I have in mind follows the error output.

WARNING:requests.packages.urllib3.connectionpool:Retrying (Retry(total=None, connect=19, read=0, redirect=None)) after connection broken
Error: Could not find or load main class edu.stanford.nlp.pipeline.StanfordCoreNLPServer 
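
Something along these lines with NLTK (a rough sketch only; the token/POS output would still need to be mapped into the sentence schema Snorkel's CoreNLP parser produces):

    import nltk

    # One-time model downloads for the sentence splitter and POS tagger
    nltk.download("punkt")
    nltk.download("averaged_perceptron_tagger")

    text = "Weak supervision lets us label training data programmatically."
    for sent in nltk.sent_tokenize(text):    # sentence splitting
        tokens = nltk.word_tokenize(sent)    # tokenization
        tagged = nltk.pos_tag(tokens)        # POS tagging
        print(tagged)
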
henryre commented 7 years ago

Hi @Tsmith5151. Here are a couple of suggestions:

Tsmith5151 commented 7 years ago

@henryre thanks for the feedback. One other question related to this: I have a JSON file that has already been annotated with CoreNLP (tokenize/ssplit/pos/lemma/depparse/ner). Is there a way to import this file directly into the sqlite.db through Snorkel while keeping the same db schema, or will this need to be replicated?

henryre commented 7 years ago

Hi @Tsmith5151. The Snorkel parser loads responses from the CoreNLP server in JSON format here. You can modify the parse method to take the file contents as content rather than requesting it from the server.
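
For example, something along these lines (a rough sketch; "annotated_doc.json" is a placeholder, the field names follow the CoreNLP server's JSON output, and the hand-off into the parse method is left schematic):

    import json

    def load_corenlp_json(path):
        # Read a document that was already annotated with CoreNLP
        # (tokenize/ssplit/pos/lemma/depparse/ner) and saved as JSON.
        with open(path) as f:
            return json.load(f)

    # The dict has the same shape as a live server response, so it can be
    # handed to the parser's JSON-processing code in place of the HTTP result.
    blocks = load_corenlp_json("annotated_doc.json")
    for sentence in blocks["sentences"]:
        words  = [tok["word"]  for tok in sentence["tokens"]]
        pos    = [tok["pos"]   for tok in sentence["tokens"]]
        lemmas = [tok["lemma"] for tok in sentence["tokens"]]
        # ...build the same per-sentence rows the parser normally yields,
        # so the sqlite schema stays unchanged.
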

Tsmith5151 commented 7 years ago

Hi @henryre, thanks -- that worked perfectly! One other quick question: I'm running into the following error when training the generative model with the labeling functions and estimating their accuracies. It occurs when NumbSkull is called:

RuntimeError: cannot cache function 'gibbsthread': no locator available for file '/anaconda3/lib/python3.5/site-packages/numbskull-0.0-py3.5.egg/numbskull/inference.py'

Any suggestions?

henryre commented 7 years ago

Hey @Tsmith5151, this is probably a Python 2/3 compatibility issue related to Numba. If you're able to run your pipeline using Python 2, I'd give that a try.
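
If it helps, a quick interpreter check of this kind (a minimal sketch; the Python 2 requirement only reflects the suggestion above, not a hard Snorkel constraint) makes the mismatch obvious before training starts:

    import sys

    # Fail fast if the pipeline is not running under Python 2, since the
    # Numba caching error above appeared on a Python 3.5 install.
    if sys.version_info[0] != 2:
        raise RuntimeError("Run this pipeline under Python 2.x (seen failing on 3.5).")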