Closed Tsmith5151 closed 7 years ago
Hi @Tsmith5151. Here are a couple of suggestions.
@henryre thanks for the feedback. One other question related to this: I have a JSON file that has been annotated with CoreNLP (tokenize/ssplit/pos/lemma/depparse/ner). Is there a way to import this file directly into the sqlite.db through Snorkel while maintaining the same db schema, or will the annotation need to be replicated?
Hi @Tsmith5151. The Snorkel parser loads responses from the CoreNLP server in JSON format here. You can modify the parse method to take in the file contents as content rather than requesting it from the server.
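A minimal sketch of that idea, assuming the annotated file has the same structure the CoreNLP server returns (a top-level "sentences" list); the function name parse_from_file is illustrative, not Snorkel's actual API:

```python
import json

def parse_from_file(path):
    """Yield CoreNLP sentence dicts from a pre-annotated JSON file,
    instead of requesting annotations from the CoreNLP server."""
    with open(path) as f:
        content = json.load(f)  # same structure the server would return
    for sentence in content.get("sentences", []):
        # each sentence dict carries tokens with pos/lemma/ner, plus deps
        yield sentence

# Usage:
# sentences = list(parse_from_file("annotated.json"))
```

From there, the sentences can be fed into the same downstream code path that normally consumes the server's response, so the db schema stays unchanged.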
Hi @henryre, thanks, that worked perfectly! One other quick question: I'm running into the following error when training the generative model with the labeling functions and estimating their accuracies. The error occurs when calling NumbSkull:

RuntimeError: cannot cache function 'gibbsthread': no locator available for file '/anaconda3/lib/python3.5/site-packages/numbskull-0.0-py3.5.egg/numbskull/inference.py'

Any suggestions?
Hey @Tsmith5151, this is probably a Python 2/3 compatibility issue related to Numba. If you're able to run your pipeline using Python 2, I'd give that a try.
@henryre -- I'm encountering the error shown below; the failure to connect to CoreNLP comes from running Snorkel on a distributed cluster. For preprocessing/tokenizing/tagging a corpus, is NLTK a suggested workaround here?
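One hedged sketch of what an NLTK fallback could look like: the Treebank tokenizer runs entirely in-process, so it needs no server connection (POS tagging would additionally need an nltk.download of the tagger model, and NER/depparse have no drop-in NLTK equivalent to CoreNLP):

```python
from nltk.tokenize import TreebankWordTokenizer

# In-process tokenization: no CoreNLP server round-trip required.
tokenizer = TreebankWordTokenizer()
tokens = tokenizer.tokenize("Snorkel labels training data programmatically.")
print(tokens)
```

Whether this is sufficient depends on which annotations downstream labeling functions rely on; anything that needs dependency parses would still require CoreNLP (or spaCy) on the cluster nodes.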