minhlab opened this issue 6 years ago
I'm having the same issue.
Still an issue in the current version 4.0; I will try to fix this.
Similar issue with directly pickling the doc.
In [1]: import spacy
In [2]: import neuralcoref
In [3]: nlp = spacy.load('en_core_web_sm')
In [4]: neuralcoref.add_to_pipe(nlp)
Out[4]: <spacy.lang.en.English at 0x7f8ef8f17dd8>
In [5]: d = nlp("NeuralCoref is a pipeline extension for spaCy 2.1+ which annotates and resolves coreference clusters using a neural network. NeuralCoref is production-ready, integrated in spaCy's NLP pipeline and extensible to new training datasets.")
In [6]: import pickle
In [7]: with open('test.pt', 'wb') as f:
...: pickle.dump(d, f)
...:
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-7-569cbf65bb25> in <module>
1 with open('test.pt', 'wb') as f:
----> 2 pickle.dump(d, f)
3
doc.pyx in spacy.tokens.doc.pickle_doc()
~/anaconda3/envs/rcqa/lib/python3.7/site-packages/srsly/_pickle_api.py in pickle_dumps(data, protocol)
12 RETURNS (bytest): The serialized object.
13 """
---> 14 return cloudpickle.dumps(data, protocol=protocol)
15
16
~/anaconda3/envs/rcqa/lib/python3.7/site-packages/srsly/cloudpickle/cloudpickle.py in dumps(obj, protocol)
952 try:
953 cp = CloudPickler(file, protocol=protocol)
--> 954 cp.dump(obj)
955 return file.getvalue()
956 finally:
~/anaconda3/envs/rcqa/lib/python3.7/site-packages/srsly/cloudpickle/cloudpickle.py in dump(self, obj)
282 self.inject_addons()
283 try:
--> 284 return Pickler.dump(self, obj)
285 except RuntimeError as e:
286 if 'recursion' in e.args[0]:
~/anaconda3/envs/rcqa/lib/python3.7/pickle.py in dump(self, obj)
435 if self.proto >= 4:
436 self.framer.start_framing()
--> 437 self.save(obj)
438 self.write(STOP)
439 self.framer.end_framing()
~/anaconda3/envs/rcqa/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
502 f = self.dispatch.get(t)
503 if f is not None:
--> 504 f(self, obj) # Call unbound method with explicit self
505 return
506
~/anaconda3/envs/rcqa/lib/python3.7/pickle.py in save_tuple(self, obj)
784 write(MARK)
785 for element in obj:
--> 786 save(element)
787
788 if id(obj) in memo:
~/anaconda3/envs/rcqa/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
502 f = self.dispatch.get(t)
503 if f is not None:
--> 504 f(self, obj) # Call unbound method with explicit self
505 return
506
~/anaconda3/envs/rcqa/lib/python3.7/pickle.py in save_dict(self, obj)
854
855 self.memoize(obj)
--> 856 self._batch_setitems(obj.items())
857
858 dispatch[dict] = save_dict
~/anaconda3/envs/rcqa/lib/python3.7/pickle.py in _batch_setitems(self, items)
880 for k, v in tmp:
881 save(k)
--> 882 save(v)
883 write(SETITEMS)
884 elif n:
~/anaconda3/envs/rcqa/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
502 f = self.dispatch.get(t)
503 if f is not None:
--> 504 f(self, obj) # Call unbound method with explicit self
505 return
506
~/anaconda3/envs/rcqa/lib/python3.7/pickle.py in save_list(self, obj)
814
815 self.memoize(obj)
--> 816 self._batch_appends(obj)
817
818 dispatch[list] = save_list
~/anaconda3/envs/rcqa/lib/python3.7/pickle.py in _batch_appends(self, items)
841 write(APPENDS)
842 elif n:
--> 843 save(tmp[0])
844 write(APPEND)
845 # else tmp is empty, and we're done
~/anaconda3/envs/rcqa/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
547
548 # Save the reduce() output and finally memoize the object
--> 549 self.save_reduce(obj=obj, *rv)
550
551 def persistent_id(self, obj):
~/anaconda3/envs/rcqa/lib/python3.7/pickle.py in save_reduce(self, func, args, state, listitems, dictitems, obj)
660
661 if state is not None:
--> 662 save(state)
663 write(BUILD)
664
~/anaconda3/envs/rcqa/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
502 f = self.dispatch.get(t)
503 if f is not None:
--> 504 f(self, obj) # Call unbound method with explicit self
505 return
506
~/anaconda3/envs/rcqa/lib/python3.7/pickle.py in save_dict(self, obj)
854
855 self.memoize(obj)
--> 856 self._batch_setitems(obj.items())
857
858 dispatch[dict] = save_dict
~/anaconda3/envs/rcqa/lib/python3.7/pickle.py in _batch_setitems(self, items)
880 for k, v in tmp:
881 save(k)
--> 882 save(v)
883 write(SETITEMS)
884 elif n:
~/anaconda3/envs/rcqa/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
522 reduce = getattr(obj, "__reduce_ex__", None)
523 if reduce is not None:
--> 524 rv = reduce(self.proto)
525 else:
526 reduce = getattr(obj, "__reduce__", None)
span.pyx in spacy.tokens.span.Span.__reduce__()
NotImplementedError: [E112] Pickling a span is not supported, because spans are only views of the parent Doc and can't exist on their own. A pickled span would always have to include its Doc and Vocab, which has practically no advantage over pickling the parent Doc directly. So instead of pickling the span, pickle the Doc it belongs to or use Span.as_doc to convert the span to a standalone Doc object.
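One possible workaround (just a sketch, not an official fix): since the failure comes from the Span objects that neuralcoref keeps in the Doc's user_data, you can extract the coref output into plain Python types and pickle those instead of the Doc itself. The dictionary layout and file name below are only illustrative.
import pickle
# Copy the coref results into plain strings/lists, which pickle cleanly
# and avoid the Span/Cluster objects that raise E112.
plain = {
    "text": d.text,
    "resolved": d._.coref_resolved,  # resolved text as a plain string
    "clusters": [[m.text for m in c.mentions] for c in d._.coref_clusters],
}
with open('coref_output.pkl', 'wb') as f:
    pickle.dump(plain, f)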
Similar issue. Python 3.6.8, 64-bit, on Anaconda, Windows 10.
>> neuralcoref.__version__
'4.0.0'
>> nlp = spacy.load('en_core_web_sm')
>> coref = neuralcoref.NeuralCoref(nlp.vocab)
>> nlp.add_pipe(coref, name='neuralcoref')
>> doc = nlp('My sister has a dog. She loves him.')
>> doc.to_disk('test.pkl')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in
3 nlp.add_pipe(coref, name='neuralcoref')
4 doc = nlp('My sister has a dog. She loves him.')
----> 5 doc.to_disk('test.pkl')
doc.pyx in spacy.tokens.doc.Doc.to_disk()
doc.pyx in spacy.tokens.doc.Doc.to_disk()
doc.pyx in spacy.tokens.doc.Doc.to_bytes()
~\AppData\Roaming\Python\Python36\site-packages\spacy\util.py in to_bytes(getters, exclude)
580 # Split to support file names like meta.json
581 if key.split(".")[0] not in exclude:
--> 582 serialized[key] = getter()
583 return srsly.msgpack_dumps(serialized)
584
doc.pyx in spacy.tokens.doc.Doc.to_bytes.lambda8()
~\AppData\Roaming\Python\Python36\site-packages\srsly\_msgpack_api.py in msgpack_dumps(data)
14 RETURNS (bytes): The serialized bytes.
15 """
---> 16 return msgpack.dumps(data, use_bin_type=True)
17
18
~\AppData\Roaming\Python\Python36\site-packages\srsly\msgpack\__init__.py in packb(o, **kwargs)
38 Pack an object and return the packed bytes.
39 """
---> 40 return Packer(**kwargs).pack(o)
41
42
_packer.pyx in srsly.msgpack._packer.Packer.pack()
_packer.pyx in srsly.msgpack._packer.Packer.pack()
_packer.pyx in srsly.msgpack._packer.Packer.pack()
_packer.pyx in srsly.msgpack._packer.Packer._pack()
_packer.pyx in srsly.msgpack._packer.Packer._pack()
_packer.pyx in srsly.msgpack._packer.Packer._pack()
TypeError: can not serialize 'Cluster' object
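A similar stopgap is possible here (a rough sketch, assuming you only need the coref results as data rather than the live Cluster objects): copy the results out, clear doc.user_data, and then call to_disk. The coref_out name is just illustrative.
# Stash the coref output as plain data before dropping the extension storage.
coref_out = {
    "resolved": doc._.coref_resolved,
    "clusters": [[m.text for m in c.mentions] for c in doc._.coref_clusters],
}
doc.user_data = {}       # the unserializable Cluster/Span objects live here
doc.to_disk('test.pkl')  # should serialize once user_data is empty
# persist coref_out separately, e.g. with pickle or json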
I also have this issue on the most recent neuralcoref version
I've had this issue too, while trying to call doc_bytes = doc.to_bytes()
I also have this issue.
Error: {TypeError}can not serialize 'spacy.tokens.span.Span' object
Thanks for the reports - will look into this!
I ran into the same issue when running nlp.pipe with multiple processes:
for doc in nlp.pipe(df.text, batch_size=5, n_process=4):
    print(doc)
Since this is actively blocking me, I found a temporary workaround:
def remove_unserializable_results(doc):
    # Drop everything neuralcoref stored on the Doc, then null out every custom
    # extension attribute on the Doc and its Tokens so the Doc can be serialized.
    doc.user_data = {}
    for x in dir(doc._):
        if x in ['get', 'set', 'has']: continue
        setattr(doc._, x, None)
    for token in doc:
        for x in dir(token._):
            if x in ['get', 'set', 'has']: continue
            setattr(token._, x, None)
    return doc

nlp.add_pipe(remove_unserializable_results, last=True)
I added this after my last pipeline component (i.e. after='coreference_resolver'), which had already converted the coreferences into entities, so I no longer needed the unserializable coref metadata.
Same issue in Databricks with PySpark.
doc.user_data = {}
Can you please provide a more complete example? I used your code snippet, but unfortunately I then have no access to the coref data.
@dpasch01, the following worked for me in terms of saving at least the string representation of the neuralcoref output.
def remove_unserializable_results(doc):
    # Keep a plain-string copy of the resolved text before clearing everything.
    temp = str(doc._.coref_resolved)
    doc.user_data = {}
    doc.user_data = {"coref": temp}
    # Access every extension attribute once before nulling them out.
    for x in dir(doc._):
        getattr(doc._, x)
    for x in dir(doc._):
        if x in ['get', 'set', 'has', 'coref_as_ner']: continue
        setattr(doc._, x, None)
    for token in doc:
        for x in dir(token._):
            if x in ['get', 'set', 'has', 'coref_as_ner']: continue
            setattr(token._, x, None)
    return doc

nlp.add_pipe(remove_unserializable_results, last=True)
Then you can do the usual docs = nlp.pipe(my_list_of_texts) and get that string with [doc.user_data['coref'] for doc in docs].
I can save a spaCy document to disk, but not one produced by neuralcoref. For example, the following snippet returns the error
TypeError: can't serialize My sister: [My sister, She]
. The files produced are as follows: