fortepianissimo opened this issue 6 years ago
Okay - disabling lmdb in embedding-registry.json seems to make that exception go away. BUT now there's another exception:
__________________________________________________________________________________________________
Epoch 1/60
d:\Projects\delft\env\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.
d:\Projects\delft\env\lib\site-packages\gensim\utils.py:1197: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "d:\Anaconda3\Lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "d:\Anaconda3\Lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
File "d:\Projects\delft\utilities\Embeddings.py", line 78, in __getattr__
return getattr(self.model, name)
File "d:\Projects\delft\utilities\Embeddings.py", line 78, in __getattr__
return getattr(self.model, name)
File "d:\Projects\delft\utilities\Embeddings.py", line 78, in __getattr__
return getattr(self.model, name)
[Previous line repeated 328 more times]
Hello! I haven't been able to reproduce the exception on Linux, so it might be Windows-related. I'm trying to get a Windows machine in order to try again. In the meantime, can you tell us a little more about your set-up? For instance, are you using a GPU? Did you use the requirements-gpu.txt
file to set it up? Also, which version of Python are you using?
Thanks!
Hi, sorry I wasn't very clear about my specs:
By the way, I also solved another error along the way: a "DLL load failed" message when scikit-learn is imported.
The solution is to install numpy‑1.14.6+mkl‑cp36‑cp36m‑win_amd64.whl (depending on the arch and Python version) from https://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy
Ok, we had some problems before with Python 3.6. I honestly don't think that the Python version is the problem, but if you have the time, can you try creating a Python 3.5 environment with conda (conda create -n myenv python=3.5)
and see if you encounter the same problems? As soon as I get to try DeLFT on Windows I'll get back to you.
Ok, I set up a Python 3.5 environment (version 3.5.6 via Anaconda) and created another env_python35 under the delft dir; here are the errors (infinite recursion):
Epoch 1/60
D:\Projects\delft\env_python35\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.
D:\Projects\delft\env_python35\lib\site-packages\gensim\utils.py:1197: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "d:\Anaconda3\envs\python35_env\Lib\multiprocessing\spawn.py", line 106, in spawn_main
exitcode = _main(fd)
File "d:\Anaconda3\envs\python35_env\Lib\multiprocessing\spawn.py", line 116, in _main
self = pickle.load(from_parent)
File "D:\Projects\delft\utilities\Embeddings.py", line 78, in __getattr__
return getattr(self.model, name)
File "D:\Projects\delft\utilities\Embeddings.py", line 78, in __getattr__
return getattr(self.model, name)
File "D:\Projects\delft\utilities\Embeddings.py", line 78, in __getattr__
return getattr(self.model, name)
File "D:\Projects\delft\utilities\Embeddings.py", line 78, in __getattr__
... (more same lines like the above) ...
RecursionError: maximum recursion depth exceeded while calling a Python object
Exception in thread Thread-1:
Traceback (most recent call last):
File "d:\Anaconda3\envs\python35_env\Lib\threading.py", line 914, in _bootstrap_inner
self.run()
File "d:\Anaconda3\envs\python35_env\Lib\threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "D:\Projects\delft\env_python35\lib\site-packages\keras\utils\data_utils.py", line 548, in _run
with closing(self.executor_fn(_SHARED_SEQUENCES)) as executor:
File "D:\Projects\delft\env_python35\lib\site-packages\keras\utils\data_utils.py", line 522, in <lambda>
initargs=(seqs,))
File "d:\Anaconda3\envs\python35_env\Lib\multiprocessing\context.py", line 118, in Pool
context=self.get_context())
File "d:\Anaconda3\envs\python35_env\Lib\multiprocessing\pool.py", line 174, in __init__
self._repopulate_pool()
File "d:\Anaconda3\envs\python35_env\Lib\multiprocessing\pool.py", line 239, in _repopulate_pool
w.start()
File "d:\Anaconda3\envs\python35_env\Lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "d:\Anaconda3\envs\python35_env\Lib\multiprocessing\context.py", line 313, in _Popen
return Popen(process_obj)
File "d:\Anaconda3\envs\python35_env\Lib\multiprocessing\popen_spawn_win32.py", line 66, in __init__
reduction.dump(process_obj, to_child)
File "d:\Anaconda3\envs\python35_env\Lib\multiprocessing\reduction.py", line 59, in dump
ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
Thanks for the info! I have been looking around, and apparently the multiprocessing library works differently on Windows, so this series of errors you are encountering might be caused by that. However, I haven't been able to find a Windows machine to test on yet; as soon as I can get hold of one I'll get back to you.
@fortepianissimo I finally got hold of a Windows machine and was able to reproduce the error. Could you please comment out lines 77 and 78 in the file utilities/Embeddings.py, that is, these lines:
def __getattr__(self, name):
return getattr(self.model, name)
and try again?
Note 1: Please also disable lmdb in embedding-registry.json
Note 2: This is a workaround rather than a fix, I'll work on a definite fix in the future
Also, please let me know if the workaround works!
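For what it's worth, the recursion is reproducible in plain Python: unpickling creates the instance without calling __init__, so self.model doesn't exist yet and the unguarded __getattr__ re-enters itself while looking it up. A minimal sketch of the failure and the guarded fix (Naive and Safe are illustrative stand-ins, not DeLFT classes):

```python
import pickle

class Naive:
    """Mimics the unguarded delegation in Embeddings.__getattr__."""
    def __init__(self):
        self.model = None

    def __getattr__(self, name):
        # Only called when normal lookup fails. During unpickling, 'model'
        # is not in __dict__ yet, so evaluating self.model re-enters
        # __getattr__ -> infinite recursion, as in the traceback above.
        return getattr(self.model, name)

class Safe:
    """Same delegation, but guarded so unpickling cannot recurse."""
    def __init__(self):
        self.model = None

    def __getattr__(self, name):
        # Refuse to delegate 'model' itself and dunder lookups (pickle
        # probes e.g. __setstate__ before the state dict is restored).
        if name == "model" or name.startswith("__"):
            raise AttributeError(name)
        return getattr(self.model, name)

try:
    pickle.loads(pickle.dumps(Naive()))
except RecursionError:
    print("Naive: RecursionError, same failure as in the traceback")

restored = pickle.loads(pickle.dumps(Safe()))
print("Safe round-trips:", type(restored).__name__)
```

This also explains why commenting the two lines out works: without __getattr__, pickle's probe for __setstate__ just raises a normal AttributeError and unpickling proceeds.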
Hello, I'm new to this. My specs are:
requirement.txt
And I want to ask about 2 things:
embedding-registry.json
utilities/Embeddings.py
But I encountered this problem (NOTE: I even tried to use pickle version 4, but nothing happened):
Using TensorFlow backend.
D:\Anaconda3\envs\ULR\lib\site-packages\gensim\utils.py:1197: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "D:\Anaconda3\envs\ULR\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "D:\Anaconda3\envs\ULR\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
Edited: For the first question, I've found the answer (set the "embedding-lmdb-path" to "None")
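For anyone else landing here, note that the value is the string "None", not a JSON null. The relevant fragment of embedding-registry.json would look something like this (only the embedding-lmdb-path key comes from this thread; any surrounding keys in your file stay as they are):

```json
{
    "embedding-lmdb-path": "None"
}
```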
I face the same issue as @Protossnam: EOFError: Ran out of input.
I am on Windows 10 with Python 3.5. Any updates on this?
@davidlenz Sadly, I had to boot my laptop into Linux (Ubuntu) and run the tool there. On Linux I didn't face that issue. It may be a problem with Windows; I'm also looking forward to hearing an update on this.
Hi all, an easy workaround would be to disable multiprocessing when running on Windows. To do that, you need to pass multiprocessing=False each time a new Sequence object is created in nerTagger.py.
My 2 cts
Olivier
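A hedged sketch of that idea as a single helper (allow_multiprocessing is a hypothetical name, and the commented call site follows the multiprocessing= keyword named in this thread, so check the actual Sequence signature in nerTagger.py before copying):

```python
import platform

def allow_multiprocessing() -> bool:
    # Windows has no fork(): multiprocessing falls back to 'spawn', which
    # re-imports the module and pickles every object handed to a worker --
    # exactly the step that fails in the tracebacks above.
    return platform.system() != "Windows"

# Hypothetical call site, mirroring the suggestion for nerTagger.py:
# seq = Sequence(..., multiprocessing=allow_multiprocessing())
print(allow_multiprocessing())
```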
I have this issue when the download fails and the database is not correctly initialised, I suppose:
Exception in thread Thread-1:
Traceback (most recent call last):
File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/site-packages/keras/utils/data_utils.py", line 744, in _run
with closing(self.executor_fn(_SHARED_SEQUENCES)) as executor:
File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/site-packages/keras/utils/data_utils.py", line 721, in pool_fn
pool = get_pool_class(True)(
File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/multiprocessing/context.py", line 119, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild,
File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/multiprocessing/pool.py", line 212, in __init__
self._repopulate_pool()
File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/multiprocessing/pool.py", line 303, in _repopulate_pool
return self._repopulate_pool_static(self._ctx, self.Process,
File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/multiprocessing/pool.py", line 326, in _repopulate_pool_static
w.start()
File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/Users/lfoppiano/opt/anaconda3/envs/delft2/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'Environment' object
Update: I tried to run again and the database was correctly created (via a local version of glove); however, the problem still occurs, probably due to multithreading...
To reproduce it I used:
python -m delft.applications.citationClassifier train_eval
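The "cannot pickle 'Environment' object" error is consistent with how spawn-based multiprocessing works: every argument handed to a worker process is pickled, and an lmdb Environment handle is not picklable. A minimal stand-in (a thread lock plays the role of the lmdb handle, since both refuse to pickle):

```python
import pickle
import threading

class EmbeddingsLike:
    """Stand-in for an object holding an unpicklable resource handle."""
    def __init__(self):
        self.env = threading.Lock()  # plays the role of the lmdb Environment

try:
    pickle.dumps(EmbeddingsLike())
except TypeError as exc:
    # e.g. "cannot pickle '_thread.lock' object" -- the same shape as
    # "cannot pickle 'Environment' object" in the traceback above
    print(exc)
```

When the pickling happens inside a child process (as on Windows), the child dies during load and the parent only sees the secondary BrokenPipeError or "EOFError: Ran out of input".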
Update: I'm having this problem with macOS.
I have the same problem with macOS.
The solution is to disable the multithreading by setting nb_workers = 0. Depending on the task to be performed, it should be modified in both sequenceLabelling/wrapper.py and trainer.py:172.
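That workaround could be sketched as one helper instead of editing the value in two places (safe_nb_workers is a hypothetical name; the platform check is the only substance):

```python
import sys

def safe_nb_workers(requested: int) -> int:
    # Both Windows and macOS (since Python 3.8) default to the 'spawn'
    # start method, which must pickle the data generator for each worker;
    # fall back to in-process data loading (0 workers) there.
    if sys.platform in ("win32", "darwin"):
        return 0
    return requested
```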
I'm running under Windows 10, following the instructions given in the readme document. When trying to retrain the model using this command
python nerTagger.py --dataset-type conll2003 train_eval
I ran into the following exception (right after compiling embeddings) - any tips?
Thank you for the wonderful work!