hankcs / multi-criteria-cws

Simple Solution for Multi-Criteria Chinese Word Segmentation
http://www.hankcs.com/nlp/segment/multi-criteria-cws.html
GNU General Public License v3.0
300 stars 84 forks

Error: RuntimeError: CPU memory allocation failed #2

Open starevelyn opened 6 years ago

starevelyn commented 6 years ago

root@liangzhiNLP:/home/liangzhi/liangxingzheng/multi-criteria-cws/multi-criteria-cws# ./script/train.sh joint-10in1 --dynet-seed 10364 --python-seed 840868838938890892
[dynet] random seed: 10364
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
model.py --dataset dataset/joint-10in1/dataset.pkl --num-epochs 60 --word-embeddings data/embedding/character.vec --log-dir result/joint-10in1 --dropout 0.2 --learning-rate 0.01 --learning-rate-decay 0.9 --hidden-dim 100 --dynet-seed 22059 --bigram --skip-dev --dynet-seed 10364 --python-seed 840868838938890892

Namespace(always_model=False, batch_size=20, bigram=True, char_embedding_dim=100, char_embeddings=None, char_hidden_dim=100, clip_norm=None, dataset='dataset/joint-10in1/dataset.pkl', debug=False, dropout=0.2, dynet_autobatch=None, dynet_gpus=None, dynet_mem=None, dynet_seed=10364, dynet_weight_decay=None, hidden_dim=100, learning_rate=0.01, learning_rate_decay=0.9, log_dir='result/joint-10in1', lowercase_words=False, lstm_layers=1, no_model=False, no_we=False, no_we_update=False, num_epochs=60, old_model=None, python_seed=840868838938890892, skip_dev=True, subset=None, task_name='2018-01-04-15-01-54', test=False, tie_two_embeddings=False, use_char_rnn=False, word_embeddings='data/embedding/character.vec')
Python random seed: 840868838938890892

Memory pool info for each devices:
Device CPU - FOR Memory 128MB, BACK Memory 128MB, PARAM Memory 128MB, SCRATCH Memory 128MB.
CPU memory allocation failed n=570425344 align=32
Traceback (most recent call last):
  File "model.py", line 492, in <module>
    tie_two_embeddings=options.tie_two_embeddings
  File "model.py", line 56, in __init__
    self.bigram_lookup = self.model.add_lookup_parameters((len(b2i), word_embedding_dim))
  File "_dynet.pyx", line 1183, in _dynet.ParameterCollection.add_lookup_parameters
  File "_dynet.pyx", line 1210, in _dynet.ParameterCollection.add_lookup_parameters
RuntimeError: CPU memory allocation failed

What causes this error? Do I need to change the code, or is it a problem with my environment?

hankcs commented 6 years ago

Not sure. It looks like the machine is running out of memory; try one with more RAM. My experiments ran on a machine with 8 GB.
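The traceback shows the failure happens while allocating the bigram embedding table (`add_lookup_parameters((len(b2i), word_embedding_dim))`), and DyNet reports the failed request as `n=570425344` bytes, which is exactly 544 MiB. The following is a rough sketch of how such a table's memory scales, under stated assumptions: float32 storage (4 bytes per value), an embedding dimension of 100, and a hypothetical bigram vocabulary size, because the thread never reports `len(b2i)`. The helper names and the `copies` multiplier for extra same-sized buffers are hypothetical and are not part of DyNet or this project.

```python
BYTES_PER_FLOAT32 = 4  # assumption: parameters are stored as 32-bit floats
MIB = 1024 * 1024


def lookup_table_bytes(num_rows: int, dim: int, copies: int = 1) -> int:
    """Bytes for a float32 lookup table of shape (num_rows, dim).

    `copies` is a hypothetical multiplier for extra buffers of the same size
    (for example gradients or optimizer state); 1 means the values only.
    """
    return num_rows * dim * BYTES_PER_FLOAT32 * copies


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return num_bytes / MIB


if __name__ == "__main__":
    # The failed request reported by DyNet in the log above.
    print(f"failed request: {to_mib(570425344):.0f} MiB")
    # Hypothetical bigram vocabulary size; the real len(b2i) is not given.
    hypothetical_bigrams = 1_000_000
    values_only = lookup_table_bytes(hypothetical_bigrams, 100)
    print(f"values only: {to_mib(values_only):.1f} MiB")
```

The table grows linearly with the number of distinct bigrams. A 10-in-1 joint corpus therefore pushes the allocation well past what a small machine can provide, which is consistent with the fix reported in the next comment.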

starevelyn commented 6 years ago

It was indeed a memory problem, thanks a lot!

starevelyn commented 6 years ago

Python random seed: 840868838938890892
Training Algorithm: <class '_dynet.MomentumSGDTrainer'>
Number training instances: 2533999
Number dev instances: 262929
Epoch 1 out of 60
Traceback (most recent call last):
  File "model.py", line 528, in <module>
    loss_expr = model.neg_log_loss(instance.sentence, instance.tags)
  File "model.py", line 192, in neg_log_loss
    forward_score = self.forward(observations)
  File "model.py", line 212, in forward
    alphas_t.append(log_sum_exp(next_tag_expr))
  File "model.py", line 202, in log_sum_exp
    return max_score_expr + dy.log(dy.sum_cols(dy.transpose(dy.exp(scores - max_score_expr_broadcast))))
AttributeError: module 'dynet' has no attribute 'sum_cols'

After adding more RAM, I now run into this problem instead...
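The missing `sum_cols` call is used inside `log_sum_exp`, which `forward` calls when computing the score for `neg_log_loss`. That call chain appears to be the forward algorithm of a linear-chain CRF. The expression on line 202, `max_score + log(sum(exp(scores - max_score)))`, is the standard numerically stable log-sum-exp. The following pure-Python sketch shows that trick and a CRF forward pass built on it. It is not the project's code: it ignores any start/stop transitions `model.py` may use, and `crf_log_partition`, `emissions` and `transitions` are hypothetical names.

```python
import math
from typing import List, Sequence


def log_sum_exp(scores: Sequence[float]) -> float:
    """Compute log(sum(exp(s) for s in scores)) without overflow."""
    max_score = max(scores)
    # Shifting by the max makes every exponent <= 0, so exp() cannot overflow;
    # adding max_score back undoes the shift.
    return max_score + math.log(sum(math.exp(s - max_score) for s in scores))


def crf_log_partition(emissions: List[List[float]],
                      transitions: List[List[float]]) -> float:
    """Forward algorithm: log of the summed exp-scores of all tag sequences.

    emissions[t][j]   -- score of tag j at position t (hypothetical layout)
    transitions[i][j] -- score of moving from tag i to tag j
    """
    num_tags = len(transitions)
    # alphas[j]: log-sum of the scores of all prefixes that end in tag j.
    alphas = list(emissions[0])
    for emit in emissions[1:]:
        alphas = [
            log_sum_exp([alphas[i] + transitions[i][j] for i in range(num_tags)]) + emit[j]
            for j in range(num_tags)
        ]
    return log_sum_exp(alphas)


if __name__ == "__main__":
    # A naive math.log(math.exp(1000) + math.exp(1000)) raises OverflowError.
    print(log_sum_exp([1000.0, 1000.0]))  # equals 1000 + log(2)
```

The max-shift is what makes the DyNet version work on large scores. `sum_cols` only does the summation step, so the failure comes from the DyNet API version, not from the math.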

hankcs commented 6 years ago

Your DyNet version doesn't match; it must be 2.0.1: https://github.com/clab/dynet/releases/tag/2.0.1

starevelyn commented 6 years ago

AttributeError: module 'dynet' has no attribute 'sum_cols'
starevelyn@starevelyn-OptiPlex-7020:~/multi-criteria-cws$ pip3 list
apturl (0.5.2)
beautifulsoup4 (4.4.1)
blinker (1.3)
Brlapi (0.6.4)
chardet (2.3.0)
checkbox-support (0.22)
cmake (0.9.0)
command-not-found (0.3)
cryptography (1.2.3)
Cython (0.27.3)
defer (1.0.6)
dyNET (2.0.2)

I still get this error...

starevelyn@starevelyn-OptiPlex-7020:~/multi-criteria-cws$ python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dynet
[dynet] random seed: 3213626540
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
>>> dynet.sum_cols()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'dynet' has no attribute 'sum_cols'

What's going on here?

hankcs commented 6 years ago

Your DyNet version doesn't match; it must be 2.0.1: https://github.com/clab/dynet/releases/tag/2.0.1

But you have dyNET (2.0.2) installed.

starevelyn commented 6 years ago

I reinstalled using the command from the earlier question, and this time it should be the right version, but it still fails...

liangzhi@liangzhiNLP:~$ python3
Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import dynet
[dynet] random seed: 1811970545
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
>>> dynet.__version__
2.0

This is the error:

Python random seed: 840868838938890892
Training Algorithm: <class '_dynet.MomentumSGDTrainer'>
Number training instances: 2533999
Number dev instances: 262929
Epoch 1 out of 60
126700/126700 [==============================] - 7497s - train loss: 1.0665
Traceback (most recent call last):
  File "model.py", line 549, in <module>
    trainer.learning_rate *= options.learning_rate_decay
AttributeError: '_dynet.MomentumSGDTrainer' object has no attribute 'learning_rate'

Then I looked, and version 2.0 really doesn't have a learning_rate attribute??
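The failing line, `trainer.learning_rate *= options.learning_rate_decay`, runs once an epoch has finished. The log shows epoch 1 completing before the crash. This means the learning rate decays exponentially: epoch *e* (counting from 1) trains with `learning_rate * learning_rate_decay ** (e - 1)`. The following is a small pure-Python sketch of that schedule, using the values from the original `train.sh` command (`--learning-rate 0.01 --learning-rate-decay 0.9 --num-epochs 60`). It assumes the decay is applied exactly once per epoch, and `per_epoch_learning_rates` is a hypothetical helper, not part of DyNet or `model.py`.

```python
from typing import List


def per_epoch_learning_rates(initial_lr: float, decay: float, num_epochs: int) -> List[float]:
    """Learning rate used in each epoch when it is multiplied by `decay` after every epoch."""
    rates = []
    lr = initial_lr
    for _ in range(num_epochs):
        rates.append(lr)  # rate used while training this epoch
        lr *= decay       # mirrors `trainer.learning_rate *= options.learning_rate_decay`
    return rates


if __name__ == "__main__":
    rates = per_epoch_learning_rates(0.01, 0.9, 60)
    print(f"epoch 1: {rates[0]:.4g}, epoch 2: {rates[1]:.4g}, epoch 60: {rates[-1]:.3g}")
```

With these settings the rate shrinks by roughly three orders of magnitude over 60 epochs. This is why the script needs a trainer that exposes a writable `learning_rate`, which the 2.0 build in this comment lacks.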

hankcs commented 6 years ago

Thanks for the report, and sorry, I gave you the wrong version number. The correct version is https://github.com/clab/dynet/releases/tag/2.0.1; I have verified this repeatedly.

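At the time, the DyNet I had built from source only reported its version as dyNET (0.0.0). The experiments for the paper started in August, so I guessed from DyNet's release notes that it was v2.0. With 2.0 the program does start, but when an epoch finishes it fails because learning_rate cannot be found. Judging from the git commit hash (87df34103625102493f8c660684146a636e2482c), my build falls somewhere between 2.0 and 2.0.1. After repeated testing, I found that 2.0.1 runs correctly.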

Please reinstall 2.0.1 following https://github.com/hankcs/multi-criteria-cws/issues/1#issuecomment-351279371, thanks.
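A source build may report its version as dyNET (0.0.0), so the version string alone is not a reliable check. An alternative is to probe for the two features this thread ran into:

- the module-level `sum_cols` function, which was missing in the 2.0.2 install;
- the `learning_rate` attribute on `MomentumSGDTrainer`, which was missing in the 2.0 install.

The following hypothetical helper is a sketch, not part of the project. It only uses `ParameterCollection` and `MomentumSGDTrainer`, both of which appear in the logs in this thread. It also assumes a trainer can be constructed from an empty `ParameterCollection`.

```python
from typing import List


def missing_dynet_features(dy) -> List[str]:
    """List the DyNet features used by model.py that the given module lacks (hypothetical helper)."""
    missing = []
    # model.py's log_sum_exp calls dy.sum_cols (absent in the 2.0.2 install).
    if not hasattr(dy, "sum_cols"):
        missing.append("sum_cols")
    # model.py decays trainer.learning_rate after each epoch (absent in the 2.0 install).
    # Assumption: constructing a trainer over an empty ParameterCollection is allowed.
    trainer = dy.MomentumSGDTrainer(dy.ParameterCollection())
    if not hasattr(trainer, "learning_rate"):
        missing.append("MomentumSGDTrainer.learning_rate")
    return missing


if __name__ == "__main__":
    try:
        import dynet
    except ImportError:
        print("dynet is not installed")
    else:
        problems = missing_dynet_features(dynet)
        print("OK" if not problems else "missing: " + ", ".join(problems))
```

Checking behavior instead of version numbers sidesteps the 0.0.0 problem. It also catches intermediate commits like the one between 2.0 and 2.0.1 described above.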