shibing624 / pycorrector

pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and Qwen2.5 to text correction and works out of the box.
https://www.mulanai.com/product/corrector/
Apache License 2.0
5.61k stars 1.1k forks

seq2seq MemoryError #62

Closed ryangawei closed 5 years ago

ryangawei commented 5 years ago

When training with only CGED17, I get the following error:

Traceback (most recent call last):
  File "D:/Github/pycorrector/pycorrector/seq2seq/train.py", line 105, in <module>
    rnn_hidden_dim=config.rnn_hidden_dim)
  File "D:/Github/pycorrector/pycorrector/seq2seq/train.py", line 52, in train
    encoder_input_data = np.zeros((len(input_texts), max_input_texts_len, len(input_token_index)), dtype='float32')

The dataset only has about 30,000 entries, so in principle it should not run out of memory. Is there a way to fix this?
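A rough size check may explain why the `np.zeros` line in the traceback fails even for a small dataset: a dense `float32` array needs 4 bytes per element, so the allocation is the product of the three dimensions times 4. The sketch below is only an estimate. The 30,000 count comes from the report, while `max_len=100` and `vocab_size=5_000` are hypothetical values (the thread does not state them), and `dense_onehot_bytes` is an illustrative helper, not part of pycorrector.

```python
import numpy as np


def dense_onehot_bytes(num_texts, max_len, vocab_size, dtype="float32"):
    """Bytes needed by np.zeros((num_texts, max_len, vocab_size), dtype)."""
    return num_texts * max_len * vocab_size * np.dtype(dtype).itemsize


# 30_000 texts is taken from the report; max_len and vocab_size are
# hypothetical values chosen only for illustration.
needed = dense_onehot_bytes(30_000, 100, 5_000)
print(f"{needed / 1e9:.1f} GB")
```

Under these assumed values the array alone would need about 60 GB, so the number of texts is not the only factor: the sequence length and the token vocabulary size multiply it.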

shibing624 commented 5 years ago

Adjusting the parameters helps: reduce max_input_texts_len, word_emb_size, and word_size.
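If smaller parameters are still not enough, one alternative that is not suggested in this thread is to build the one-hot array one batch at a time instead of for the whole dataset. The sketch below is a minimal illustration under that assumption. `onehot_batches` and its arguments are hypothetical names, not pycorrector's API, and the training loop would need to be adapted to consume a generator.

```python
import numpy as np


def onehot_batches(input_texts, token_index, max_len, batch_size=64):
    """Yield one-hot arrays of shape (batch, max_len, vocab), one batch at a time.

    Hypothetical helper: only one batch is held in memory at once,
    rather than a single array covering every text.
    """
    vocab_size = len(token_index)
    for start in range(0, len(input_texts), batch_size):
        chunk = input_texts[start:start + batch_size]
        batch = np.zeros((len(chunk), max_len, vocab_size), dtype="float32")
        for i, text in enumerate(chunk):
            # Truncate texts longer than max_len; skip unknown tokens.
            for t, token in enumerate(text[:max_len]):
                idx = token_index.get(token)
                if idx is not None:
                    batch[i, t, idx] = 1.0
        yield batch


# Tiny usage example with made-up data.
token_index = {"a": 0, "b": 1, "c": 2}
for batch in onehot_batches(["ab", "abc", "c"], token_index, max_len=2, batch_size=2):
    print(batch.shape)
```

With this approach the peak allocation scales with `batch_size` rather than with the number of texts, at the cost of rebuilding the arrays on each pass over the data.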

liuhuihuii commented 5 years ago

Same question here. Do you have any suggestions for specific values? Thanks.

shibing624 commented 5 years ago

Use the recommended default parameters; I have tested all of them locally.