wind91725 / gpt2-ml-finetune-

Fine-tune the gpt2-ml Chinese model on your own dataset
Apache License 2.0
43 stars 15 forks

[Discussion] Fixing errors when fine-tuning the gpt2-ml 30G, 220k-step model #31

Open NLPIG opened 3 years ago

NLPIG commented 3 years ago

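TensorFlow 2.x keeps failing because 'contrib' was removed in 2.x. After downgrading to 1.x (1.14 or 1.15) the script runs, but as soon as training starts it prints a pile of warnings: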

【Start trainning.............................................
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
W0616 06:45:01.207822 140262356580224 deprecation.py:506] From /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0616 06:45:01.208348 140262356580224 deprecation.py:323] From /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
WARNING:tensorflow:From /content/drive/MyDrive/gpt2-ml/train/dataloader.py:63: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE) instead. If sloppy execution is desired, use tf.data.Options.experimental_determinstic.
W0616 06:45:01.223439 140262356580224 deprecation.py:323] From /content/drive/MyDrive/gpt2-ml/train/dataloader.py:63: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE) instead. If sloppy execution is desired, use tf.data.Options.experimental_determinstic.
WARNING:tensorflow:From /content/drive/MyDrive/gpt2-ml/train/dataloader.py:81: map_and_batch (from tensorflow.python.data.experimental.ops.batching) is deprecated and will be removed in a future version. 】

Once the training loop actually starts, a fatal error appears:

【ERROR:tensorflow:Error recorded from training_loop: module 'tensorflow._api.v1.compat.v1' has no attribute 'contrib'
E0616 06:45:48.644567 140262356580224 error_handling.py:75] Error recorded from training_loop: module 'tensorflow._api.v1.compat.v1' has no attribute 'contrib'
INFO:tensorflow:training_loop marked as finished】

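I Googled around and could not find a fix. My guess is that the root problem is 'tensorflow._api.v1.compat.v1' has no attribute 'contrib', so presumably changing the API call would fix it? But I can't find where that code lives. I'm a beginner here, so any pointers from more experienced people would be much appreciated.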
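Since the error message does not name the file that touches 'contrib', one way to locate it is to scan the project's Python files for the word. The snippet below is only a minimal sketch: the helper name `find_contrib_references` is hypothetical, and the example path in the final comment is taken from the warnings above and is an assumption about where the project sits. The error text suggests that some line reaches `contrib` through `tf.compat.v1`, so hits of the form `tf.compat.v1.contrib...` are the likely suspects, though this thread does not confirm which line is responsible.

```python
import re
from pathlib import Path

# Whole-word match, so `tf.contrib.tpu` is found but `contributors` is not.
CONTRIB_WORD = re.compile(r"\bcontrib\b")


def find_contrib_references(root):
    """Return (path, line_number, stripped_line) for each line mentioning `contrib`.

    Hypothetical helper for illustration; it only reads *.py files under `root`.
    """
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            # Skip files that cannot be read (permissions, broken links).
            continue
        for number, line in enumerate(text.splitlines(), start=1):
            if CONTRIB_WORD.search(line):
                hits.append((str(path), number, line.strip()))
    return hits


# Example call (path is an assumption based on the warnings above):
# for path, number, line in find_contrib_references("/content/drive/MyDrive/gpt2-ml"):
#     print(f"{path}:{number}: {line}")
```

Each printed hit gives a file and line number to inspect by hand; the scan only reports occurrences and does not change any code.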

NLPIG commented 3 years ago

The environment is Colab Pro. I have already run !pip uninstall -y tensorflow and installed tensorflow==1.15.2; Python is 3.7.
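After a reinstall like this, it can help to confirm which TensorFlow version the running notebook actually imports, because a version that was already loaded may stay in memory until the Colab runtime is restarted. The following is a small sketch under that assumption; the function names `is_tf1_version` and `describe_installed_tensorflow` are hypothetical and not part of the project.

```python
import importlib.util


def is_tf1_version(version):
    """True when a version string such as '1.15.2' belongs to the 1.x series."""
    return version.split(".")[0] == "1"


def describe_installed_tensorflow():
    """Report the TensorFlow version visible to this interpreter, if any."""
    if importlib.util.find_spec("tensorflow") is None:
        return "tensorflow is not installed in this environment"
    import tensorflow as tf  # imported lazily so the helper works without TF

    series = "1.x" if is_tf1_version(tf.__version__) else "not 1.x"
    return f"tensorflow {tf.__version__} ({series})"


# In a notebook cell, after restarting the runtime:
# print(describe_installed_tensorflow())
```

If the reported version is not 1.15.2, the downgrade has not taken effect in the current session.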