PolarisRisingWar closed this issue 10 months ago
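I don't know how much GPU memory you have. If it is not enough, I suggest feeding the data through TensorFlow's data-loading pipeline (for example the tf.data API) instead of loading all of the data into GPU memory at once. The model itself is very small. You can also watch our GitHub for the upcoming TensorFlow 2.x version, which will be released together with our journal paper.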
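As a rough illustration of the suggestion above, here is a minimal, framework-agnostic sketch of batch-wise feeding. The helper name `iter_batches` and the toy `samples` list are hypothetical and not taken from the LADAN code. In TensorFlow 1.x, a generator like this could feed placeholders one batch at a time through `feed_dict`, or be wrapped with `tf.data.Dataset.from_generator`.

```python
import random


def iter_batches(samples, batch_size, shuffle=True, seed=None):
    """Yield lists of at most `batch_size` samples, one batch at a time."""
    order = list(range(len(samples)))
    if shuffle:
        # A seeded Random instance keeps the shuffle reproducible.
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]


# Hypothetical stand-in for the preprocessed training set.
samples = list(range(10))
batches = list(iter_batches(samples, batch_size=4, shuffle=False))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

With this pattern only the current batch needs to reach the device; the full dataset stays in host memory (or on disk, if the generator reads it lazily).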
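From: PolarisRisingWar
Sent: 2022-12-12 15:08
To: prometheusXN/LADAN
Cc: Subscribed
Subject: [prometheusXN/LADAN] Running LADAN+MTL_large.py takes a very long time without finishing a single epoch, then hits OOM (Issue #12)
When I ran LADAN+MTL_large.py, it had not produced results for a single epoch after 20 hours, and then it reported an OOM (out-of-memory) error.
I have already made the batch size very small. What other factors affect GPU memory usage? I don't use TensorFlow often; is there a quick way to debug this kind of situation?
My error output looks roughly like this: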
/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
Using TensorFlow backend.
WARNING:tensorflow:From LADAN+MTL_large.py:187: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
WARNING:tensorflow:From LADAN+MTL_large.py:240: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
WARNING:tensorflow:From /home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorflow/python/keras/backend.py:3794: add_dispatch_support.tf.nn.softmax_cross_entropy_with_logits_v2
.
WARNING:tensorflow:From LADAN+MTL_large.py:349: arg_max (from tensorflow.python.ops.gen_math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.math.argmax instead
WARNING:tensorflow:From LADAN+MTL_large.py:376: The name tf.losses.softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.softmax_cross_entropy instead.
WARNING:tensorflow:From LADAN+MTL_large.py:412: The name tf.add_to_collection is deprecated. Please use tf.compat.v1.add_to_collection instead.
WARNING:tensorflow:From LADAN+MTL_large.py:416: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.
WARNING:tensorflow:From LADAN+MTL_large.py:420: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.
WARNING:tensorflow:From LADAN+MTL_large.py:429: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.
2022-12-08 16:40:23.383015: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2022-12-08 16:40:23.426005: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1800000000 Hz
2022-12-08 16:40:23.430214: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b5f54978e0 executing computations on platform Host. Devices:
2022-12-08 16:40:23.430288: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0):
I don't think I changed much in the original code. The law_label2index_large.pkl file is missing from the original GitHub project, so I generated it from new_law.txt (produced by preprocessing the CAIL-big dataset) with something like the following:

    import argparse
    import pickle as pk

    # Parse the input law list and the output pickle path.
    parser = argparse.ArgumentParser()
    parser.add_argument('--law_file')
    parser.add_argument('--output_file')
    args = parser.parse_args()

    # Map each law label (one per line) to its line index.
    with open(args.law_file) as f:
        lines = f.readlines()
    law2index = {item.strip(): i for i, item in enumerate(lines)}

    # Write the mapping and close the output file explicitly.
    with open(args.output_file, 'wb') as out:
        pk.dump(law2index, out)
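One thing that may be worth checking in a mapping built this way is whether the indices are contiguous. If new_law.txt happened to contain duplicate or blank lines (an assumption; the thread does not say), indexing by line number would leave gaps or add an empty label. The sketch below uses hypothetical helper names and made-up labels to show the difference.

```python
def build_label_index(lines):
    """Map each distinct, non-empty label to a contiguous index.

    The first occurrence of a label decides its index.
    """
    index = {}
    for line in lines:
        label = line.strip()
        if label and label not in index:
            index[label] = len(index)
    return index


def is_contiguous(index):
    """True if the indices are exactly 0 .. len(index) - 1."""
    return sorted(index.values()) == list(range(len(index)))


# Hypothetical file content with one duplicate and one blank line.
lines = ["264\n", "133\n", "264\n", "\n"]

# Indexing by line number: the duplicate overwrites index 0.
naive = {}
for i, line in enumerate(lines):
    naive[line.strip()] = i

print(is_contiguous(naive))                     # False
print(is_contiguous(build_label_index(lines)))  # True
```

If the check passes on the real file, the original script is fine as it is.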
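In theory, as long as you can train on the small dataset, the large dataset should also train with the same batch_size setting. So you may want to check whether the data itself is taking up too much GPU memory.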
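To make the point about data size concrete, here is a back-of-the-envelope sketch comparing the footprint of a whole padded dataset with that of a single batch. The sample count and the per-sample shape (15 sentences of 100 token ids) are hypothetical placeholders, not figures taken from the LADAN code or from CAIL-big.

```python
def array_bytes(num_samples, sample_shape, itemsize=4):
    """Bytes needed for a dense array of `num_samples` padded samples."""
    count = num_samples
    for dim in sample_shape:
        count *= dim
    return count * itemsize


def to_mib(num_bytes):
    return num_bytes / (1024 ** 2)


# Hypothetical sizes: 1,000,000 documents, int32 token ids.
full = array_bytes(1_000_000, (15, 100))
batch = array_bytes(128, (15, 100))

print(f"whole dataset: {to_mib(full):.0f} MiB")  # whole dataset: 5722 MiB
print(f"one batch:     {to_mib(batch):.2f} MiB")  # one batch:     0.73 MiB
```

Under these assumed numbers the inputs alone would take several GiB if they were placed on the device in one go, while a single batch is negligible, which is why the batch size by itself may not explain an OOM.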
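If you still hit OOM, check whether the parts you modified define placeholders inside a loop. If extra placeholders are created on the TensorFlow computation graph (usually because of a for loop), then after a few steps the program will also run out of memory as usage keeps growing: when new placeholders are defined, TensorFlow does not release the memory held by placeholders that no longer take part in the computation. Please check your code for this.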
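One way to check for this kind of graph growth is sketched below. It is not taken from the LADAN code; the tensor names and shapes are made up. It counts graph operations before and after a loop that creates placeholders, then calls `tf.Graph.finalize()`, after which any further attempt to add a node raises a `RuntimeError`. The `tf.compat.v1` names are the ones the deprecation warnings in the log recommend.

```python
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    # Build the model once, outside the training loop.
    x = tf.compat.v1.placeholder(tf.float32, shape=[None, 4], name="x")
    y = tf.reduce_sum(x, axis=1)

    ops_before = len(graph.get_operations())

    # Anti-pattern: creating new graph nodes on every step.
    for _ in range(3):
        tf.compat.v1.placeholder(tf.float32, shape=[None, 4])

    ops_after = len(graph.get_operations())
    print("ops added inside the loop:", ops_after - ops_before)

    # Freeze the graph; later node creation now fails loudly.
    graph.finalize()
    try:
        tf.compat.v1.placeholder(tf.float32, shape=[None, 4])
        caught = False
    except RuntimeError:
        caught = True
    print("finalized graph rejected a new placeholder:", caught)
```

Calling `finalize()` on the default graph right before the training loop turns a silent slowdown into an immediate error that points at the offending line.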
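OK... I am almost completely unfamiliar with TensorFlow at the moment, so I may need to come back later to work out how to debug this. Each GPU on my side has 15109 MiB of memory.
Has the TensorFlow 2.x version been released yet?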
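A TensorFlow 2.x version does exist. When we resubmitted the work to a journal, we reimplemented LADAN in TensorFlow 2.x; we just haven't had time to clean it up and publish it yet. It will be released later.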
I will try to clean it up and open-source it within the next couple of days.
We have now open-sourced a simplified version. You can find our TF 2.x version of the LADAN model at this link: https://github.com/prometheusXN/D-LADAN