Hi, thanks for your interest in our paper. Could you please use the following command line for pre-training?
python -m torch.distributed.launch --nproc_per_node=8 \
--use_env Pretrain.py \
--config ./configs/Pretrain.yaml \
--output_dir output/pretrain \
--text_encoder bert-base-chinese
More details can be found in this line. Please let me know if this helps. Thanks.
Sorry, I just realized that you are using bert-base-chinese. Can you show me config_bert_chinese.json? Thanks.
My config_bert_chinese.json is like this, just a copy of https://huggingface.co/ckiplab/bert-base-chinese/blob/main/config.json with fusion_layer and encoder_width added:
{
  "architectures": ["BertForMaskedLM"],
  "attention_probs_dropout_prob": 0.1,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "type_vocab_size": 2,
  "vocab_size": 21128,
  "fusion_layer": 6,
  "encoder_width": 768
}
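To double-check that the two extra fields are actually picked up, I run a quick sanity check like the one below (just a sketch; the path is a placeholder for my local file, and I am using the BertConfig class from transformers here):
from transformers import BertConfig

# Load the modified config and confirm the extra fields become attributes
# (the path is a placeholder for the local config file).
config = BertConfig.from_json_file('./configs/config_bert_chinese.json')
print(config.fusion_layer)   # expect 6
print(config.encoder_width)  # expect 768
print(config.vocab_size)     # expect 21128 for bert-base-chinese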
How many GPUs are you using?
I also use 8 GPUs, and today I used the same configs and dataset to run ALBEF; the loss looks normal now.
Thanks. May I know which dataset you are using? In this case, I need to test it on my machine to reproduce the problem.
Sorry, the data was collected by myself, mainly from the '抖音' (Douyin) app. The image-text pairs consist of the videos' titles and covers.
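For reference, each entry in my annotation file looks roughly like this (the values below are just placeholders; the caption is the video title and the image is the cover frame):
[
  {'caption': '<video title in Chinese>', 'image': 'xxxxxxxxxxx_00001.jpg'},
  {'caption': '<another video title>', 'image': 'xxxxxxxxxxx_00002.jpg'}
]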
Thanks. I will try to collect some data with Chinese text to figure out the problem. Will let you know ASAP.
Or I can upload a portion of the data to Google Drive? Would that be convenient for you?
Sure, I appreciate it. You can email the data link to jinyu.yang@mavs.uta.edu
OK, I have already sent the data to you.
Hi, the reason for this NaN loss is that your dataset contains empty captions. For example, one data sample in your dataset is {'caption': '', 'image': '6912609964733271309_00001.jpg'}. Removing such invalid samples solves the problem.
You can use the following code to remove invalid samples from the json file.
import json

json_path = 'data.json'
new_json_path = 'data_new.json'

# Load the original annotation file
with open(json_path) as f:
    data = json.load(f)

# Keep only samples whose caption is non-empty after stripping whitespace
new_data = []
for sample in data:
    if len(sample['caption'].strip()) != 0:
        new_data.append(sample)

# Write the filtered annotations, keeping Chinese characters readable
with open(new_json_path, 'w') as jsonfile:
    json.dump(new_data, jsonfile, ensure_ascii=False)
Feel free to let me know if you might need additional information. Thanks.
Thank you very much, it was my mistake.
I was stuck at the same error and this helped me out! Thanks to @viyjy and @liangzimei too 😄
Hi, thanks for your excellent work. When I train on my own Chinese dataset (so I changed bert-base-uncased to bert-base-chinese), the loss becomes NaN after several iterations. I have tried decreasing the lr and adding grad_clip, but the problem still exists. Here is my training config:
Can you give me some suggestions? Thanks in advance.
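For reference, this is roughly how I added grad_clip (a minimal sketch with a toy model so it runs standalone, not the actual Pretrain.py code; max_norm=1.0 is simply the value I tried):
import torch
import torch.nn as nn

# Toy model/optimizer/data only to make the sketch runnable; in the real run
# these are the ALBEF model, optimizer, and dataloader from Pretrain.py.
model = nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
batches = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(3)]

for x, y in batches:
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    # Clip gradients before the optimizer step
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()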