Closed: sushantakpani closed this issue 3 years ago.
`python run.py train_bert_base_ml0_d1 0` also gave the same result, whereas `python run.py train_spanbert_base_ml0_d1 0` started training but halted with a CUDA out-of-memory error:

```
RuntimeError: CUDA out of memory. Tried to allocate 630.00 MiB (GPU 5; 10.76 GiB total capacity; 8.44 GiB already allocated; 315.12 MiB free; 9.60 GiB reserved in total by PyTorch)
```
Hi,
I solve these kinds of issues by changing some parameters in the experiments.conf file to decrease the size of the model. For example, you can decrease `ffnn_size` and `max_segment_len`.
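Concretely, a minimal sketch of such overrides in experiments.conf (the values here are illustrative assumptions, close to what the asker settles on below, not recommended settings):

```
spanbert_base = ${best}{
  max_segment_len = 128     # reduced from 384
  ffnn_size = 1000          # reduced from 3000
  cluster_ffnn_size = 1000  # reduced from 3000
}
```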
Hi @AradAshrafi, thanks for your response and the tips for solving the CUDA memory issue. I will try that. Are you able to train bert_base like spanbert_base?
Sushanta
Seems to be working now. I tried `python run.py train_spanbert_base_ml0_d1 0` with the following values in experiments.conf for spanbert_base:
```
spanbert_base = ${best}{
  num_docs = 2802
  bert_learning_rate = 2e-05
  task_learning_rate = 0.0001
  max_segment_len = 128  # 384
  ffnn_size = 1000  # 3000
  cluster_ffnn_size = 1000  # 3000
  max_training_sentences = 3
  bert_tokenizer_name = bert-base-cased
  bert_pretrained_name_or_path = ${best.data_dir}/spanbert_base
}
```
@lxucs Please let me know how I can train bert_base.
Hi @sushantakpani, you can have a config like this (similar to training spanbert_base):

```
train_bert_base_ml0_d1 = ${train_bert_base}{
  mention_loss_coef = 0
  coref_depth = 1
}
```
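Note that this assumes `train_bert_base` itself exists in experiments.conf for the `${train_bert_base}` reference to resolve; as the asker shows further down in the thread, a bare extension of `bert_base` is enough:

```
# assumption: train_bert_base defined as a bare extension of bert_base,
# matching the config the asker posts later in this thread
train_bert_base = ${bert_base}{ }
```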
Hi @lxucs
It seems this error was due to a GPU memory issue. I have shifted to a higher-memory GPU server and am able to run the training with `python run.py train_bert_base_ml0_d1 0`.
My configuration is as follows:
```
bert_base = ${best}{
  num_docs = 2802
  bert_learning_rate = 1e-05
  task_learning_rate = 2e-4
  max_segment_len = 128
  ffnn_size = 1000  # 3000
  cluster_ffnn_size = 1000  # 3000
  max_training_sentences = 11
  bert_tokenizer_name = bert-base-cased
  bert_pretrained_name_or_path = bert-base-cased
}

train_bert_base = ${bert_base}{ }

train_bert_base_ml0_d1 = ${train_bert_base}{
  mention_loss_coef = 0
  coref_depth = 1
}
```
In the Experiments section of the paper:

> Note that BERT and SpanBERT completely rely on only local decisions without any HOI. Particularly, +AA is equivalent to Joshi et al. (2020).

Please let me know what the configuration should be to replicate the Joshi et al. (2020) work. Is this configuration fine?

```
higher_order = attended_antecedent

train_spanbert_base_ml0_d1 = ${train_spanbert_base}{
  mention_loss_coef = 0
  coref_depth = 2
}
```
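Assuming the two proposed settings are meant to live in a single config entry (my reading of the question, not a setup confirmed by the maintainer), the combined block would look like:

```
# sketch: the proposed +AA replication settings merged into one entry;
# whether coref_depth = 2 is the right value is part of the question
train_spanbert_base_ml0_d1 = ${train_spanbert_base}{
  mention_loss_coef = 0
  coref_depth = 2
  higher_order = attended_antecedent
}
```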
Hey, how about your training results for bert_base? I have trained the model on bert_base with c2f, but only get about 67 F1, while the TensorFlow version gets about 73 F1.
Hi, have you replicated the Joshi et al. SpanBERT-large results?
For BERT-base I could achieve 73.3 F1.
Hi @lxucs,
I want to train a bert_base model with no HOI, like the spanbert_large_ml0_d1 model. Running `python run.py bert_base 0` gave this issue:
```
Traceback (most recent call last):
  File "run.py", line 289, in <module>
    model = runner.initialize_model()
  File "run.py", line 51, in initialize_model
    model = CorefModel(self.config, self.device)
  File "/VL/space/sushantakp/research_work/coref-hoi/model.py", line 33, in __init__
    self.bert = BertModel.from_pretrained(config['bert_pretrained_name_or_path'])
  File "/VL/space/sushantakp/.conda/envs/skp_env376/lib/python3.7/site-packages/transformers/modeling_utils.py", line 935, in from_pretrained
    raise EnvironmentError(msg)
OSError: Can't load weights for 'bert-base-cased'. Make sure that:

- 'bert-base-cased' is a correct model identifier listed on 'https://huggingface.co/models'

- or 'bert-base-cased' is the correct path to a directory containing a file named one of pytorch_model.bin, tf_model.h5, model.ckpt.
```
Do I need to change any parameters in experiments.conf:
- to handle the above issue?
- to train with HOI / no HOI?
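Based on the configs earlier in this thread, here is one sketch that covers both points (not a confirmed fix; the local path is an assumption, mirroring how `spanbert_base` points at `${best.data_dir}/spanbert_base`):

```
bert_base = ${best}{
  bert_tokenizer_name = bert-base-cased
  # if downloading 'bert-base-cased' from the Hugging Face hub fails,
  # point this at a local directory containing pytorch_model.bin
  # (the path below is hypothetical):
  bert_pretrained_name_or_path = ${best.data_dir}/bert_base
}

# no-HOI variant, reusing the overrides shown earlier in the thread:
train_bert_base = ${bert_base}{ }
train_bert_base_ml0_d1 = ${train_bert_base}{
  mention_loss_coef = 0
  coref_depth = 1
}
```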