Closed taghreed34 closed 2 years ago
I don't think that it causes any problems. (But in case it does, feel free to let us know that!)
I changed the number of epochs to 2 and max_seq_len to 100. Training started and ran up to batch 57, then stopped with the assertion error in the image. When I tried a max_seq_len smaller than 100, training didn't start at all due to a CUDA error.
Note that I'm working on NER but not with conll2003; I use another dataset with the exact same format, and I think the only difference is that it has mention lengths greater than the original CoNLL entities.
I changed max_seq_length in this script: luke/examples/legacy/ner/main.py
By default, GPU errors are reported asynchronously, so the error message often points to the wrong place, which makes the actual problem hard to detect.
Could you run the same training on the CPU, or set the environment variable `export CUDA_LAUNCH_BLOCKING=1`?
I think this will show where the problem is.
@Ryou0634 This is the error that appeared when I ran the script on the CPU.
Tracing what this line depends on, I found that the function used to create features in legacy/ner/utils.py produces some entity_start_positions that are inconsistent with the specified max_sequence_length, but I haven't figured out why yet. The whole context length is always calculated correctly and constrained by max_sequence_length, yet somehow some entity_start_positions end up larger than max_sequence_length. I'm debugging it now, but if this behaviour is expected for some obvious reason, could you kindly tell me what it is?
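For anyone hitting the same thing, a minimal sanity check like the following can locate the offending features before training starts. This is a sketch, not code from the repo: the `entity_start_positions` attribute name mirrors the thread, but the `features` container and the check itself are assumptions about how the feature objects are shaped.

```python
from collections import namedtuple

# Illustrative stand-in for the real feature objects produced in
# legacy/ner/utils.py; only the field we inspect is modeled here.
Feature = namedtuple("Feature", ["entity_start_positions"])

def find_bad_features(features, max_seq_length):
    """Return (feature_index, position) pairs whose entity start
    position falls outside the truncated sequence length."""
    bad = []
    for i, feature in enumerate(features):
        for pos in feature.entity_start_positions:
            if pos >= max_seq_length:
                bad.append((i, pos))
    return bad

features = [Feature([0, 5]), Feature([12, 120])]
print(find_bad_features(features, max_seq_length=100))  # [(1, 120)]
```

Running this over the full feature list right after `convert_examples_to_features` would confirm whether the out-of-range positions correlate with long original sentences.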
The issue is: in the convert_examples_to_features function, the left and right contexts are always added without first checking the original sentence length, which may by itself already be >= max_sequence_length.
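To illustrate the kind of fix implied here, a minimal sketch of budget-aware truncation: reserve room for the mention first, then cap the left and right contexts so the total never exceeds the limit. The function name and token-list interface are hypothetical, not the repo's actual API.

```python
def truncate_around_mention(left_context, mention, right_context, max_seq_length):
    """Cap left/right contexts so that the total token count
    stays within max_seq_length. All arguments are token lists."""
    # Budget left over for context after reserving room for the mention.
    budget = max_seq_length - len(mention)
    if budget <= 0:
        # The mention alone meets or exceeds the limit: keep no context.
        return [], mention[:max_seq_length], []
    # Split the budget between the two sides, right side taking any
    # remainder the left side could not use.
    left_len = min(len(left_context), budget // 2)
    right_len = min(len(right_context), budget - left_len)
    # Hand unused right-side budget back to the left side.
    left_len = min(len(left_context), budget - right_len)
    return (
        left_context[len(left_context) - left_len:],
        mention,
        right_context[:right_len],
    )
```

With this guard, entity start positions computed from the truncated pieces can never land beyond max_seq_length, regardless of how long the original sentence is.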
Does reducing max_sequence_length before fine-tuning conflict with anything that might cause a problem (not in terms of model performance)?