studio-ousia / luke

LUKE -- Language Understanding with Knowledge-based Embeddings
Apache License 2.0

Reducing max_seq_length #151

Closed taghreed34 closed 2 years ago

taghreed34 commented 2 years ago

Does reducing max_seq_length before fine-tuning conflict with anything that might cause a problem? I don't mean in terms of model performance.

ryokan0123 commented 2 years ago

I don't think it causes any problems. (But in case it does, feel free to let us know!)

taghreed34 commented 2 years ago

[screenshot: assertion error] I changed the number of epochs to 2 and max_seq_length to 100. Training started and ran until batch 57, then stopped with the assertion error shown in the image. When I tried a max_seq_length of less than 100, training didn't start at all due to a CUDA error.

taghreed34 commented 2 years ago

Note that I'm working on NER and I'm not using CoNLL-2003; I use another dataset with the exact same format. I think the only difference is that it contains mention lengths greater than those of the original CoNLL entities.

taghreed34 commented 2 years ago

I changed max_seq_length in this script: luke/examples/legacy/ner/main.py

ryokan0123 commented 2 years ago

By default, GPU training tends to give uninformative error messages, which makes it hard to locate the problem. Could you run the same training on the CPU, or set the environment variable with export CUDA_LAUNCH_BLOCKING=1? I think this will show where the problem is.
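(For reference, a minimal sketch of the synchronous-error approach, not taken from the repository; setting the variable from inside Python is an alternative to the shell export above, but it only takes effect if done before the first CUDA call.)

```python
# Sketch: force synchronous CUDA error reporting so the traceback points at
# the kernel that actually failed. This must run before any CUDA work is
# launched, ideally before importing torch; the shell export is equivalent.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(4, 8, device=device)
print(x.sum())  # any CUDA error now surfaces at the offending call
```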

taghreed34 commented 2 years ago

[screenshot: CPU traceback] @Ryou0634 This is the error that appeared when I ran the script on a CPU.

taghreed34 commented 2 years ago

start_states = torch.gather(word_hidden_states, -2, entity_start_positions)

Tracing what this line depends on, I found that the function used to create features in legacy/ner/utils.py produces some entity_start_positions that are inconsistent with the specified max_seq_length, though I haven't figured out why yet. The total context length is calculated correctly each time and is constrained by max_seq_length, but it's not clear how some entity_start_positions end up larger than max_seq_length. I'm debugging it now, but if this behaviour is expected for some obvious reason, could you kindly tell me what it is?
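For context, here is a minimal sketch (hypothetical shapes and values, not the repo's actual tensors) of why an out-of-range entity_start_positions index makes that gather call fail:

```python
# Hypothetical repro: torch.gather requires every index to be smaller than the
# size of the gathered dimension, so an entity start position that lies beyond
# max_seq_length is out of bounds once the word hidden states are truncated.
import torch

max_seq_length, hidden_size = 100, 768
word_hidden_states = torch.randn(1, max_seq_length, hidden_size)

# A start position like 120 can only come from a context longer than the
# sequence the model actually sees.
entity_start_positions = torch.tensor([[120]])
index = entity_start_positions.unsqueeze(-1).expand(-1, -1, hidden_size)

start_states = torch.gather(word_hidden_states, -2, index)
# CPU: RuntimeError: index 120 is out of bounds for dimension 1 with size 100
# GPU: device-side assert, reported asynchronously unless CUDA_LAUNCH_BLOCKING=1
```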

taghreed34 commented 2 years ago

The issue is: in convert_examples_to_features, the left and right contexts are always added without first checking the length of the original sentence, which may itself already be greater than or equal to max_seq_length.
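If that is indeed the cause, one possible guard would be to truncate the sentence before building any context and drop entities whose spans no longer fit. A rough sketch, with illustrative names (tokens, entity_spans, num_special_tokens) rather than the actual variables in convert_examples_to_features:

```python
# Rough sketch of a guard; token-level spans (start, end) per entity are
# assumed, and all names here are illustrative, not the repo's own.
def truncate_features(tokens, entity_spans, max_seq_length, num_special_tokens=2):
    # Reserve room for special tokens such as [CLS] and [SEP].
    max_tokens = max_seq_length - num_special_tokens

    # Truncate the original sentence before any left/right context is added.
    tokens = tokens[:max_tokens]

    # Keep only entities whose span still lies inside the truncated sequence,
    # so no start/end position can exceed the model's input length.
    kept_spans = [(start, end) for (start, end) in entity_spans if end <= len(tokens)]

    return tokens, kept_spans
```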