I have been trying to run the mBERT extraction script on the ca/head_first dataset with bert-base-multilingual-cased, and I get the following error trace:
Using bert-base-multilingual-cased for data/sent_graphs/ca/head_first/train.conllu
Saving to data/sent_graphs/ca/head_first/train_bert.hdf5
Embedding...
0%| | 0/1173 [00:00<?, ?it/s]
Traceback (most recent call last):
File "bert_embed.py", line 135, in <module>
reps, sids = ee(model, args.indata)
File "bert_embed.py", line 108, in ee
assert len(sent.split()) == len(ave_reps)
AssertionError
Using bert-base-multilingual-cased for data/sent_graphs/ca/head_first/dev.conllu
Saving to data/sent_graphs/ca/head_first/dev_bert.hdf5
Embedding...
0%| | 0/168 [00:00<?, ?it/s]
Traceback (most recent call last):
File "bert_embed.py", line 135, in <module>
reps, sids = ee(model, args.indata)
File "bert_embed.py", line 108, in ee
assert len(sent.split()) == len(ave_reps)
AssertionError
Using bert-base-multilingual-cased for data/sent_graphs/ca/head_first/test.conllu
Saving to data/sent_graphs/ca/head_first/test_bert.hdf5
Embedding...
0%| | 0/336 [00:00<?, ?it/s]
Traceback (most recent call last):
File "bert_embed.py", line 135, in <module>
reps, sids = ee(model, args.indata)
File "bert_embed.py", line 108, in ee
assert len(sent.split()) == len(ave_reps)
AssertionError
It seems that the average_reps function in bert_embed.py returns an empty list [] for the sentence 'Bona ubicació .'. When execution reaches the assert statement, the length of this output (0) does not match the number of whitespace-separated tokens in the sentence (3), so the assertion fails. This sentence is just one example I picked to illustrate the problem. I would really appreciate any guidance on how to fix this.
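To make the mismatch easier to discuss, here is a minimal sketch of the kind of check I believe the assert performs. It is not the actual code from bert_embed.py: the function names (group_wordpieces, average_by_word, check_alignment) are hypothetical, the WordPiece split in the usage note is made up for illustration, and I am assuming that sub-word vectors are averaged per word using the usual "##" continuation prefix.

```python
from typing import List, Sequence


def group_wordpieces(pieces: Sequence[str]) -> List[List[int]]:
    """Group WordPiece indices into words.

    Assumption: a piece starting with '##' continues the previous word.
    """
    groups: List[List[int]] = []
    for i, piece in enumerate(pieces):
        if piece.startswith("##") and groups:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def average_by_word(
    pieces: Sequence[str], vectors: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Average the sub-word vectors of each word (hypothetical stand-in for average_reps)."""
    averaged: List[List[float]] = []
    for group in group_wordpieces(pieces):
        dim = len(vectors[group[0]])
        averaged.append(
            [sum(vectors[i][d] for i in group) / len(group) for d in range(dim)]
        )
    return averaged


def check_alignment(
    sent: str, pieces: Sequence[str], vectors: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Same condition as the failing assert, but with a message that shows both sides."""
    words = sent.split()
    reps = average_by_word(pieces, vectors)
    if len(words) != len(reps):
        raise ValueError(
            f"{len(words)} words {words!r} but {len(reps)} averaged vectors "
            f"(word pieces: {list(pieces)!r})"
        )
    return reps
```

With an illustrative split such as ["Bona", "ub", "##ica", "##ció", "."] and one vector per piece, check_alignment("Bona ubicació .", pieces, vectors) returns three averaged vectors. Passing empty pieces and vectors, which corresponds to the empty [] I observe, raises a ValueError that reports 3 words against 0 vectors. Replacing the bare assert with a message like this would at least show which sentence and which word pieces trigger the failure.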