microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

Assertion Error in convert_examples_to_features function #188

Open SibtainRazaJamali opened 4 years ago

SibtainRazaJamali commented 4 years ago

I am getting an AssertionError from assert(len(label_ids)==max_seq_length). For all sequences I have len(label_ids)==512, but for one example I get a length of 513. What is the reason behind this error?

hasansalimkanmaz commented 4 years ago

Did you solve the problem?

r000bin commented 4 years ago

I had a similar error in layoutlm/data/funsd.py, on the line: assert splits[0] == bsplits[0]

The problem appeared after the following splits, because some samples actually had a '\T' in their text: splits = line.split("\t") and bsplits = bline.split("\t")

After replacing the '\T' in the text, the problem was gone. Maybe it's something similar here.
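
A minimal sketch of this failure mode, assuming the '\T' above refers to a literal tab character inside the token text. The sample token, the label, and the helper name clean_token are hypothetical and are not taken from funsd.py; the sketch only shows how a stray tab adds an extra field when a line is split on "\t", and how replacing it beforehand restores the expected field count.

```python
def clean_token(text: str) -> str:
    """Replace tabs inside a token so they are not mistaken for the field separator.

    Hypothetical helper for illustration; not part of layoutlm/data/funsd.py.
    """
    return text.replace("\t", " ")


# Hypothetical sample: a token that contains a tab, followed by its label.
token_with_tab = "Total\tAmount"
label = "B-HEADER"

# Writing the raw token produces a line with three tab-separated fields.
broken_line = f"{token_with_tab}\t{label}"
broken_splits = broken_line.split("\t")

# Cleaning the token first keeps the line at the expected two fields.
fixed_line = f"{clean_token(token_with_tab)}\t{label}"
fixed_splits = fixed_line.split("\t")

print(len(broken_splits), broken_splits)
print(len(fixed_splits), fixed_splits)
```

If the extra field appears in only one of the two files being compared, the first fields can stop lining up, which is one way an assertion such as splits[0] == bsplits[0] could fail. Whether the same cause explains the length of 513 in the original report is not confirmed.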

ssherlins commented 3 years ago

Hi @r000bin, I encountered the same error. In the solution you provided, did you mean that you replaced '\T' in the train.txt file?