Need to match label length to tokenized ids in Norec(Dataset).
The first iteration of this will give each sub-word the original label of its parent token, meaning some BIO sequences may contain multiple B tags in a row for a single span. This could affect training, and should be explored further.
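A minimal sketch of this first iteration, under stated assumptions: `expand_labels` and `toy_tokenize` are hypothetical names, not part of the existing `Norec` class, and `toy_tokenize` is only a stand-in for the real sub-word tokenizer (it splits words into fixed-size chunks). Special tokens and padding are not handled here.

```python
from typing import Callable, List, Tuple


def toy_tokenize(word: str, size: int = 4) -> List[str]:
    """Hypothetical stand-in for a sub-word tokenizer.

    Splits a word into fixed-size chunks and marks continuation
    pieces with '##'. A real tokenizer would use a learned vocabulary.
    """
    if not word:
        return []
    chunks = [word[i:i + size] for i in range(0, len(word), size)]
    return [chunks[0]] + ["##" + chunk for chunk in chunks[1:]]


def expand_labels(
    words: List[str],
    labels: List[str],
    tokenize: Callable[[str], List[str]],
) -> Tuple[List[str], List[str]]:
    """Give every sub-word the label of its parent word.

    The returned label list has the same length as the returned
    sub-word list, which is the length match the dataset needs.
    """
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")

    sub_tokens: List[str] = []
    sub_labels: List[str] = []
    for word, label in zip(words, labels):
        pieces = tokenize(word)
        sub_tokens.extend(pieces)
        # Copy the parent's label onto each piece, so a 'B' word
        # split into several pieces yields several 'B' labels.
        sub_labels.extend([label] * len(pieces))
    return sub_tokens, sub_labels


tokens, tags = expand_labels(["utmerket", "bra"], ["B", "O"], toy_tokenize)
print(tokens)  # ['utme', '##rket', 'bra']
print(tags)    # ['B', 'B', 'O']
```

In this example the single `B` word becomes two sub-words that both carry `B`, which is the repeated-B effect described above.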