liuwei1206 / LEBERT

Code for the ACL2021 paper "Lexicon Enhanced Chinese Sequence Labelling Using BERT Adapter"

Question about the size [N, L, W, D] #64

Closed JubSteven closed 11 months ago

JubSteven commented 11 months ago

Hi, thanks for your great work! I have a question about the size of the tensors in the pipeline. According to the paper, a sentence (or a phrase) like "美国人民" is first segmented into words like "美国", "美国人", "国人", etc. For each of these words, we more or less add the word vector to each character the word contains.

But in that case, how do we make sure that the dimension W is fixed? For example, the character "美" might have two matched words, "美国" and "美国人", while "民" has only one, "人民".
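To make the concern concrete, here is a minimal sketch of how the number of matched words can differ per character. It is not taken from the LEBERT code: the function `match_words`, the parameter `max_word_len`, and the four-word toy lexicon are all assumptions made for illustration.

```python
def match_words(sentence, lexicon, max_word_len=4):
    """For each character, collect the lexicon words that contain it.

    Hypothetical helper, not the repository's implementation.
    """
    matched = [[] for _ in sentence]
    for start in range(len(sentence)):
        # Try every substring of length 2..max_word_len starting here.
        longest_end = min(start + max_word_len, len(sentence))
        for end in range(start + 2, longest_end + 1):
            word = sentence[start:end]
            if word in lexicon:
                # Attach the word to every character it covers.
                for pos in range(start, end):
                    matched[pos].append(word)
    return matched


# Toy lexicon (assumed); a real lexicon would be far larger.
lexicon = {"美国", "美国人", "国人", "人民"}
matched = match_words("美国人民", lexicon)
counts = [len(words) for words in matched]
print(counts)  # [2, 3, 3, 1]
```

With this toy lexicon, "美" gets two words and "民" gets one, so the per-character lists are ragged and cannot be stacked into a fixed [L, W] layout without further handling.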

Thanks for your patience!

huskydoge commented 11 months ago

same issue

liuwei1206 commented 11 months ago

Hi,

I am sorry to say that I have forgotten some of the details. As far as I remember, W is a hyper-parameter. If a character has fewer than W matched words, we can pad the rest.

Hope it helps.
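The padding idea described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the repository's code: `pad_word_ids`, `PAD_ID = 0`, the returned mask, and the truncation of characters with more than W matches are all hypothetical choices, and the word ids are made up.

```python
PAD_ID = 0  # assumed index reserved for padding in the word vocabulary


def pad_word_ids(matched_ids, max_words):
    """Pad (or truncate) each character's word-id list to length max_words.

    Hypothetical helper; max_words plays the role of W.
    """
    padded, mask = [], []
    for ids in matched_ids:
        kept = ids[:max_words]  # assumption: drop matches beyond W
        n_pad = max_words - len(kept)
        padded.append(kept + [PAD_ID] * n_pad)
        # 1 marks a real word, 0 marks a padding slot.
        mask.append([1] * len(kept) + [0] * n_pad)
    return padded, mask


# Made-up word ids for the four characters of "美国人民".
matched_ids = [[1, 2], [1, 2, 3], [2, 3, 4], [4]]
word_ids, word_mask = pad_word_ids(matched_ids, max_words=3)
print(word_ids)   # [[1, 2, 0], [1, 2, 3], [2, 3, 4], [4, 0, 0]]
print(word_mask)  # [[1, 1, 0], [1, 1, 1], [1, 1, 1], [1, 0, 0]]
```

Every character now has exactly W slots, so the ids form an [L, W] array. Looking each id up in a D-dimensional word embedding table gives [L, W, D], and batching N sentences gives [N, L, W, D]. A mask like the one here is one way to keep padding slots from contributing when the word vectors are combined.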

JubSteven commented 11 months ago

Ok, thanks!