If I use the Independent setting with max_segment_len = 128, each document is divided into segments of 128 tokens, and each 128-token segment is fed to BERT, which produces embeddings for its tokens.
In this situation, which of the following happens?
(1) Does the model look for an antecedent only within the same 128-token segment?
OR
(2) Are all segments passed through BERT to get embeddings for every token in the document,
and does the model then look for an antecedent among all tokens in the document?
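To make the two readings concrete, here is a toy sketch. It is not the repository's code: `split_into_segments`, `encode_segment` and `candidate_antecedents` are hypothetical names, `encode_segment` is a placeholder standing in for BERT, and single token positions stand in for mention spans. In both readings every segment is encoded on its own; the only difference is which earlier positions are allowed as antecedent candidates.

```python
from typing import List


def split_into_segments(tokens: List[str], max_segment_len: int) -> List[List[str]]:
    """Cut a document into consecutive, non-overlapping chunks of at most max_segment_len tokens."""
    return [tokens[i:i + max_segment_len] for i in range(0, len(tokens), max_segment_len)]


def encode_segment(segment: List[str]) -> List[int]:
    """Hypothetical stand-in for BERT: one toy 'embedding' (the token length) per token.
    Each segment is encoded by itself, so nothing crosses segment boundaries here."""
    return [len(tok) for tok in segment]


def candidate_antecedents(mention: int, segment_of: List[int], within_segment_only: bool) -> List[int]:
    """Earlier token positions that may serve as antecedents of the token at `mention`."""
    earlier = range(mention)
    if within_segment_only:
        # Reading (1): only earlier positions in the same segment are candidates.
        return [j for j in earlier if segment_of[j] == segment_of[mention]]
    # Reading (2): any earlier position in the whole document is a candidate.
    return list(earlier)


doc = [f"tok{i}" for i in range(300)]
segments = split_into_segments(doc, max_segment_len=128)
# Encode each segment independently, then concatenate into one document-level sequence.
embeddings = [e for seg in segments for e in encode_segment(seg)]
segment_of = [s for s, seg in enumerate(segments) for _ in seg]

print([len(s) for s in segments])  # [128, 128, 44]
print(candidate_antecedents(130, segment_of, within_segment_only=True))  # [128, 129]
print(len(candidate_antecedents(130, segment_of, within_segment_only=False)))  # 130
```

Under reading (1), a mention at position 130 could only link to positions 128 and 129; under reading (2), all 130 earlier positions are candidates, even though the embeddings were computed segment by segment.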