Preprocess - Split into segments function

AradAshrafi commented 3 years ago

Hi again Liyan,

I have a few brief questions about splitting documents into segments. Based on the split_into_segments function in preprocess.py, I think each segment contains more than one sentence. Would it not be better for each segment to contain a single sentence? I could not see the intuition behind it. Is it simply better to have longer segments, is it for more efficient use of resources, or was it tested in practice and the trained model achieved better accuracy this way?

Thanks, Arad

lxucs commented 3 years ago

Each segment contains multiple English sentences so that the representation can be contextualized across them. Not sure if I understand the question correctly.
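To make the idea concrete, here is a minimal sketch of packing whole sentences into segments up to a length limit. This is not the actual split_into_segments from preprocess.py; the function name pack_sentences_into_segments, the max_segment_len parameter, and the greedy strategy are all assumptions made for illustration.

```python
from typing import List


def pack_sentences_into_segments(
    sentences: List[List[str]], max_segment_len: int
) -> List[List[str]]:
    """Greedily pack whole sentences into segments of at most max_segment_len tokens.

    Hypothetical helper for illustration only; it never splits a sentence
    across two segments.
    """
    segments: List[List[str]] = []
    current: List[str] = []
    for sentence in sentences:
        if len(sentence) > max_segment_len:
            raise ValueError("a single sentence exceeds max_segment_len")
        # Start a new segment when the next sentence would not fit.
        if current and len(current) + len(sentence) > max_segment_len:
            segments.append(current)
            current = []
        current.extend(sentence)
    if current:
        segments.append(current)
    return segments


sentences = [
    ["Alice", "left", "."],
    ["She", "was", "tired", "."],
    ["Bob", "stayed", "."],
]
segments = pack_sentences_into_segments(sentences, max_segment_len=8)
for segment in segments:
    print(segment)
```

With a limit of 8 tokens, the first two sentences share one segment, so an encoder that processes the segment as a whole sees "She" together with "Alice". With one sentence per segment, that cross-sentence context would be unavailable.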

AradAshrafi commented 3 years ago

Many thanks. Yes, that was my question.

lxucs / coref-hoi

Preprocess - Split into segments function #10