lxucs / coref-hoi

PyTorch implementation of the end-to-end coreference resolution model with different higher-order inference methods.
Apache License 2.0
59 stars, 19 forks

Preprocess - Split into segments function #10

Closed: AradAshrafi closed this issue 3 years ago

AradAshrafi commented 3 years ago

Hi again Liyan,

I had some brief questions about splitting documents into segments. Based on the split_into_segments function in preprocess.py, I think each segment contains more than one sentence. Wouldn't it be better if each segment contained at most one sentence? I could not see the intuition behind it. Is it better to have longer segments, or is it for using resources more efficiently? Or was it tested in practice, with the trained model reaching better accuracy this way?

Thanks, Arad

lxucs commented 3 years ago

Each segment contains multiple English sentences so that each token's representation can be contextualized by the surrounding sentences, not only by its own. Not sure if I understood the question correctly?
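The idea can be illustrated with a minimal sketch of greedy segmentation. This is not the repository's split_into_segments: the signature, the per-subtoken `sentence_end` / `token_end` flag lists, and `max_seg_len` (assumed to count content subtokens only, excluding any special tokens the encoder adds) are hypothetical. The sketch packs as many whole sentences as fit into each segment and only cuts inside a sentence, at a word boundary, when a single sentence exceeds the limit.

```python
from typing import List, Tuple


def split_into_segments(
    sentence_end: List[bool], token_end: List[bool], max_seg_len: int
) -> List[Tuple[int, int]]:
    """Hypothetical sketch: return half-open (start, end) subtoken ranges.

    sentence_end[i] is True if subtoken i ends a sentence;
    token_end[i] is True if subtoken i ends a word.
    """
    segments = []
    start = 0
    n = len(sentence_end)
    while start < n:
        # Last subtoken index allowed in this segment.
        limit = min(start + max_seg_len, n) - 1
        end = limit
        # Prefer to stop right after a sentence boundary, so whole sentences
        # (possibly several) share one segment and can see each other.
        while end >= start and not sentence_end[end]:
            end -= 1
        if end < start:
            # No sentence boundary fits: fall back to a word boundary.
            end = limit
            while end >= start and not token_end[end]:
                end -= 1
            if end < start:
                raise ValueError("a single word exceeds max_seg_len")
        segments.append((start, end + 1))
        start = end + 1
    return segments


if __name__ == "__main__":
    # Three sentences of 4, 3 and 5 subtokens (ending at indices 3, 6, 11).
    sent_end = [False] * 12
    sent_end[3] = sent_end[6] = sent_end[11] = True
    word_end = [True] * 12
    print(split_into_segments(sent_end, word_end, max_seg_len=8))
```

With `max_seg_len=8`, this prints `[(0, 7), (7, 12)]`: the first two sentences (4 + 3 = 7 subtokens) share one segment, so tokens in the second sentence are encoded with the first sentence as context. Restricting segments to one sentence each would instead give `(0, 4)`, `(4, 7)`, `(7, 12)`, with no cross-sentence context inside the encoder.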

AradAshrafi commented 3 years ago

Many thanks. Yes, that was my question.