megagonlabs / ditto

Code for the paper "Deep Entity Matching with Pre-trained Language Models"
Apache License 2.0

Summarization sometimes removes attribute names between [COL] and [VAL] #8

Open magiclogy- opened 3 years ago

magiclogy- commented 3 years ago

When an attribute name also appears as a token elsewhere in the sequence, the summarization component may remove it from its position between [COL] and [VAL].

For example, on line 719 of data/er_magellan/Structured/Amazon-Google/test.txt.su, the attribute name manufacturer between [COL] and [VAL] is removed because manufacturer also appears in the title value.

Since a value without its attribute name seems a bit unnatural, I'm not sure whether this is intended behavior or a bug.
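
To make the report easier to reproduce, here is a minimal sketch of how this can happen. It is not ditto's actual code: it assumes a summarizer that keeps the special tokens plus only the first occurrence of each retained token. The function `summarize`, the `protect_attr_names` flag, and the sample record are all hypothetical.

```python
def summarize(tokens, keep, protect_attr_names=False):
    """Keep [COL]/[VAL] and the first occurrence of each token in `keep`.

    Hypothetical stand-in for a summarizer; not the repository's implementation.
    """
    remaining = set(keep)  # tokens still allowed to be emitted once
    out = []
    prev = None
    for tok in tokens:
        if tok in ("[COL]", "[VAL]"):
            out.append(tok)
        elif protect_attr_names and prev == "[COL]":
            # Always keep the token right after [COL] (the attribute name)
            # without consuming it from the keep set.
            out.append(tok)
        elif tok in remaining:
            out.append(tok)
            remaining.discard(tok)  # later duplicates are dropped
        prev = tok
    return out


# Made-up record: "manufacturer" occurs in the title value and as an attribute name.
record = "[COL] title [VAL] acme manufacturer toolkit [COL] manufacturer [VAL] acme".split()
keep = {"title", "acme", "manufacturer", "toolkit"}

lossy = " ".join(summarize(record, keep))
protected = " ".join(summarize(record, keep, protect_attr_names=True))
print(lossy)      # attribute name missing after the second [COL]
print(protected)  # attribute name preserved
```

Under this assumption, the second manufacturer is dropped because the token was already emitted as part of the title. Protecting the token right after [COL] is one possible workaround; it only covers single-token attribute names.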