megagonlabs / ditto

Code for the paper "Deep Entity Matching with Pre-trained Language Models"
Apache License 2.0

drop_col gives error? #24

Open utsgr opened 2 years ago

utsgr commented 2 years ago

When I try to use data augmentation with drop_col, I get the error below. I did not change anything about the model or data. Is there something I'm missing?

[Image: Screen Shot 2021-11-29 at 5 38 14 PM]

soodeh-nilforoushan commented 2 years ago

Did you solve the problem? I got the same error.

utsgr commented 2 years ago

Sadly no, I'm hoping someone can help.

soodeh-nilforoushan commented 2 years ago

They added the ditto light to their code, and this error started happening after those changes. I hope the authors answer this question.

progsi commented 1 year ago

The problem appears to be that they are using \t instead of [SEP]. Also, the variable combined is split as well. If I just skip that line of code, it seems to work for me.

rinkstiekema commented 1 year ago

I do not think the issue is caused by using \t instead of [SEP], as that is also used in the other data augmentation functions. I noticed a different pattern: if the column before the separator is the one to be removed, the [SEP] indicator is also removed from the resulting tokens.

I'll attempt to fix the algorithm for dropping columns, I'll keep you posted.
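
To illustrate the pattern described above, here is a minimal sketch of a column drop that treats [SEP] as a boundary, so the separator survives when the column right before it is removed. This is not the repository's code: the function names column_spans and drop_column are hypothetical, and the COL/VAL token layout with a single [SEP] between the two records is an assumption based on this thread.

```python
from typing import List

SEP = "[SEP]"  # assumed separator token between the two serialized records


def column_spans(tokens: List[str]) -> List[range]:
    """Return one index range per column.

    A column starts at a 'COL' token and ends just before the next 'COL',
    the next '[SEP]', or the end of the sequence, so '[SEP]' is never
    part of any column span.
    """
    spans = []
    start = None
    for i, tok in enumerate(tokens):
        if tok in ("COL", SEP):
            if start is not None:
                spans.append(range(start, i))
            # A new column opens only at 'COL'; '[SEP]' just closes one.
            start = i if tok == "COL" else None
    if start is not None:
        spans.append(range(start, len(tokens)))
    return spans


def drop_column(tokens: List[str], col_idx: int) -> List[str]:
    """Remove the col_idx-th column (hypothetical helper), keeping '[SEP]'."""
    span = column_spans(tokens)[col_idx]
    return tokens[:span.start] + tokens[span.stop:]


# Made-up example record pair, for illustration only.
tokens = ("COL title VAL ipad COL price VAL 300 [SEP] "
          "COL title VAL ipad 2 COL price VAL 299").split()

# Drop the 'price' column of the first record, which sits right before [SEP].
print(" ".join(drop_column(tokens, 1)))
```

Because the span for each column stops before [SEP], dropping the last column of the first record leaves the separator in place, which is the case that reportedly fails here.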