Open dawidm opened 11 months ago
Hi Dawid,
Yes, that's how my code works, and thanks for pointing out this issue with the VAST dataset. I encourage you to raise an issue in the original repo of this dataset.
If the targets are non-existent in the text, then being labeled as neutral is not a problem. A neutral stance can arise in two cases: 1) the target is in the text, but the author does not show a clear stance towards it; 2) the target is not present in the text at all.
I hope this addresses your concerns.
Thanks for your answer. I think this may be a little trickier. The targets for neutral samples are wrong, but in a specific way: as far as I understand, they simply haven't been replaced with a random target, which is how a sample is supposed to be made neutral. So they remain highly related to the post text, which makes their Wikipedia definitions highly related as well. In my opinion, this could affect the results by making such samples easier to classify as neutral.
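For context, here is a minimal sketch of the target-swapping procedure described above. The field names and the `make_neutral` helper are my own illustration, not VAST's actual construction code: a synthetic neutral sample is made by replacing the target with a random target unrelated to the post.

```python
import random

# Toy samples with made-up fields; not real VAST rows.
samples = [
    {"text": "Post about gun laws", "target": "gun control", "label": "pro"},
    {"text": "Post about emissions", "target": "climate change", "label": "con"},
]

def make_neutral(sample, all_targets, rng=random):
    # Replace the target with a random *different* one; the text no longer
    # discusses the new target, so the stance label becomes neutral.
    candidates = [t for t in all_targets if t != sample["target"]]
    neutral = dict(sample)
    neutral["target"] = rng.choice(candidates)
    neutral["label"] = "neutral"
    return neutral

targets = [s["target"] for s in samples]
neutral = make_neutral(samples[0], targets)
print(neutral["label"], neutral["target"])
```

The issue described in this thread is that one of the dataset's target columns apparently still holds the original, text-related target instead of the swapped one, so definitions looked up from it stay topically related to the post.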
Could you show some examples?
I mean all samples that have type_idx==4 in the VAST train set. My suspicion that incorrect definitions make them easier to classify is potentially supported by the improved score for the neutral class (~0.925) compared to BERT alone (<0.900), which shouldn't happen when the definitions for neutral samples are wrong.
Hello,
If I understand the code correctly, you use the 'new_topic' column for getting Wikipedia definitions (but 'topic_str' for the classification model). Working with the VAST dataset, I've noticed that the 'new_topic' values are incorrect for synthetic neutral samples (type_idx==4). This makes the target definitions incorrect for most neutral samples. Please correct me if I'm wrong.
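A quick way to check this claim would be to compare the two columns on the type_idx==4 rows. A minimal sketch below, using a toy DataFrame with the column names from this thread rather than the real VAST CSV (the values are invented for illustration):

```python
import pandas as pd

# Toy rows mimicking the VAST train-set columns discussed in this issue;
# in the real file you would load it with pd.read_csv(...) instead.
df = pd.DataFrame({
    "topic_str": ["gun control", "space program", "climate change"],
    "new_topic": ["gun control", "gun laws",      "climate change"],
    "type_idx":  [1,             4,               1],
})

# For synthetic neutral samples (type_idx == 4), see whether the target
# used for Wikipedia definitions ('new_topic') disagrees with the target
# used by the classifier ('topic_str').
neutral = df[df["type_idx"] == 4]
mismatch = neutral[neutral["new_topic"] != neutral["topic_str"]]
print(f"{len(mismatch)} of {len(neutral)} type_idx==4 rows have mismatching targets")
```

If the mismatch rate on the real data is high, the Wikipedia definitions for synthetic neutrals are being fetched for the wrong (text-related) target, which is exactly the concern raised above.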