jerbarnes / semeval22_structured_sentiment

SemEval-2022 Shared Task 10: Structured Sentiment Analysis

Some details about data #22

Closed luxinyu1 closed 2 years ago

luxinyu1 commented 2 years ago

In the monolingual subsection of the README, there's a sentence:

This track assumes that you train and test on the same language.

Does this mean we can use extra training data in the same language, beyond the given training set? For example, can we train models on the combined MPQA and Opener_en training sets (or on other mined English data) and test on MPQA?

jerbarnes commented 2 years ago

Yes, in the monolingual subtrack you can use any combination of resources, including the other structured sentiment datasets. We only require that you document and cite these extra resources in detail.

luxinyu1 commented 2 years ago

Got it, thanks for your reply.

janpf commented 2 years ago

@jerbarnes

Yes, in the monolingual subtrack you can use any combination of resources, including the other structured sentiment datasets.

... including those in other languages as well? That would seem to contradict

This track assumes that you train and test on the same language.

Thanks :)

jerbarnes commented 2 years ago

When we introduced the monolingual subtrack, we had not considered that people would use multilingual LMs in this track, so the wording sounds a bit more restrictive than intended. Sorry about that, and I hope our leniency in this respect has not caused you any major problems...