AXOLOTL-24 (Ascertain and eXplain Overhauls of the Lexicon Over Time at LChange'24): a shared task
https://github.com/ltgoslo/axolotl24_shared_task
GNU General Public License v3.0

AXOLOTL-24 Shared Task on Explainable Semantic Change Modeling

AXOLOTL-24 stands for "Ascertain and eXplain Overhauls of the Lexicon Over Time at LChange'24".

It is a shared task in explainable semantic change modeling, co-located with the 5th International Workshop on Computational Approaches to Historical Language Change 2024 (LChange'24). This GitHub repository serves as the main information hub for AXOLOTL. The test phase has now finished, and the leaderboards are published.

See the shared task description paper

If you are interested in this shared task, please also join our low-volume mailing list on Google Groups.


Timeline

Organizers

Introduction

This shared task builds on the existing tradition of competitions in diachronic semantic change detection, such as (Schlechtweg et al., 2020) and many others. This time, however, we focus on explaining diachronic semantic changes, even if only at a very basic level for now.

In particular, we challenge the participants to implement a semantic change modeling system which, given two historical corpora and a sense inventory corresponding to one of the periods, is able to:

  1. Find the target word usages associated with new, gained senses (while at the same time correctly identifying usages associated with the previously existing senses);
  2. Describe the new senses in a way that facilitates understanding and lexicographical research.

Thus, the task is to identify which exact senses were gained between two time periods and generate reasonable descriptions (definitions) of these senses.

To be able to use high-quality gold data, we use a simplified setup: instead of asking the participants to retrieve and analyze all target word usages in raw corpora, we provide two manually checked sets of usage examples (still of considerable size). Below, we still call them "corpora", for clarity.

The shared task features Finnish and Russian data, but you do not have to speak these languages to participate. There will also be a smaller surprise language at the test stage. For all these languages, we use gold, manually annotated data to evaluate the predictions of the participant systems.

The shared task consists of two subtasks. Participants are welcome to take part in either one or both.

Subtask 1. Bridging diachronic word uses and a synchronic dictionary

Codalab competition for Subtask 1 - development submissions

Codalab competition for Subtask 1 - post-evaluation test submissions

The participants are offered two corpora, belonging to different time periods. In addition to this, they are provided with a set of dictionary entries (sense inventories) for the target words describing their senses in the first time period (accompanied by definitions). The task is to find usages of the target words belonging to newly gained senses, i.e., senses not covered by the provided sense inventory, as well as usages belonging to the previously existing senses.

The assumption is that sense definitions from the dictionary, even though not always covering all word senses even from the same time period, may still be a useful additional source of information. The goal is to map word usages to the dictionary senses. This is very similar to Word Sense Disambiguation, with the difference being that the usages corresponding to word senses absent from the dictionary should be grouped into novel sense clusters (this is more similar to Word Sense Induction). In a way, this subtask is a mixture of WSD and WSI.
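The mixture of WSD and WSI described above can be illustrated with a toy baseline. This is only a minimal sketch under stated assumptions, not the organizers' baseline or the official data format: the function names, the bag-of-words cosine similarity, and the `threshold` value are all hypothetical choices made for illustration. A usage is first matched against the dictionary definitions (the WSD step); if no definition is similar enough, the usage is placed into a novel-sense cluster (the WSI step).

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_senses(usages, sense_inventory, threshold=0.2):
    """Map each usage to a dictionary sense id or to a novel-sense cluster id.

    usages: list of usage sentences (strings).
    sense_inventory: dict mapping a sense id to its definition text.
    threshold: hypothetical similarity cut-off (an assumption, not a task value).
    """
    sense_vectors = {sid: bag_of_words(d) for sid, d in sense_inventory.items()}
    novel_clusters = []  # list of (cluster id, aggregated token counts)
    labels = []
    for usage in usages:
        vec = bag_of_words(usage)
        # WSD step: find the most similar known definition.
        best_id, best_sim = None, 0.0
        for sid, svec in sense_vectors.items():
            sim = cosine(vec, svec)
            if sim > best_sim:
                best_id, best_sim = sid, sim
        if best_id is not None and best_sim >= threshold:
            labels.append(best_id)
            continue
        # WSI step: join the first sufficiently similar novel cluster,
        # or open a new cluster if none matches.
        for cid, cvec in novel_clusters:
            if cosine(vec, cvec) >= threshold:
                cvec.update(vec)  # add this usage's counts to the cluster
                labels.append(cid)
                break
        else:
            cid = f"novel_{len(novel_clusters)}"
            novel_clusters.append((cid, vec))
            labels.append(cid)
    return labels
```

A realistic system would likely replace the bag-of-words vectors with contextualized embeddings and tune the threshold on the development data; the control flow (match against known senses first, cluster the remainder) is the part this sketch is meant to show.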

Subtask 2. Definition generation for novel word senses

Codalab competition for Subtask 2 - development submissions

Codalab competition for Subtask 2 - post-evaluation test submissions

This subtask challenges the participants to submit good descriptions/definitions for the novel senses they found in Subtask 1. The definitions can be generated from scratch or retrieved from existing ontologies: this is completely up to the participants. The organizers will map the predicted definitions to the gold standard ones and evaluate their quality with standard NLG metrics.
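This section does not name the specific metrics, so the following is only an illustrative stand-in for how a predicted definition can be compared with a gold one. It computes a token-level unigram F1 score; the function names and the choice of metric are assumptions made for this sketch, and it is not the official scorer.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def unigram_f1(predicted, gold):
    """Token-overlap F1 between a predicted and a gold definition."""
    pred_counts = Counter(tokenize(predicted))
    gold_counts = Counter(tokenize(gold))
    # Multiset intersection: each token counts at most as often
    # as it appears in both definitions.
    overlap = sum((pred_counts & gold_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(gold_counts.values())
    return 2 * precision * recall / (precision + recall)
```

Overlap-based scores like this one reward shared wording rather than shared meaning, which is one reason evaluations of generated definitions usually report more than one metric.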

References

  1. Diachronic word embeddings and semantic shifts: a survey (Kutuzov et al., COLING 2018)
  2. SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection (Schlechtweg et al., SemEval 2020)
  3. Computational approaches to semantic change (Tahmasebi et al., LangSci Press 2021)
  4. Semeval-2022 Task 1: CODWOE – Comparing Dictionaries and Word Embeddings (Mickus et al., SemEval 2022)
  5. Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis (Giulianelli et al., ACL 2023)

Citation

@inproceedings{fedorova-etal-2024-axolotl24,
    title = "{AXOLOTL}{'}24 Shared Task on Multilingual Explainable Semantic Change Modeling",
    author = "Fedorova, Mariia  and
      Mickus, Timothee  and
      Partanen, Niko  and
      Siewert, Janine  and
      Spaziani, Elena  and
      Kutuzov, Andrey",
    editor = "Tahmasebi, Nina  and
      Montariol, Syrielle  and
      Kutuzov, Andrey  and
      Alfter, David  and
      Periti, Francesco  and
      Cassotti, Pierluigi  and
      Huebscher, Netta",
    booktitle = "Proceedings of the 5th Workshop on Computational Approaches to Historical Language Change",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.lchange-1.8",
    pages = "72--91",
    abstract = "",
}

5th International Workshop on Computational Approaches to Historical Language Change 2024 (LChange'24), August 15, 2024, Bangkok, Thailand