Cross-lingual Contextualized Topic Models with Zero-shot Learning

PrinceTitiya commented 2 months ago

Title

Cross-lingual Contextualized Topic Models with Zero-shot Learning.

Team Name

TeamTatakae

Email

202318024@daiict.ac.in

Team Member 1 Name

Mitul Dudhat

Team Member 1 Id

202318024

Team Member 2 Name

Ayush Patel

Team Member 2 Id

202318036

Team Member 3 Name

Prince Titiya

Team Member 3 Id

202318010

Team Member 4 Name

Hiten Gondaliya

Team Member 4 Id

202318063

Problem Statement

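The paper introduces a zero-shot cross-lingual topic model that learns topics from English documents and predicts them for documents in unseen languages, such as Italian and Portuguese, without needing translations. Traditional topic models take bag-of-words input, which is tied to the vocabulary of a single language and therefore cannot transfer to another one. The paper instead feeds multilingual contextualized document embeddings to the topic model, so that the transferred topics remain coherent and stable across languages.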
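A minimal sketch of why zero-shot transfer is possible, under heavy simplifying assumptions: the topic model reads a language-independent document embedding instead of a language-specific bag of words, so a model fitted only on English embeddings can be applied unchanged to an Italian or Portuguese document. In the paper the encoder is part of a neural (VAE-based) topic model fed with multilingual contextualized embeddings; here the hypothetical `predict_topics` function replaces it with a single linear layer plus softmax, and all weights and embeddings are made-up toy values.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_topics(embedding: Sequence[float],
                   weights: Sequence[Sequence[float]],
                   bias: Sequence[float]) -> List[float]:
    """Map a document embedding to a topic distribution.

    Hypothetical stand-in for the trained encoder: one linear layer + softmax.
    `weights` has one row per topic; each row has the embedding's length.
    """
    logits = [sum(w * x for w, x in zip(row, embedding)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


# Toy parameters "learned" on English data only (illustrative values).
W = [[1.0, 0.0, 0.5],
     [0.0, 1.0, -0.5]]
b = [0.0, 0.0]

# Toy multilingual embeddings of the same document in two languages.
# A multilingual encoder is expected to place them close together.
emb_en = [0.9, 0.1, 0.2]
emb_it = [0.8, 0.2, 0.25]

theta_en = predict_topics(emb_en, W, b)  # topic distribution, English version
theta_it = predict_topics(emb_it, W, b)  # same weights reused for Italian
print(theta_en, theta_it)
```

If the multilingual embeddings of the two versions are close, their predicted topic distributions are close as well, which is what the evaluation metrics are meant to measure.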

Evaluation Strategy

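Matches, Centroid Similarity, and KL Divergence, each comparing the topics predicted for the English version and the non-English version of the same document.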
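As a sketch of how the first two metrics could be computed, assume each model output is a topic-probability vector per document and the test set is parallel (the same documents in English and in the target language). Matches is then the fraction of documents whose most probable topic is the same in both languages, and KL Divergence is the average KL(English || target) between the two predicted distributions. The KL direction, the `eps` smoothing, and all function names are our assumptions, and the distributions are toy values.

```python
import math
from typing import List, Sequence


def argmax(xs: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(xs)), key=lambda i: xs[i])


def matches(theta_en: List[Sequence[float]],
            theta_other: List[Sequence[float]]) -> float:
    """Fraction of parallel documents whose most probable topic agrees."""
    same = sum(argmax(p) == argmax(q) for p, q in zip(theta_en, theta_other))
    return same / len(theta_en)


def kl_divergence(p: Sequence[float], q: Sequence[float],
                  eps: float = 1e-12) -> float:
    """KL(p || q) in nats; terms with p_i = 0 contribute 0, eps avoids log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def mean_kl(theta_en: List[Sequence[float]],
            theta_other: List[Sequence[float]]) -> float:
    """Average KL(English || other) over all parallel documents."""
    total = sum(kl_divergence(p, q) for p, q in zip(theta_en, theta_other))
    return total / len(theta_en)


# Toy topic distributions (3 topics) for the same 3 documents.
en = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
it = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.5, 0.3, 0.2]]

print(f"Matches: {matches(en, it):.3f}")  # top topics agree for 2 of 3 documents
print(f"Mean KL: {mean_kl(en, it):.4f}")  # lower means more similar predictions
```

Higher Matches and lower KL Divergence both indicate that the topics predicted for a document do not depend on the language it is written in.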

Dataset

Dataset Link - https://github.com/vinid/data (The dataset was extracted from DBpedia, and the authors have made it publicly available on GitHub.)

Resources

Cross-lingual Contextualized Topic Models with Zero-shot Learning. Research paper link - https://arxiv.org/abs/2004.07737

parth126 commented 2 months ago

Proposal is too vague.
Evaluation needs to be defined better.
One option is to pick any language model (LSTM, GRU, Transformer) and implement it from scratch; Andrej Karpathy did this for GPT-2 on a single GPU. You can try doing something similar for a smaller language model.

parth126 commented 2 months ago

Suggested coming up with a list of datasets and papers for cross-lingual sentiment analysis.

PrinceTitiya commented 2 months ago

We have updated the details of the project.

parth126 commented 2 months ago

Looks good. Marking as minor revision. Please explain the metrics a little more, maybe with an example.

PrinceTitiya commented 1 month ago

Our model learns topics in one language (here, English) and predicts them for unseen documents in other languages. We will evaluate the quality of the topic predictions by comparing the topics predicted for the same document in different languages (English vs. Italian, English vs. Portuguese).

We will use the following evaluation metrics: Matches, Centroid Similarity, and KL Divergence (their definitions were shared as an attached image).
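For example, Centroid Similarity can give credit to predictions that are not exact Matches but are still semantically close. A minimal sketch, assuming each topic is described by its top words and those words are embedded with some pre-trained word-embedding model (replaced here by a tiny hypothetical 2-d lookup table): compute the centroid of the top-word vectors of the topic predicted for the English document and of the topic predicted for the other-language document, then take their cosine similarity. Function names, word lists, and vectors are illustrative only.

```python
import math
from typing import Dict, List, Sequence


def centroid(words: Sequence[str], vectors: Dict[str, List[float]]) -> List[float]:
    """Average the embeddings of `words` (every word must be in `vectors`)."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for w in words:
        for i, x in enumerate(vectors[w]):
            total[i] += x
    return [x / len(words) for x in total]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid_similarity(topic_en: int, topic_other: int,
                        top_words: Dict[int, List[str]],
                        vectors: Dict[str, List[float]]) -> float:
    """Cosine similarity between the top-word centroids of two predicted topics."""
    c_en = centroid(top_words[topic_en], vectors)
    c_other = centroid(top_words[topic_other], vectors)
    return cosine(c_en, c_other)


# Hypothetical 2-d word vectors and top words per topic (toy values).
vectors = {
    "football": [1.0, 0.0], "match": [0.9, 0.1], "team": [0.8, 0.2],
    "basketball": [0.9, 0.2], "league": [0.85, 0.15], "player": [0.8, 0.3],
    "painting": [0.0, 1.0], "museum": [0.1, 0.9], "artist": [0.2, 0.8],
}
top_words = {
    0: ["football", "match", "team"],
    1: ["basketball", "league", "player"],
    2: ["painting", "museum", "artist"],
}

# English version predicted topic 0, Italian version predicted topic 1:
# not a Match, but the two topics are semantically close.
sim_close = centroid_similarity(0, 1, top_words, vectors)
# Disagreement between unrelated topics (sports vs. art).
sim_far = centroid_similarity(0, 2, top_words, vectors)
print(round(sim_close, 3), round(sim_far, 3))
```

Here the two language versions disagree on the top topic (football vs. basketball), so Matches counts a miss, yet the centroid similarity stays high; a disagreement between football and painting scores much lower.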

parth126 commented 1 month ago

Looks good. Marking as approved
