wietsedv / bertje

BERTje is a Dutch pre-trained BERT model developed at the University of Groningen. (EMNLP Findings 2020) "What’s so special about BERT’s layers? A closer look at the NLP pipeline in monolingual and multilingual models"
https://aclanthology.org/2020.findings-emnlp.389/
Apache License 2.0

Semantic similarity #14

Closed: Tauvic closed this issue 3 years ago

Tauvic commented 3 years ago

I'm working on a Dutch Bible project and am therefore interested in semantic similarity. The only models I have found that support semantic similarity in Dutch are multilingual models.

My plan for now is:

Are there any plans to make a sentence similarity model with BERTje? I'm also looking for datasets to train such a model on. The RobBERT model also does not have a version trained on sentence similarity. Any suggestions that could help me?

wietsedv commented 3 years ago

That sure seems like an interesting project, but I cannot really help you with it. These models would have to be specifically trained. I would recommend seeing what you can do with the XLM-RoBERTa based model.

Tauvic commented 3 years ago

Thanks for the reply. I learned that sentence similarity indeed requires some special training, but that's a bit out of my league for now. I'll have to find some other solution.

And now I'm working on a driving safety project: https://github.com/Tauvic/DriverAwareness