iPieter / RobBERT

A Dutch RoBERTa-based language model
https://pieter.ai/robbert/
MIT License
196 stars, 29 forks

Questions on semantic similarity #17

Closed: Tauvic closed this issue 3 years ago

Tauvic commented 3 years ago

I'm working on a Dutch Bible project and am therefore interested in semantic similarity. Are there any plans to make a sentence similarity model? The only models I found that support semantic similarity in Dutch are multilingual models.

My plan for now is:

I'm also looking for datasets to train such a model on. BERTje does not have a variant trained on sentence similarity either. Any suggestions that could help me?

twinters commented 3 years ago

Hi Tauvic,

We have not trained a sentence similarity model ourselves. However, existing implementations for fine-tuning a sentence similarity model with RoBERTa should transfer easily to RobBERT: load our model instead and provide your own Dutch dataset. Your plan sounds like the right way to achieve this with RobBERT, provided you can find similarity scores for your sentences. If no such corpus exists for Dutch, you could already get started by (automatically) translating existing sentence similarity corpora.
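As a rough sketch of what "loading in our model instead" could look like, here is one possible setup using the sentence-transformers library. This is an illustration under assumptions, not something we have trained or tested: the Hugging Face model id `pdelobelle/robbert-v2-dutch-base`, the 0 to 5 score range (borrowed from STS-style corpora), the helper names `scale_score`, `to_training_pairs` and `finetune`, and the hyperparameters are all placeholders you would need to check and adapt.

```python
from typing import Iterable, List, Tuple

# Assumed Hugging Face model id for RobBERT; verify it before use.
ROBBERT_ID = "pdelobelle/robbert-v2-dutch-base"

# One row of a (possibly machine-translated) similarity corpus:
# (sentence_a, sentence_b, human similarity score).
Row = Tuple[str, str, float]


def scale_score(score: float, max_score: float = 5.0) -> float:
    """Map a score in [0, max_score] to [0, 1], the range a cosine loss expects."""
    if not 0.0 <= score <= max_score:
        raise ValueError(f"score {score} is outside [0, {max_score}]")
    return score / max_score


def to_training_pairs(rows: Iterable[Row]) -> List[Row]:
    """Strip whitespace and rescale the scores of every sentence pair."""
    return [(a.strip(), b.strip(), scale_score(s)) for a, b, s in rows]


def finetune(rows: Iterable[Row], epochs: int = 1, batch_size: int = 16):
    """Fine-tune RobBERT as a sentence encoder on scored sentence pairs.

    Heavy imports live inside the function so the helpers above can be
    used without torch or sentence-transformers installed.
    """
    from torch.utils.data import DataLoader
    from sentence_transformers import (
        InputExample,
        SentenceTransformer,
        losses,
        models,
    )

    # RobBERT as the token encoder, followed by the default (mean) pooling.
    encoder = models.Transformer(ROBBERT_ID, max_seq_length=128)
    pooling = models.Pooling(encoder.get_word_embedding_dimension())
    model = SentenceTransformer(modules=[encoder, pooling])

    examples = [
        InputExample(texts=[a, b], label=y)
        for a, b, y in to_training_pairs(rows)
    ]
    loader = DataLoader(examples, shuffle=True, batch_size=batch_size)

    # Regress the cosine similarity of the two embeddings onto the label.
    loss = losses.CosineSimilarityLoss(model)
    model.fit(train_objectives=[(loader, loss)], epochs=epochs, warmup_steps=10)
    return model
```

Calling `finetune(rows)` downloads the model and needs `torch` and `sentence-transformers` installed. The two helper functions run on their own, so you can check your (translated) data before starting a training run.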

Hope this helps, and good luck on your project!

Tauvic commented 3 years ago

Thanks for the response. I will look into it. For now, though, I have switched to another project on driving safety: https://github.com/Tauvic/DriverAwareness