Semantic similarity

Tauvic commented 3 years ago

I'm working on a Dutch Bible project and am therefore interested in semantic similarity. The only models I have found that support semantic similarity in Dutch are multilingual models:

sentence-transformers/xlm-r-100langs-bert-base-nli-stsb-mean-tokens
sentence-transformers/distilbert-multilingual-nli-stsb-quora-ranking
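
For context, the `mean-tokens` suffix in the first model name suggests that a sentence vector is obtained by averaging token vectors, and sentence similarity is then usually scored with cosine similarity. The following is a minimal sketch of that idea in plain Python, not the library's implementation. The toy vectors are made-up stand-ins for real model output, and the helper names `mean_pool` and `cosine_similarity` are hypothetical. Real models also mask out padding tokens, which is skipped here.

```python
import math


def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one sentence vector."""
    count = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 2-dimensional token vectors (assumed values, not real embeddings).
sentence_a = [[1.0, 0.0], [0.0, 1.0]]
sentence_b = [[2.0, 0.0], [0.0, 2.0]]   # same direction as sentence_a after pooling
sentence_d = [[1.0, -1.0]]              # orthogonal to sentence_a after pooling

print(cosine_similarity(mean_pool(sentence_a), mean_pool(sentence_b)))
print(cosine_similarity(mean_pool(sentence_a), mean_pool(sentence_d)))
```

With a real model, the token vectors would come from the encoder, and only the pooling and scoring steps would look like this.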

My plan for now is:

Find some model that supports Dutch
Train it on sentence similarity (how, and where do I get a decent dataset?)
There are some parallel Bible translations that could be used as a start, but they have no similarity scores
Evaluate the results
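
One way the parallel translations in the plan above might still be usable without similarity scores: verses that share the same reference in two translations can be treated as positive pairs, which is the kind of input that contrastive objectives (for example `MultipleNegativesRankingLoss` in the sentence-transformers library) work with. The sketch below only builds such pairs. The function name `build_positive_pairs`, the verse references, and the placeholder texts are all hypothetical, and whether this gives good results for Dutch is untested here.

```python
def build_positive_pairs(translation_a, translation_b):
    """Pair up verses that appear under the same reference in both translations.

    Each argument maps a verse reference (e.g. "GEN 1:1") to the verse text.
    Returns a list of (reference, text_a, text_b) tuples, sorted by reference.
    Verses missing from either translation are skipped.
    """
    shared_refs = sorted(set(translation_a) & set(translation_b))
    return [(ref, translation_a[ref], translation_b[ref]) for ref in shared_refs]


# Placeholder data standing in for two real Dutch translations (assumed).
translation_a = {
    "GEN 1:1": "<Gen 1:1 in translation A>",
    "GEN 1:2": "<Gen 1:2 in translation A>",
    "JHN 3:16": "<John 3:16 in translation A>",
}
translation_b = {
    "GEN 1:1": "<Gen 1:1 in translation B>",
    "GEN 1:2": "<Gen 1:2 in translation B>",
    # "JHN 3:16" is deliberately missing to show that unmatched verses are dropped.
}

pairs = build_positive_pairs(translation_a, translation_b)
for ref, text_a, text_b in pairs:
    print(ref, "|", text_a, "|", text_b)
```

The same aligned pairs could also serve for a rough evaluation: for each verse in one translation, check whether its nearest neighbour among the embedded verses of the other translation is the verse with the same reference.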

Are there any plans to make a sentence similarity model with BERTje? I'm also looking for datasets to train such a model on. RobBERT does not have a version trained on sentence similarity either. Any suggestions that could help me?

wietsedv commented 3 years ago

That sure seems like an interesting project, but I cannot really help you with it. These models would have to be trained specifically for sentence similarity. I would recommend seeing what you can do with the XLM-RoBERTa-based model.

Tauvic commented 3 years ago

Thanks for the reply. I learned that sentence similarity indeed requires some special training, but that's a bit out of my league for now. I'll have to find some other solution.

And now I'm working on a driving safety project: https://github.com/Tauvic/DriverAwareness

wietsedv / bertje

Semantic similarity #14