megagonlabs / ditto

Code for the paper "Deep Entity Matching with Pre-trained Language Models"
Apache License 2.0
256 stars, 88 forks

ditto for Spanish #1

CristhianBoujon closed this issue 4 years ago

CristhianBoujon commented 4 years ago

How many changes are required to make Ditto work for the Spanish language?

oi02lyl commented 4 years ago

I think you can do this by using one of the multilingual models listed here: https://huggingface.co/transformers/pretrained_models.html. For example, you can replace every occurrence of “distilbert-base-uncased” with “distilbert-base-multilingual-cased” and see whether it works.
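A minimal sketch of that search-and-replace, assuming the checkpoint name appears as a plain string literal in the repository's `.py` files. `swap_model_name` is a hypothetical helper, not part of Ditto; run it on a clean git checkout so the changes can be reviewed with `git diff` and reverted if needed.

```python
from pathlib import Path

# Checkpoint names from the comment above.
OLD_LM = "distilbert-base-uncased"
NEW_LM = "distilbert-base-multilingual-cased"


def swap_model_name(repo_dir: str, old: str = OLD_LM, new: str = NEW_LM) -> list[str]:
    """Replace every occurrence of `old` with `new` in the .py files under repo_dir.

    Returns the paths (as strings) of the files that were modified.
    """
    changed = []
    # rglob walks subdirectories too; sorting keeps the output deterministic.
    for path in sorted(Path(repo_dir).rglob("*.py")):
        text = path.read_text(encoding="utf-8")
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed.append(str(path))
    return changed
```

For example, `swap_model_name("path/to/ditto")` returns the list of edited files, which is a quick way to confirm that every reference to the English-only checkpoint was found.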

CristhianBoujon commented 4 years ago

I'll try it and report the results here. Thank you!

tchugh commented 3 years ago

@CristhianBoujon Were you able to try the multilingual model? Did you also need to make any changes to the input data format? I see that in the WDC dataset, strings are followed by language tags such as @fr and @es.
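The thread does not say whether these tags should be kept or removed before serializing records for Ditto. If you decide to strip them, one hypothetical preprocessing step is sketched below. It assumes a tag is `@` plus a two-letter lowercase code (optionally with a region such as `@es-AR`) attached to the end of a value and followed by whitespace, a comma or semicolon, or the end of the string; `strip_lang_tags` and `lang_tags` are illustrative names, not Ditto functions.

```python
import re

# Assumed tag shape: "@es", "@fr", or "@es-AR" at the end of a value, followed
# by whitespace, "," or ";", or the end of the string. The lookahead keeps
# strings such as "info@es.example.com" intact.
LANG_TAG = re.compile(r"@[a-z]{2}(?:-[A-Za-z]{2})?(?=[\s,;]|$)")


def strip_lang_tags(value: str) -> str:
    """Remove language tags like @es or @fr from an attribute value."""
    return LANG_TAG.sub("", value).strip()


def lang_tags(value: str) -> list[str]:
    """Return the language codes found in the value, without the leading '@'."""
    return [m.group(0)[1:] for m in LANG_TAG.finditer(value)]
```

For example, `strip_lang_tags('"Camiseta roja"@es "T-shirt rouge"@fr')` gives `'"Camiseta roja" "T-shirt rouge"'`, and `lang_tags` on the same input gives `["es", "fr"]`, which could be used to filter the data down to Spanish records instead of discarding the tags.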