peleiden / daluke

A Danish-speaking language model with entity-aware self-attention
MIT License

Adding it to the DaCy robustness test #105

Open KennethEnevoldsen opened 3 years ago

KennethEnevoldsen commented 3 years ago

Hi @sorenmulli and @asgerius, I really enjoyed reading your work. There are some great points to be taken from it. For example, inspired by your model evaluation, we will try to add Plank and WikiANN to the robustness tests as out-of-distribution datasets. I am also interested in how this model would perform if retrained in a multitask setting 🧐

I saw your issue on adding DaLUKE to the DaNLP benchmark and would very much like to add DaLUKE to the bias and robustness evaluation in the DaCy framework. Feel free to reach out if this is something you are interested in.

PS. The medium DaCy model is actually the Danish BERT by BotXO, not the multilingual RoBERTa ;)

sorenmulli commented 3 years ago

Hello! Thank you very much for your interest in the DaLUKE project. We will continue development and analysis of the model and are thus very excited to see this evaluation method.

To make the robustness tests run, it seems you need an apply function for DaLUKE, which in turn seems to require us to add the same API capability as described in #103. Furthermore, we found a bug in pre-training where the [SEP] token was omitted, and we are in the process of rerunning it. We will let you know when these efforts are done, though they might be delayed a bit by the summer holiday ;-) Let us know if more is needed and if you have any questions about the methods, such as the use of Plank and WikiANN.

And sorry for misrepresenting DaCy medium: we only discovered your awesome project late in the bachelor project, and it seems that we were a bit superficial in our reading. We are back from vacation in August and would very much like to hear your thoughts on further work on this project, such as multitask learning and general Danish NER input, so if you are interested, we'll shoot you a mail about a possible meeting.

KennethEnevoldsen commented 3 years ago

Exactly. If you reach out when you have finished #103, I will make sure to add it (or, if you want to be a contributor to DaCy, feel free to open a PR).

No problem, I understand that a bachelor thesis can be hard-pressed for time.

Feel free to shoot me a mail at kenneth.enevoldsen (at) cas.au.dk!