seraphinatarrant / embedding_bias

Repo for project on the geometry of Word Embeddings and how it influences bias downstream

Identify Training Data #3

Closed: seraphinatarrant closed this 4 years ago

seraphinatarrant commented 4 years ago

Wikipedia dataset for word embeddings

The two people working on English should decide this together; the choice for the other language can be made independently. This is extremely flexible.

pandyamugdha commented 4 years ago

We will need to train a model for toxicity detection, so we need to find a dataset for that. Coreference resolution most likely has some provision for pretrained embeddings, so we would only need to add a layer.

Not sure if I have got this right. Ask @seraphinatarrant