seraphinatarrant / embedding_bias

Repo for project on the geometry of Word Embeddings and how it influences bias downstream

Question about training data #6

Closed kato8966 closed 1 year ago

kato8966 commented 1 year ago

The paper says, "To train embeddings, we use domain-matched data for each downstream task. For coreference we train on Wikipedia data, ..." in section 4.1 Datasets. However, README.md says:

Data

English Coreference:

We pretrain embeddings on the English gigaword corpus.

Which is right?

seraphinatarrant commented 1 year ago

Oh sorry, the README was out of date from our initial experiments. Good catch! We use Wikipedia, as the paper says. I have updated and corrected the README.