Closed jej127 closed 1 year ago
Hello! I have a question about the details of the FastText baseline in Tables 2-4. When handling OOV words, this baseline has two choices: 1) representing OOV words with null vectors, or 2) computing vectors for OOV words by summing their n-gram vectors.
In the terminology of Bojanowski et al. (2017) [1], the first option corresponds to the "sisg-" setting, while the second corresponds to the "sisg" setting. Could you please specify which option was used in your experiments? My conjecture leans toward option 1), since option 2) does not seem to follow a mimick-like model. Nonetheless, I would greatly appreciate your guidance on this matter. Thank you in advance for your help!
[1] Bojanowski et al., Enriching Word Vectors with Subword Information, TACL 2017.
Hi,
Thanks for asking.
We used n-gram vectors to impute representations for OOV words in the FastText baseline (i.e., the "sisg" setting). FastText embeddings can also serve as the teacher targets for other mimick-like models.
Thanks for the assistance. It is really helpful.
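For readers unfamiliar with the distinction, here is a minimal sketch of the "sisg" strategy discussed above: an OOV word's vector is composed from its character n-gram vectors. All names and the toy embedding table below are illustrative, not from the paper's code; in a trained FastText model the n-gram vectors are learned, not random.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    # FastText wraps the word in boundary markers '<' and '>'
    # before extracting character n-grams.
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

# Toy n-gram embedding table (in a real model, learned during training).
rng = np.random.default_rng(0)
dim = 4
ngram_vecs = {}

def ngram_vec(ng):
    # Look up (or, for this toy example, lazily create) an n-gram vector.
    if ng not in ngram_vecs:
        ngram_vecs[ng] = rng.standard_normal(dim)
    return ngram_vecs[ng]

def oov_vector(word):
    # "sisg": compose the OOV word's representation from its n-gram vectors.
    grams = char_ngrams(word)
    return np.mean([ngram_vec(g) for g in grams], axis=0)

v = oov_vector("unseenword")
print(v.shape)  # (4,)
```

Under the "sisg-" setting, by contrast, `oov_vector` would simply return a null (zero) vector for any word outside the training vocabulary.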