timoschick / one-token-approximation

This repository contains the code for applying One-Token Approximation to a pretrained language model that uses subword-level tokenization.
https://arxiv.org/abs/1904.06707

Example from paper isn't working #2

Open ValeraLobov opened 2 years ago

ValeraLobov commented 2 years ago

Hello!

First of all, thank you for your work, and especially for the Attentive Mimicking mechanism, which has shown good results in my research. Unfortunately, I can't reproduce the results from the paper. I just want to get a better embedding for the word 'unicycle' using the simple command you advised:

python3 ota.py --word unicycle --output_file inference_ota_embeds.txt --model_cls bert --model bert-base-uncased --iterations 4000

But the OTA embedding seems to have nothing in common with the embeddings for the words 'unicycle' and 'bicycle' from the original BERT model; the cosine similarity score is under 0.1. From the logs: the loss decreases to a negligibly small value, but the reported cosine is always -1.
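For reference, this is roughly how I computed the comparison. It's only a sketch: I assume the output file stores one line of the form "word v1 v2 ... v768", and I represent a multi-token word in the original BERT model by the average of its subword input embeddings, which may not be how you intended the comparison to be done.

import numpy as np
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
input_embeddings = model.get_input_embeddings().weight.detach()  # (vocab_size, 768)

def bert_word_embedding(word):
    # Average of the subword input embeddings for `word` (assumed baseline).
    ids = tokenizer.encode(word, add_special_tokens=False)
    return input_embeddings[ids].mean(dim=0).numpy()

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Load the OTA vector written by ota.py (file format assumed, not verified).
with open("inference_ota_embeds.txt") as f:
    parts = f.readline().split()
ota_vec = np.array([float(x) for x in parts[1:]])

print("OTA vs 'unicycle':", cosine(ota_vec, bert_word_embedding("unicycle")))
print("OTA vs 'bicycle':", cosine(ota_vec, bert_word_embedding("bicycle")))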

Could you please help me? Maybe I am doing something incorrectly.

Thank you!