uhh-lt / poincare

Using sense representations to improve detection of compositional nominal compounds
SenseGram on the new 100 dimensional #14

Open alexanderpanchenko opened 6 years ago

alexanderpanchenko commented 6 years ago

1) Generate SenseGram models from the 100- and 300-dimensional word2vec embeddings trained on the ukWaC corpus. Use ukwac_2_cbow_100.text.model first.

2) Re-compute the unsupervised results based on it.

3) Share the new SenseGram models.

(root) panchenko@ltgpu2:/srv/home/puzyrev/embeddings$ du -sh *
83M     ukwac_2_cbow_100.text.model
533M    ukwac_2_cbow_100.text.model.vectors.npy
83M     ukwac_cbow_100.text.model
532M    ukwac_cbow_100.text.model.vectors.npy
1.6G    ukwac_cbow_100.text.vector
83M     ukwac_cbow_300.text.model
1.6G    ukwac_cbow_300.text.model.vectors.npy
4.8G    ukwac_cbow_300.text.vector

ajana1989 commented 6 years ago

How can I get this file from my account (ltcpu1)? Do I have permission?

alexanderpanchenko commented 6 years ago

Hello, yes, you should have access. Let me know if this is not the case. @animeshmukh wanted to call you to explain the details.

adhaesitadimo commented 6 years ago

ukwac_2_cbow_100.text.model ukwac_2_cbow_100.text.model.vectors.npy ukwac_2_cbow_100.text.vector

Corrected models for 100 dimensions. Only 3 compounds are missing now, instead of 52 in the first model.
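A count like the one above can be reproduced with a simple vocabulary lookup. The following is only a minimal sketch: the function name `missing_compounds`, the toy compound list, and the underscore-joined token format are assumptions for illustration, not the format actually used in this repository.

```python
def missing_compounds(compounds, vocabulary):
    """Return the compounds that have no entry in the embedding vocabulary."""
    vocab = set(vocabulary)  # set lookup keeps the check fast for large vocabularies
    return [c for c in compounds if c not in vocab]


# Toy data (hypothetical); in practice the vocabulary would come from the
# loaded embedding model and the compounds from the evaluation dataset.
compounds = ["hot_dog", "red_tape", "swimming_pool"]
vocabulary = ["hot_dog", "swimming_pool", "dog", "pool"]

missing = missing_compounds(compounds, vocabulary)
print(f"{len(missing)} of {len(compounds)} compounds missing: {missing}")
```

Running the same check against both the first and the corrected model would show which compounds were recovered.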

ajana1989 commented 6 years ago

I have put the SenseGram embeddings obtained from the 100-dimensional ukWaC vectors in jana@ltgpu2:/srv/home/jana/sensegram_embeddings

alexanderpanchenko commented 6 years ago

Thanks! @ajana1989 please make sure that the rights on these files allow reading by other users (chmod ...).
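For reference, one way to open a directory for reading by other users is sketched below. The helper name `make_world_readable` is hypothetical, and whether group and world read access is the right policy on this server is an assumption. Parent directories of the shared folder must also be traversable by the other users, which this sketch does not handle.

```python
import os
import stat


def make_world_readable(root):
    """Add read permission for group and others to everything under root.

    Directories additionally get the execute bit, which is what allows
    other users to enter them. Existing permission bits are kept.
    """
    dir_bits = stat.S_IRGRP | stat.S_IROTH | stat.S_IXGRP | stat.S_IXOTH
    file_bits = stat.S_IRGRP | stat.S_IROTH
    for dirpath, _dirnames, filenames in os.walk(root):
        os.chmod(dirpath, os.stat(dirpath).st_mode | dir_bits)
        for name in filenames:
            path = os.path.join(dirpath, name)
            os.chmod(path, os.stat(path).st_mode | file_bits)


# Example call (path taken from the comment above):
# make_world_readable("/srv/home/jana/sensegram_embeddings")
```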

@dimitriusseveruscensor please check the files and use them with the KR model pretrained on word embeddings to improve the results.