muhaochen / bilingual_dictionaries

This repository contains the source code and links to some datasets used in the CoNLL 2019 paper "Learning to Represent Bilingual Dictionaries".
https://www.aclweb.org/anthology/K19-1015.pdf

ASK for more detailed reproduction steps, TOO #2

Closed LeeSureman closed 4 years ago

LeeSureman commented 4 years ago

Nice work, and thank you for your contribution to NLP. However, I ran into some problems when trying to reproduce your work. I assume joint/scripts/run_joint_prep.sh is the script to train your model, and I wonder:

  1. Where can I get the withctx files? I can only find the withctx files for en and fr in https://github.com/swj0419/bilingual_dict_embeddings, not the ones for the other language pairs.

  2. What are en_mono, fr_mono, en_multi, and fr_multi in run_joint_train.sh? I cannot find files with those names in https://drive.google.com/drive/u/0/folders/1Lm6Q5BxeU0ByR6DZcNfbWpntumiIKhYN

muhaochen commented 4 years ago

@alantian Can you answer our colleague's questions regarding the joint training part?

LeeSureman commented 4 years ago

@alantian I also notice that I should run 'run_joint_prep.sh' first, but I don't know where to find en_wiki_text_lower.txt, europarl-v7.fr-en.en.tknzd.lower, and fr_wiki_text_lower.txt.

LeeSureman commented 4 years ago

I also wonder why there is 'Merge' in your code. From https://stackoverflow.com/questions/56315726/cannot-import-name-merge-from-keras-layers, I see that Keras 2.x no longer supports Merge.
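For reference, here is a minimal sketch (not your repository's actual model; the input sizes and layers are just placeholders) of how a Keras 1.x `Merge` call maps onto the Keras 2.x functional merge layers:

```python
# Minimal sketch: replacing a Keras 1.x Merge with Keras 2.x functional merge layers.
# The 50-dim inputs and the Dense head are illustrative placeholders only.
from keras.layers import Input, Dense, concatenate  # Keras 2.x API
from keras.models import Model

lang0_in = Input(shape=(50,))   # e.g. a 50-dim word embedding for language 0
lang1_in = Input(shape=(50,))   # e.g. a 50-dim word embedding for language 1

# Keras 1.x:  merged = Merge([branch0, branch1], mode='concat')
# Keras 2.x:  concatenate the two tensors along the last axis instead.
merged = concatenate([lang0_in, lang1_in], axis=-1)

output = Dense(1, activation='sigmoid')(merged)
model = Model(inputs=[lang0_in, lang1_in], outputs=output)
model.summary()
```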

muhaochen commented 4 years ago

@LeeSureman Indeed Merge can be removed. Wait a bit for @alantian to upload the files though. Thanks.

LeeSureman commented 4 years ago

Has he finished uploading the files? How can I get them?

muhaochen commented 4 years ago

He will update the README after uploading the files, though he is a bit busy these days. Are you pressed by a deadline?

LeeSureman commented 4 years ago

Yes. For now I drop the mono and multi word-embedding losses from your paper and use only the MT loss, but I can't find the following four files:

- `--lang0_emb_file withctx.en-es.en.50.1.txt`
- `--lang1_emb_file withctx.en-es.es.50.1.txt`
- `--lang0_ctxemb_file withctx.en-es.en.50.1.txt.ctx`
- `--lang1_ctxemb_file withctx.en-es.es.50.1.txt.ctx`

We want to follow your work.

alantian commented 4 years ago

Hey, @LeeSureman @muhaochen

I would like to let you know that the necessary files have been uploaded to our Google Drive folder. Note that all gzipped files need to be decompressed after downloading --- for example, on a Linux machine this can be done by running `gzip -d *.gz`.
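If you are not on a Linux machine, here is a rough Python equivalent of that command; the `downloads/` directory is just a placeholder for wherever you saved the files:

```python
# Decompress every .gz file in a directory, equivalent to `gzip -d *.gz`.
# "downloads/" is a hypothetical path -- point it at your own download folder.
import glob
import gzip
import shutil

for gz_path in glob.glob("downloads/*.gz"):
    out_path = gz_path[:-3]                          # strip the ".gz" suffix
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)                 # stream-decompress to disk
```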

Furthermore, run_joint_prep.sh is expected to be executed before run_joint_train.sh. The files you've mentioned (en_mono, fr_mono, en_multi, fr_multi) are produced by that first step.

README.md has also been updated to reflect these changes.

muhaochen commented 4 years ago

@alantian Thanks man!

LeeSureman commented 4 years ago

We NEED the code for the experiments in Section 4.2, PLEASE. We need the code for training on the monolingual dataset and testing on the cross-lingual dataset by aligning words. Please.