SharmisthaJat opened this issue 6 years ago
Which script are you trying to run?
Hi, following are my steps for execution:
1) Run setup.sh to get the data.
2) Make a small vocab.txt file in the folder grounding-embeddings/causal.
3) Run make all from grounding-embeddings/causal/Makefile.
Is this correct?
Ah, the causal folder is still under development. Its contents weren't the focus of our paper, but rather a "casual" exploration of future research directions. Let me know if there is something specific that you're looking to do and I'll try to help you out.
I was trying to replicate your paper's feature fit results. Which scripts should I run to get results similar to those shown in Tables 3 and 4, Figure 1, etc.? (http://aclweb.org/anthology/W17-2810)
Ok! Feature fit scores are calculated by feature_fit.py. You will want to edit PIVOT to the word representation you want to use and SOURCE to either "mcrae" or "cslb".
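For reference, the relevant settings near the top of feature_fit.py would look something like this (the exact values are illustrative; "glove" is just an example of a word representation name):

```python
# Illustrative settings near the top of feature_fit.py -- the exact
# values depend on which embeddings and feature norms you want to use.
PIVOT = "glove"    # word representation to evaluate (example value)
SOURCE = "mcrae"   # feature norm dataset: "mcrae" or "cslb"
```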
Hi,
Thanks. In feature_fit.py there is again a vocab.txt required for each of the vector representation folders (e.g., line 52 in feature_fit.py). Is that the vocab file, and what should this file contain?
Hi Lucy,
Can you please clarify the input required for line 52 in feature_fit.py (vocab.txt)?
Hi, I'm still in the process of finding this file. I did a bit of cleaning on my personal computer, so I do not have it on here; once I find it, I will let you know. It should contain words and their frequencies. It seems to be created when someone runs the original GloVe C scripts with a certain flag on a custom dataset, so the public GloVe download does not include it. I believe my co-author created this file, so I will check with him to see if he has more information. I think the file is just used to load GloVe word embeddings with word2vec tools, and though there are alternative methods for turning GloVe formatting into word2vec formatting, I will try to find this file soon.
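If it helps in the meantime, the vocabulary files produced by GloVe's vocab_count tool are plain text with one word and its corpus frequency per line, something like this (the counts here are made up):

```
the 1061396
of 593677
and 416629
```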
Thank you for your patience.
Thanks, I appreciate your help :)
Could you try removing the fvocab input from load_all_embeddings? You don't actually need it to reproduce production results. I am updating the code and the ReadMe with more information on how to set things up.
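Concretely, that would mean changing the gensim loading call to something like the following (a sketch; the actual variable names in load_all_embeddings may differ):

```python
from gensim.models import KeyedVectors

# Before (hypothetical): passing the missing GloVe vocab file
# embeddings = KeyedVectors.load_word2vec_format(INPUT, fvocab=VOCAB_PATH, binary=False)

# After: drop the fvocab argument; gensim then reads the vocabulary
# directly from the embedding file itself.
embeddings = KeyedVectors.load_word2vec_format(INPUT, binary=False)
```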
Do let me know if you have any other questions.
Hi,
I updated the GloVe embedding loading code to read from a text file and tried running subgraphs/feature_fit.py. But the code breaks at line 940 with the following output:
File "feature_fit.py", line 940, in main clfs = pickle.load(clf_f) EOFError: Ran out of input
The classifier pickle seems to have an issue.
I managed to run feature_fit.py all the way through (after cloning this repo, downloading the data from scratch, following the ReadMe, etc). Are you using Python 3?
Update: I was using Python 3.5, to be precise. I reran the file with Python 3, and now the error is gensim-related:
File "feature_fit.py", line 410, in analyze_classifiers all_embeddings.init_sims()
This may be due to my loading the GloVe file directly rather than through gensim. Let me try converting the txt file to bin and loading it with gensim using this repo: https://github.com/marekrei/convertvec
Hmmm, so I am using the code currently in the repo (with KeyedVectors.load_word2vec_format(INPUT, binary=False)). You should make sure the top of your GloVe input has the extra line indicated in the ReadMe. I think I also used the latest versions of gensim and the other packages when I ran it.
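For context, word2vec text format expects a header line giving the number of vectors and their dimensionality, which raw GloVe output lacks. A small helper like this (add_word2vec_header is a hypothetical name, not from the repo) could prepend it:

```python
# Hypothetical helper: prepend the word2vec header line
# ("<num_vectors> <dimensions>") to a raw GloVe text file so that
# gensim's load_word2vec_format can read it.
def add_word2vec_header(glove_path, out_path):
    with open(glove_path) as f:
        lines = f.readlines()
    dim = len(lines[0].split()) - 1  # tokens per line, minus the word itself
    with open(out_path, "w") as f:
        f.write("%d %d\n" % (len(lines), dim))
        f.writelines(lines)
```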
Oh, I see, you have updated the code. I have been playing with the old one. Let me update and check.
Hi Lucy,
It worked :), thanks for all the help and the interesting paper.
Best, Sharmistha
Hi,
Thanks for sharing your code. I was not sure what format vocab.txt should be in, as the file was not in the repo, so I tested the code with a vocab.txt containing a single word on each line. Example:

```
yellow
woods
hanging
regularize
```
But even with a very small vocab file of 53 words, the script ends up using too much memory (on the order of 200 GB of RAM) and gets killed after some time during the '$(datadir)/cooccurrence.filtered.bin: filter_glove' step in the Makefile. Am I running the script with the right input? (There are no other errors reported.)