connormeaton opened 4 years ago
Per the docs and your comment, the execution command should be:
$ python core.py --model Lee-Dernoncourt \
  --dataset SwDA <path_to_SwDA_dataset_directory> \
  --embedding GloVe <path_to_GloVe_embedding_file>
So, instead of:
--embedding word_embedding/word2vec/GoogleNews-vectors-negative300.bin
You should be using:
--embedding GloVe word_embedding/word2vec/GoogleNews-vectors-negative300.bin
Otherwise, you will cause the script to barf when it indexes into the embeddings dict here:
https://github.com/ilimugur/short-text-classification/blob/77306e3900c1beaf093ed0b515bbf0cc68232861/core.py#L133
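The failure mode is an ordinary dict lookup. Below is a minimal sketch, assuming the value given to `--embedding` is used directly as the dict key. The loader function and its body are hypothetical placeholders, not code from core.py.

```python
# Hypothetical stand-in for the `embeddings` dict linked above; the real
# keys and loader functions in core.py are not reproduced here.
def load_fasttext(path):
    # Placeholder loader: a real one would read the vectors from `path`.
    return f"FastText vectors from {path}"

embeddings = {
    # ...other entries are commented out in the linked revision...
    'FastText': load_fasttext,
}

def get_loader(name):
    # A plain dict lookup: any string that is not a key, such as a file
    # path passed where an embedding name was expected, raises KeyError.
    return embeddings[name]

try:
    get_loader('word_embedding/word2vec/GoogleNews-vectors-negative300.bin')
except KeyError as err:
    print('KeyError:', err)
```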
That said, you'll notice the embeddings dict has everything except 'FastText' commented out:
https://github.com/ilimugur/short-text-classification/blob/77306e3900c1beaf093ed0b515bbf0cc68232861/core.py#L17-L21
It's unclear whether there is an issue with the other embeddings or whether they were accidentally commented out, but I'd suggest starting with the one embedding that is still enabled, just in case:
$ python core.py \
  --model Lee-Dernoncourt \
  --dataset SwDA swda/data \
  --embedding FastText word_embedding/word2vec/GoogleNews-vectors-negative300.bin
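Passing a name that is not an enabled key (for example `GloVe` with this revision) could be caught at parse time instead of surfacing as a KeyError, using argparse's `choices`. This is a sketch of that idea only, not how core.py currently declares `--embedding`; the `embeddings` dict here is a hypothetical placeholder with just 'FastText' enabled.

```python
import argparse

# Hypothetical placeholder: only 'FastText' is enabled, mirroring the
# linked revision where the other entries are commented out.
embeddings = {'FastText': None}

def build_parser():
    parser = argparse.ArgumentParser(prog='core.py')
    # `choices` rejects any name that is not a key of `embeddings` and
    # lists the valid names in the error message.
    parser.add_argument('--embedding', choices=sorted(embeddings))
    return parser

args = build_parser().parse_args(['--embedding', 'FastText'])
print(args.embedding)  # FastText
```

With this, `--embedding GloVe` would exit with an "invalid choice" error that lists the valid names.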
I have the same issue. It seems the code only supports FastText, so I downloaded the FastText word embeddings and tried the following command:
python core.py --model Lee-Dernoncourt --dataset SwDA swda/ --embedding FastText
That gave me an error asking me to provide --source-language, so I tried:
!python core.py --model Lee-Dernoncourt --dataset SwDA swda/ --embedding FastText --source-language en
But I get the following error:
error: argument --source-language: expected 3 arguments
I don't know how to handle this at the moment. Let me know if you have any updates!
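The "expected 3 arguments" message is what argparse reports when an option declared with `nargs=3` is followed by fewer than three values. A minimal sketch, assuming `--source-language` is declared that way (inferred from the error text alone; the actual declaration in core.py is not shown in this thread):

```python
import argparse

# Hypothetical reconstruction of an option that takes exactly three values.
parser = argparse.ArgumentParser(prog='core.py')
parser.add_argument('--source-language', nargs=3)

def try_parse(argv):
    try:
        return parser.parse_args(argv)
    except SystemExit:
        # argparse prints the error to stderr and exits with status 2.
        return None

print(try_parse(['--source-language', 'en']))  # None
ok = try_parse(['--source-language', 'en', 'cc.en.300.vec', 'None'])
print(ok.source_language)  # ['en', 'cc.en.300.vec', 'None']
```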
@cmeaton here's a quick update: I was able to run the model without errors using the following command, but the accuracy is extremely low, so it's probably not worth investing time in figuring out the code.
python core.py --model Lee-Dernoncourt --dataset SwDA swda/swda/ --embedding FastText --source-language en cc.en.300.vec None
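One detail about that last value: argparse does not interpret the literal `None` typed on the command line, so it arrives as the string 'None'. If a script needs Python `None` there, a `type=` converter can map it. The `none_or_str` helper below is hypothetical, and the thread does not show how core.py uses this third value.

```python
import argparse

def none_or_str(value):
    # Hypothetical converter: map the literal text "None" to Python None
    # and leave every other value as a string.
    return None if value == 'None' else value

parser = argparse.ArgumentParser(prog='core.py')
# `type` is applied to each of the three values separately.
parser.add_argument('--source-language', nargs=3, type=none_or_str)

args = parser.parse_args(['--source-language', 'en', 'cc.en.300.vec', 'None'])
print(args.source_language)  # ['en', 'cc.en.300.vec', None]
```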
@boqrat Thanks for your comments! That's great you were able to get it running, but bummer about the low accuracy. Thanks for the heads-up. I've since moved on to other things, but if I take another crack at this I'll let you know my progress.
First off, thanks for the code. This is really great work.
I am having trouble training the model with the command example you provided. I am using this command to train the model:
$ python core.py --model Lee-Dernoncourt --dataset SwDA \
  --embedding GloVe
I am replacing the paths to reflect where I unzipped and stored the SwDA dataset and the GloVe and word2vec embeddings, which looks like this:
$ python core.py --model Lee-Dernoncourt --dataset SwDA swda/data --embedding word_embedding/word2vec/GoogleNews-vectors-negative300.bin
The swda/data directory contains the subdirectories 'sw00utt', 'sw01utt', and so on. Running this command yields the following error:
KeyError: '/word_embedding/word2vec/GoogleNews-vectors-negative300.bin'
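Separately from the KeyError, the thread shows both swda/data and swda/swda/ being passed to `--dataset`, so it may help to confirm which directory actually holds the swNNutt folders. A sketch, assuming core.py wants the directory that directly contains them (the thread does not confirm which level it expects); `utterance_dirs` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

def utterance_dirs(root):
    # Hypothetical helper: names of the swNNutt subdirectories directly
    # under `root`, sorted. An empty list suggests the wrong directory level.
    return sorted(p.name for p in Path(root).glob('sw*utt') if p.is_dir())

# Demo on a throwaway tree shaped like the layout described above.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('sw00utt', 'sw01utt'):
        (Path(tmp) / 'swda' / 'data' / name).mkdir(parents=True)
    print(utterance_dirs(Path(tmp) / 'swda' / 'data'))  # ['sw00utt', 'sw01utt']
    print(utterance_dirs(Path(tmp) / 'swda'))           # []
```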
If I change the command to:
$ python core.py --model Lee-Dernoncourt --dataset SwDA swda/data --embedding word2vec word_embedding/word2vec/GoogleNews-vectors-negative300.bin
Then I get this error:
core.py: error: unrecognized arguments: /word_embedding/word2vec/GoogleNews-vectors-negative300.bin
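"unrecognized arguments" is argparse's error for leftover values. If `--embedding` takes a single value (as the commands elsewhere in the thread that pass just `--embedding FastText` suggest), the path after `word2vec` has nowhere to go. A sketch under that assumption, using `parse_known_args` to return the leftover value instead of exiting:

```python
import argparse

# Hypothetical: --embedding declared with the default nargs (one value).
parser = argparse.ArgumentParser(prog='core.py')
parser.add_argument('--embedding')

def leftover(argv):
    # parse_known_args returns (namespace, extras) instead of exiting,
    # which makes the stray argument visible.
    _, extra = parser.parse_known_args(argv)
    return extra

print(leftover(['--embedding', 'word2vec',
                'word_embedding/word2vec/GoogleNews-vectors-negative300.bin']))
```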
If you have any ideas on how to proceed, please advise. Thank you very much,
Best, Connor