menshikh-iv opened 7 years ago
@menshikh-iv Sounds really useful. I am interested in working on this issue.
Other potentially useful evaluations of word embeddings (along with code) can be found here - https://github.com/mfaruqui/eval-word-vectors
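For gensim models in particular, the built-in evaluation helpers cover similar ground; a minimal sketch, assuming `model` is an already-trained `Word2Vec` instance and a reasonably recent gensim (both test sets ship with gensim's test data):

```python
from gensim.test.utils import datapath

# Correlation with human similarity judgements (WordSim-353).
pearson, spearman, oov_ratio = model.wv.evaluate_word_pairs(
    datapath('wordsim353.tsv'))

# Accuracy on the classic "king - man + woman = queen" analogy questions.
analogy_score, sections = model.wv.evaluate_word_analogies(
    datapath('questions-words.txt'))

print('WordSim-353 Spearman: %.3f (OOV: %.1f%%)' % (spearman[0], oov_ratio))
print('Analogy accuracy: %.3f' % analogy_score)
```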
@menshikh-iv I have finished training Gensim's Word2Vec on a Google Cloud n1-highcpu instance (4-core Xeon E5, 3.6 GB RAM); it takes around 7.5 hours to train a model on the Wikipedia corpus. I will look into the TensorFlow and original C implementations of word2vec next.
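For reference, a self-contained sketch of that kind of run; the dump filename and hyperparameters below are illustrative, not the exact settings used:

```python
import logging
import multiprocessing

from gensim.corpora.wikicorpus import WikiCorpus
from gensim.models.word2vec import Word2Vec

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s',
                    level=logging.INFO)


class WikiSentences:
    """Restartable stream of tokenized articles from a compressed Wikipedia dump."""

    def __init__(self, dump_path):
        # dictionary={} skips the expensive vocabulary-building pass.
        self.wiki = WikiCorpus(dump_path, dictionary={})

    def __iter__(self):
        return self.wiki.get_texts()


sentences = WikiSentences('enwiki-latest-pages-articles.xml.bz2')
model = Word2Vec(sentences,
                 vector_size=100,  # `size=` in gensim < 4.0
                 window=5,
                 min_count=5,
                 workers=multiprocessing.cpu_count())
model.save('wiki.word2vec.model')
```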
@souravsingh @manneshiva maybe the two of you could work together?
> It is very important to write fully self-contained scripts for repeatable deployment.
Before we even start running the benchmarks, we should focus on making the setup (tests, scripts, etc.) reproducible. Using Docker seems to be the easiest way to achieve this. I have built a Docker image that lets us run the word2vec implementations of all the popular frameworks, and have tested (run) it with the original C, tensorflow-cpu, gensim and dl4j code on a small test corpus (text8). I will push the code to a repo as soon as I refactor it and write a few scripts.
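To make the idea concrete, a minimal Dockerfile sketch; the base image, pinned versions and driver script are assumptions for illustration, not what the actual repo uses:

```dockerfile
# Illustrative sketch only -- versions and layout are assumptions.
FROM ubuntu:16.04

RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential git python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

# Pin framework versions so every run exercises identical code.
RUN pip3 install gensim==2.3.0 tensorflow==1.3.0

# Build the original C implementation of word2vec.
RUN git clone https://github.com/tmikolov/word2vec /opt/word2vec && \
    make -C /opt/word2vec

WORKDIR /benchmark
COPY . /benchmark

# run_benchmarks.py is a hypothetical driver script.
CMD ["python3", "run_benchmarks.py"]
```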
@manneshiva you are right, keep it up!
@menshikh-iv I have created a repo to address this issue. Here is the link: https://github.com/manneshiva/benchmark-word2vec-frameworks. A lot still remains to be finished; I am working on it and will complete it soon.
@manneshiva A similar post, for inspiration: http://minimaxir.com/2017/07/cpu-or-gpu/
We are very interested in a robust large-scale benchmark of the ML landscape, especially with regard to hardware, costs and implementation quality.
Short description: compare implementations of a neural network algorithm (perhaps w2v / d2v) across popular frameworks, on different cloud platforms and different hardware setups (CPU/GPU), measuring metrics such as training quality, speed, memory footprint, ease of use and relative $$$ cost.
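To make the measurement side concrete, a sketch of a per-run harness; `train` here is a hypothetical stand-in for any single framework's training routine:

```python
import resource
import time


def measure_run(train, *args, **kwargs):
    """Time one training run and record its peak memory footprint."""
    start = time.perf_counter()
    model = train(*args, **kwargs)
    elapsed = time.perf_counter() - start

    # Peak resident set size of this process (ru_maxrss is in KiB on Linux).
    peak_mb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0
    return model, {'train_seconds': elapsed, 'peak_memory_mb': peak_mb}
```

Training quality (e.g. the word-pair and analogy scores above) can then be computed on the returned model, and the $$$ cost derived from elapsed time multiplied by the instance's hourly price.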
Questions we want to answer:
Plan:
The benchmark must be fully reproducible: all scripts, data and settings must be recorded and versioned, and all relevant parameters, random seeds, etc. must be described and set explicitly. It is very important to write fully self-contained scripts for repeatable deployment, for example via Docker/Ansible. Run the experiments multiple times and measure the spread/variance of each metric (see the sketch below).
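A minimal sketch of that repeated-runs protocol with gensim; the corpus and hyperparameters are placeholders, and the fixed seed plus `workers=1` is what makes a single run deterministic (multithreaded training is not run-to-run reproducible):

```python
import statistics
import time

from gensim.models.word2vec import Text8Corpus, Word2Vec

sentences = Text8Corpus('text8')  # the small test corpus mentioned above

times = []
for run in range(5):
    start = time.perf_counter()
    # Fixed seed + single worker: deterministic training, so repeated
    # runs isolate the timing variance of the machine itself.
    Word2Vec(sentences, vector_size=100, seed=42, workers=1)  # size= in gensim < 4.0
    times.append(time.perf_counter() - start)

print('train time: mean=%.1fs stdev=%.1fs over %d runs'
      % (statistics.mean(times), statistics.stdev(times), len(times)))
```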
Results: