cmllmrnn closed this issue 6 years ago
Post the exact commands you used to compile wordrep.
I ran the following from the MITIE-master folder:
cd tools/wordrep
mkdir build
cd build
cmake "Visual Studio 15 2017 Win64" ..
cmake --build . --config Release
That should work fine. I don't know what is going on. If you can post a small dataset that reproduces the error I'll take a look. Give exact instructions for reproducing the problem.
For small datasets, wordrep works fine. GitHub only allows attachments smaller than 10MB, and my 13MB dataset worked fine anyway. Since I cannot post the failing data here, I'll give a URL instead. This is the next smallest decent French corpus I found online: http://www.statmt.org/europarl/v7/fr-en.tgz (339MB), and it failed.
cd C:\Users\cmllmrnn\Documents\MITIE Workspace\MITIE-master\tools\wordrep\build\Release
wordrep -e "C:\path\to\file\train"
number of raw ASCII files found: 1
num words: 200000
saving word counts to top_word_counts.dat
number of raw ASCII files found: 1
Sample 50000000 random context vectors
Now do CCA (left size: 50000000, right size: 50000000).
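To narrow down the corpus size at which the crash starts, one option is to train on progressively smaller prefixes of the corpus. Below is a minimal Python sketch, assuming the corpus is a single plain-text file. The function `write_prefix` and the paths `corpus.txt` and `train_subset` are hypothetical names for illustration and are not part of MITIE.

```python
from pathlib import Path


def write_prefix(src: Path, dst: Path, max_bytes: int) -> int:
    """Copy whole lines from src to dst, stopping before max_bytes is exceeded.

    Returns the number of bytes written.
    """
    written = 0
    with src.open("rb") as fin, dst.open("wb") as fout:
        for line in fin:
            # Stop at a line boundary so no sentence is cut in half.
            if written + len(line) > max_bytes:
                break
            fout.write(line)
            written += len(line)
    return written


if __name__ == "__main__":
    # Hypothetical paths: adjust to the extracted corpus and to the
    # folder that is later passed to `wordrep -e`.
    corpus = Path("corpus.txt")
    out_dir = Path("train_subset")
    if corpus.exists():
        out_dir.mkdir(exist_ok=True)
        target = 100 * 1024 * 1024  # try a 100MB prefix first
        n = write_prefix(corpus, out_dir / "subset.txt", target)
        print(f"wrote {n} bytes to {out_dir / 'subset.txt'}")
```

Running wordrep -e on the resulting folder with a few different target sizes (for example halving the size after each failure) should show roughly where the failure begins, which makes the problem easier to reproduce on another machine.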
Thanks, I'll take a look.
Try it now. I just pushed a fix and it should all work as expected in Visual Studio now.
Thanks a lot! Will compile and train and will let you know as soon as possible.
No problem. Thanks for reporting this.
I was able to train on a 4GB dataset successfully. It took 3 days. Thank you!
Expected Behavior
wordrep should run to completion as long as the machine has enough RAM.
Current Behavior
My machine has 160GB of total memory (m4.10xlarge AWS instance). I tried training on 4GB, 1GB, and even 339MB datasets, but when the wordrep process reaches around 25GB of RAM (at "Now do CCA (left size: 50000000, right size: 50000000)." in the console), a dialog box appears saying that wordrep.exe has stopped working. It works fine only for my 13MB dataset.
Steps to Reproduce
Maybe I am missing something. Thank you in advance for the help.