piskvorky / gensim

Topic Modelling for Humans
https://radimrehurek.com/gensim
GNU Lesser General Public License v2.1

File not found in wordrank wrapper #1310

Closed: mohammad1234 closed this issue 7 years ago

mohammad1234 commented 7 years ago

I've been trying to train a wordrank model on the provided text8 data, using the gensim wrapper for the wordrank package, but I get a "No such file or directory" error. This is unexpected, because I verified my wordrank installation by running the demo script shipped with the wordrank package, which runs without any issue. I also checked the permissions on the wordrank installation directory, and they should not be a problem.

Is this a bug in the gensim package, or am I missing something here?

test_model = Wordrank.train(wr_path = '/mnt/wordrank', corpus_file = '/mnt/wordrank/scripts/text8', out_path = '/mnt/word_rank_out/', iter=6)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib64/python2.7/site-packages/gensim/models/wrappers/wordrank.py", line 105, in train
    utils.check_output(w, args=command, stdin=r)
  File "/usr/local/lib64/python2.7/site-packages/gensim/utils.py", line 1167, in check_output
    process = subprocess.Popen(stdout=stdout, *popenargs, **kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

menshikh-iv commented 7 years ago

@mohammad1234 Can you run the command that is passed to utils.check_output(w, args=command, stdin=r) by hand? You need the value of command.

@parulsethi Can you please add logging of the command being run, just before the call to utils.check_output here?

For example
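A minimal sketch of such logging, assuming the standard library's subprocess.check_output rather than gensim's own utils.check_output. The helper name run_logged and the sample command are hypothetical and only illustrate the idea of logging the command line before it is executed:

```python
import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def run_logged(command, **kwargs):
    """Log the exact command line, then run it and return its output.

    Hypothetical helper, not part of gensim.
    """
    # Log before running, so the command is visible even if the
    # process cannot be started and an OSError is raised.
    logger.info("running command: %s", " ".join(command))
    return subprocess.check_output(command, **kwargs)


# Hypothetical command; any executable available on the system would do.
output = run_logged([sys.executable, "-c", "print('ok')"])
print(output.decode().strip())
```

With the command in the log, it can be copied and run by hand to see which file or directory is missing.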

parulsethi commented 7 years ago

The out_path is actually a directory that is created inside the wordrank directory to save all the metadata and embedding dumps. It has to be at the first level of the wordrank directory so that later file creation and processing work, e.g. it can be word_rank_out/ but not /mnt/word_rank_out/ (which is what causes the reported error).
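The constraint above can be sketched as a small check. This is an illustration only, not the wrapper's actual code: the helper resolve_out_dir is hypothetical, and it assumes POSIX-style paths like the ones in the report.

```python
import posixpath


def resolve_out_dir(wr_path, out_name):
    """Return the output directory inside wr_path for a bare directory name.

    Hypothetical helper, not part of gensim.
    """
    # Drop a trailing separator so that "word_rank_out/" is accepted.
    name = out_name.rstrip("/")
    # Reject empty names, absolute paths and nested paths such as
    # "/mnt/word_rank_out/", which cannot sit at the first level of wr_path.
    if (not name or posixpath.isabs(out_name)
            or "/" in name or name in (".", "..")):
        raise ValueError(
            "expected a directory name like 'word_rank_out', got %r" % out_name
        )
    return posixpath.join(wr_path, name)


print(resolve_out_dir("/mnt/wordrank", "word_rank_out/"))
```

Passing '/mnt/word_rank_out/' to this helper raises ValueError, which mirrors why the original call failed.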

I think out_path is a misleading parameter name; out_name could be a better option, since it indicates that only a directory name is expected. I'll send a PR for this.

menshikh-iv commented 7 years ago

Fixed in #1332