jfloff / pywFM

pywFM is a Python wrapper for Steffen Rendle's factorization machines library libFM
https://pypi.python.org/pypi/pywFM
MIT License

FM.run in Example code fails on Windows #6

Closed pablojrios closed 8 years ago

pablojrios commented 8 years ago

The line model = fm.run(features[:5], target[:5], features[5:], target[5:]) in the https://github.com/jfloff/pywFM example fails with the following output. The same error happens both with libFM compiled from sources and with the binaries from http://www.libfm.org/libfm-1.40.windows.zip.

`---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in ()
     20 # split features and target for train/test
     21 # first 5 are train, last 2 are test
---> 22 model = fm.run(features[:5], target[:5], features[5:], target[5:])
     23 print(model.predictions)
     24 # you can also get the model weights
C:\Miniconda2\lib\site-packages\pywFM\__init__.pyc in run(self, x_train, y_train, x_test, y_test, x_validation_set, y_validation_set)
    228 # parses rlog into
    229 import pandas as pd
--> 230 rlog = pd.read_csv(rlog_path, sep='\t')
    231 os.close(rlog_fd)
    232 os.remove(rlog_path)
C:\Miniconda2\lib\site-packages\pandas\io\parsers.pyc in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    527 skip_blank_lines=skip_blank_lines)
    528
--> 529 return _read(filepath_or_buffer, kwds)
    530
    531 parser_f.__name__ = name
C:\Miniconda2\lib\site-packages\pandas\io\parsers.pyc in _read(filepath_or_buffer, kwds)
    293
    294 # Create the parser.
--> 295 parser = TextFileReader(filepath_or_buffer, **kwds)
    296
    297 if (nrows is not None) and (chunksize is not None):
C:\Miniconda2\lib\site-packages\pandas\io\parsers.pyc in __init__(self, f, engine, **kwds)
    610 self.options['has_index_names'] = kwds['has_index_names']
    611
--> 612 self._make_engine(self.engine)
    613
    614 def _get_options_with_defaults(self, engine):
C:\Miniconda2\lib\site-packages\pandas\io\parsers.pyc in _make_engine(self, engine)
    745 def _make_engine(self, engine='c'):
    746 if engine == 'c':
--> 747 self._engine = CParserWrapper(self.f, **self.options)
    748 else:
    749 if engine == 'python':
C:\Miniconda2\lib\site-packages\pandas\io\parsers.pyc in __init__(self, src, **kwds)
   1117 kwds['allow_leading_cols'] = self.index_col is not False
   1118
-> 1119 self._reader = _parser.TextReader(src, **kwds)
   1120
   1121 # XXX
pandas\parser.pyx in pandas.parser.TextReader.__cinit__ (pandas\parser.c:5030)()
ValueError: No columns to parse from file`

jfloff commented 8 years ago

Does libfm output anything before the error?

pablojrios commented 8 years ago

No, the first lines up to and including fm = pywFM.FM(task='regression', num_iter=5) don't output anything. Please see the screenshot.

image

jfloff commented 8 years ago

That's probably an issue with how you either compiled or linked libfm. Have you followed the instructions here? Have you tried running libfm outside the wrapper to see if it's working correctly?

pablojrios commented 8 years ago

Yes. Compilation and linking succeeded. Also, as I indicated before, I'm getting exactly the same behaviour with the binaries http://www.libfm.org/libfm-1.40.windows.zip available on Steffen Rendle's site (http://www.libfm.org/). Thank you.

jfloff commented 8 years ago

Did you run libfm outside the wrapper? What results did it output?

pablojrios commented 8 years ago

I'll test that and let you know. Thank you.

pablojrios commented 8 years ago

Hi João, libfm is running fine, see the output below. I compiled and linked the latest libfm sources.

libfm -task r -method mcmc -train train.txt -test test.txt -iter 10 -dim '1,1,2' -out output.txt

test.txt train.txt output.txt

libFM
Version: 1.4.2
Author: Steffen Rendle, srendle@libfm.org
WWW: http://www.libfm.org/
This program comes with ABSOLUTELY NO WARRANTY; for details see license.txt.
This is free software, and you are welcome to redistribute it under certain conditions; for details see license.txt.
Loading train...
has x = 0
has xt = 1
num_rows=5 num_values=20 num_features=10 min_target=1 max_target=5
Loading test...
has x = 0
has xt = 1
num_rows=2 num_values=7 num_features=10 min_target=0 max_target=0
relations: 0
Loading meta data...
Iter= 0 Train=2.05834 Test=3.60555
Iter= 1 Train=2.24673 Test=3.60555
Iter= 2 Train=1.83006 Test=3.29604
Iter= 3 Train=1.77639 Test=3.2994
Iter= 4 Train=2.7062 Test=3.39629
Iter= 5 Train=1.12363 Test=3.0812
Iter= 6 Train=1.53352 Test=2.80507
Iter= 7 Train=1.84524 Test=2.56549
Iter= 8 Train=2.61957 Test=2.6785
Iter= 9 Train=1.65204 Test=2.60251

Thanks

jfloff commented 8 years ago

I was able to reproduce your error using the sources from libfm's website. As the instructions clearly state, you need to use the source from the GitHub repo, here:

git clone https://github.com/srendle/libfm /home/libfm
cd /home/libfm/ && make all
export LIBFM_PATH=/home/libfm/bin/

and here:

Make sure you are compiling from source, since pywFM needs the save_model option which only appears in this commit from October 2015. Beware that the installers in libfm.org are both dated before this commit.

If you compile from the GitHub repo you won't get that error. Please confirm so I can close the issue.

Thanks for your feedback!

pablojrios commented 8 years ago

Hi João

I compiled and linked the sources from the GitHub repo (https://github.com/srendle/libfm); compilation and linking succeeded. I did not and have never compiled sources from libfm's website. All the results I've shared so far were obtained with libfm compiled from the GitHub repo.

Let me know what else you need for troubleshooting.

Many thanks !

jfloff commented 8 years ago

Well that's weird ... can I ask you to test libfm outside the wrapper but with rlog option enabled? It would be something like this

libfm -task r -method mcmc -train train.txt -test test.txt -iter 10 -dim '1,1,2' -out output.txt -rlog
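
For anyone repeating this check from Python rather than a shell, here is a minimal sketch that builds the same command as an argument list and captures libfm's output. Passing a list avoids shell quoting issues with values such as 1,1,2. The helper names build_libfm_command and run_libfm are made up for illustration, only flags already used in this thread appear, and the sketch assumes Python 3 and a libfm executable reachable by the given name.

```python
import subprocess


def build_libfm_command(libfm_exe, train, test, out, rlog=None, save_model=None,
                        task="r", method="mcmc", num_iter=10, dim="1,1,2"):
    """Return the libfm argument list; each flag and its value is a separate item."""
    cmd = [libfm_exe, "-task", task, "-method", method,
           "-train", train, "-test", test,
           "-iter", str(num_iter), "-dim", dim, "-out", out]
    if rlog is not None:
        cmd += ["-rlog", rlog]  # file that receives the per-iteration measures
    if save_model is not None:
        cmd += ["-save_model", save_model]  # file that receives the model
    return cmd


def run_libfm(cmd):
    """Run the command and return (exit code, stdout, stderr) as text."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    # Only prints the command; call run_libfm(cmd) to actually execute it.
    cmd = build_libfm_command("libfm", "train.txt", "test.txt", "output.txt",
                              rlog="measures.txt")
    print(" ".join(cmd))
```

Comparing the captured stdout and stderr with what the wrapper produces may help narrow down whether the binary itself or the way it is launched is at fault.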

pablojrios commented 8 years ago

Sure, I'll test that and share results later tonight.

Thank you

jfloff commented 8 years ago

Since we are at it, also try within the wrapper with the rlog option disabled (rlog=False). Thanks

pablojrios commented 8 years ago

With rlog=False (fm = pywFM.FM(task='regression', num_iter=5, rlog=False)) the previous error is no longer thrown, but after running the model the predictions and weights arrays are empty. See the screenshot.

image

Whereas if I run libfm outside the wrapper specifying a value for the rlog parameter (libfm -task r -method mcmc -train train.txt -test test.txt -iter 10 -dim '1,1,2' -out output.txt -rlog measures.txt), the measurements are written to the file successfully. See the attached measures.txt file.

measures.txt

jfloff commented 8 years ago

Could you also test libfm (outside the wrapper) with save_model option enabled?

pablojrios commented 8 years ago

I have already tested libfm outside the wrapper with all the previous parameters plus the -save_model option, and the model file is generated successfully. If you need the actual model text file, I can attach it very late today because I'm not at the machine where I ran these tests right now.

jfloff commented 8 years ago

According to the previous output, libfm is returning an empty model when run through the wrapper. Since we've seen that the libfm binary works correctly on its own, and since I believe LIBFM_PATH is correct, I think there is a problem with the temporary files that pywFM spawns.

To test this, you could use the pywFM flag temp_path to manually set the folder where the temporary files are stored. Then you can look at the temporary files and see whether anything is generated.
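
A quick way to inspect such a folder is to list every file with its size and flag the empty ones. The sketch below uses only the standard library. The function names are hypothetical, and temp_dir stands for whatever folder was passed as temp_path. It does not assume anything about how pywFM names its temporary files.

```python
import os


def summarize_temp_files(temp_dir):
    """Return a sorted list of (file name, size in bytes) for regular files in temp_dir."""
    entries = []
    for name in sorted(os.listdir(temp_dir)):
        path = os.path.join(temp_dir, name)
        if os.path.isfile(path):
            entries.append((name, os.path.getsize(path)))
    return entries


def empty_files(temp_dir):
    """Return the names of the zero-byte files in temp_dir."""
    return [name for name, size in summarize_temp_files(temp_dir) if size == 0]
```

An empty output file alongside non-empty input files would suggest that the inputs were written but libfm either did not run or did not write its results there.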

pablojrios commented 8 years ago

Sure, I'll test pywFM with the temp_path flag and share the temporary files, if any, with you. And yes, LIBFM_PATH is correct; otherwise the line fm = pywFM.FM(task='regression', num_iter=5) would fail. Thank you

pablojrios commented 8 years ago

I attach the model.txt file generated with the -save_model option of libfm: model.txt

Running the model from pywFM with the temp_path parameter set (fm = pywFM.FM(task='regression', num_iter=5, temp_path=a_path)) generates the 5 temporary files in the attached .zip file, 3 of which are empty, and also throws the error below:

temp_files.zip

ValueError Traceback (most recent call last)
in ()
      1 # split features and target for train/test
      2 # first 5 are train, last 2 are test
----> 3 model = fm.run(features[:5], target[:5], features[5:], target[5:])
      4 print(model.predictions)
      5 # you can also get the model weights
C:\Miniconda2\lib\site-packages\pywFM\__init__.pyc in run(self, x_train, y_train, x_test, y_test, x_validation_set, y_validation_set)
    228 # parses rlog into
    229 import pandas as pd
--> 230 rlog = pd.read_csv(rlog_path, sep='\t')
    231 os.close(rlog_fd)
    232 os.remove(rlog_path)
C:\Miniconda2\lib\site-packages\pandas\io\parsers.pyc in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    527 skip_blank_lines=skip_blank_lines)
    528
--> 529 return _read(filepath_or_buffer, kwds)
    530
    531 parser_f.__name__ = name
C:\Miniconda2\lib\site-packages\pandas\io\parsers.pyc in _read(filepath_or_buffer, kwds)
    293
    294 # Create the parser.
--> 295 parser = TextFileReader(filepath_or_buffer, **kwds)
    296
    297 if (nrows is not None) and (chunksize is not None):
C:\Miniconda2\lib\site-packages\pandas\io\parsers.pyc in __init__(self, f, engine, **kwds)
    610 self.options['has_index_names'] = kwds['has_index_names']
    611
--> 612 self._make_engine(self.engine)
    613
    614 def _get_options_with_defaults(self, engine):
C:\Miniconda2\lib\site-packages\pandas\io\parsers.pyc in _make_engine(self, engine)
    745 def _make_engine(self, engine='c'):
    746 if engine == 'c':
--> 747 self._engine = CParserWrapper(self.f, **self.options)
    748 else:
    749 if engine == 'python':
C:\Miniconda2\lib\site-packages\pandas\io\parsers.pyc in __init__(self, src, **kwds)
   1117 kwds['allow_leading_cols'] = self.index_col is not False
   1118
-> 1119 self._reader = _parser.TextReader(src, **kwds)
   1120
   1121 # XXX
pandas\parser.pyx in pandas.parser.TextReader.__cinit__ (pandas\parser.c:5030)()
ValueError: No columns to parse from file

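
The message "No columns to parse from file" is what pandas reports when the file it is asked to read is empty, which fits the empty temporary files mentioned above. As an illustration only (this is not pywFM's code), a standard-library sketch of a reader that checks for an empty tab-separated log before parsing could look like this. The name read_rlog and the error text are assumptions.

```python
import csv


def read_rlog(rlog_path):
    """Parse a tab-separated log file into (header, rows); raise if the file is empty."""
    with open(rlog_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, None)  # None means there was not even a header line
        if header is None:
            raise ValueError(
                "log file %r is empty; libfm may not have run or written to it" % rlog_path
            )
        rows = [row for row in reader if row]  # skip blank lines
    return header, rows
```

A check like this turns the low-level parser error into a message that points at the likely cause, namely that nothing was written to the log file.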
jfloff commented 8 years ago

I just spent a couple of hours setting up Jupyter and libfm on a Windows VM, and I think I kinda reproduced your problem. Let me ask you, did you compile libfm in cygwin? And from which environment are you launching Jupyter?

Thank you

pablojrios commented 8 years ago

I use gcc from MinGW-W64 (http://iweb.dl.sourceforge.net/project/mingw-w64/Toolchains%20targetting%20Win32/Personal%20Builds/mingw-builds/installer/mingw-w64-install.exe); I don't use cygwin. What's specific about Jupyter? I get the same error when I execute fm = pywFM.FM(task='regression', num_iter=5) from Jupyter, IPython or IDLE, all belonging to my single Anaconda Python 2.7 distribution. See for example the output from IDLE: image

jfloff commented 8 years ago

I had the same problem as you: I compiled using cygwin, and when running pywFM I saw the same error. I fixed it by including the cygwin bin folder in the system's path. Could you try something similar?
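
To check what the Python process itself sees, which can differ between a terminal, IDLE and Jupyter, one option is to inspect PATH from inside the interpreter. The sketch below assumes Python 3 (shutil.which is not available in Python 2.7), and both function names are made up for illustration.

```python
import os
import shutil


def dir_on_path(directory, path_value=None):
    """Return True if directory is one of the entries of PATH (after normalization)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    wanted = os.path.normcase(os.path.normpath(directory))
    entries = [os.path.normcase(os.path.normpath(p))
               for p in path_value.split(os.pathsep) if p]
    return wanted in entries


def find_executable(name):
    """Return the full path that name resolves to via PATH, or None if it is not found."""
    return shutil.which(name)
```

Running dir_on_path with the compiler's bin folder, once from a plain terminal session and once from the notebook, would show whether the two environments really share the same PATH.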

pablojrios commented 8 years ago

I have always had C:\Program Files\mingw-w64\x86_64-5.3.0-posix-seh-rt_v4-rev0\mingw64\bin in the system path.

Thanks for your support, João. For the time being I'll use libfm directly, and I'll try to make some time to look at the pywFM package source code.

jfloff commented 8 years ago

Then I'd suggest running Python directly from a terminal (without IPython / Jupyter). My best guess is that something between the two is not working.

Thank you for all the feedback and I'm sorry I couldn't help you further.