kopopt / fast_tffm

fast_tffm: Tensorflow-based Distributed Factorization Machine
Apache License 2.0

Cannot run #3

Closed tashanzhishi closed 7 years ago

tashanzhishi commented 8 years ago

My TensorFlow version is 0.9. When I run python fast_tffm.py train sample.cfg, the script fails. The error output is below.

Traceback (most recent call last):
  File "fast_tffm.py", line 2, in <module>
    from py.fm_ops import fm_ops
  File "/root/wyb/tensorflow_benchmark/fast_tffm/fast_tffm-master/py/fm_ops.py", line 5, in <module>
    fm_ops = tf.load_op_library(os.path.dirname(os.path.realpath(__file__)) + '/../lib/libfast_tffm.so')
  File "/usr/local/python2.7/lib/python2.7/site-packages/tensorflow/python/framework/load_library.py", line 71, in load_op_library
    raise errors._make_specific_exception(None, None, error_msg, error_code)
tensorflow.python.framework.errors.NotFoundError: /root/wyb/tensorflow_benchmark/fast_tffm/fast_tffm-master/py/../lib/libfast_tffm.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringEv

How can I modify the source code to support TensorFlow 0.9?
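For readers hitting the same error, it can help to see which C++ function the loader could not find. The snippet below is a minimal sketch of a demangler for simple nested names in the Itanium C++ ABI mangling scheme; the function name demangle_nested_name is hypothetical, and it only handles symbols of the form _ZN...Ev (no templates, no arguments). The standard c++filt tool does the same job for the general case.

```python
def demangle_nested_name(symbol):
    """Demangle a simple Itanium-ABI nested name such as _ZN3foo3barEv.

    Hypothetical helper for illustration: it handles only plain
    length-prefixed name components and an empty (void) parameter list.
    """
    prefix = "_ZN"
    if not symbol.startswith(prefix):
        raise ValueError("not a nested mangled name: %r" % symbol)

    pos = len(prefix)
    parts = []
    # Each component is encoded as <decimal length><identifier>.
    while pos < len(symbol) and symbol[pos] != "E":
        start = pos
        while pos < len(symbol) and symbol[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError("unsupported encoding at offset %d" % start)
        length = int(symbol[start:pos])
        parts.append(symbol[pos:pos + length])
        pos += length

    if pos >= len(symbol):
        raise ValueError("missing terminating 'E'")

    # After the closing 'E' comes the parameter list; 'v' means no parameters.
    if symbol[pos + 1:] != "v":
        raise ValueError("only empty parameter lists are handled")
    return "::".join(parts) + "()"


missing = "_ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringEv"
print(demangle_nested_name(missing))
```

The demangled name shows that libfast_tffm.so references a TensorFlow-internal function that the installed TensorFlow library does not provide, which is consistent with the op library having been built against headers from a different TensorFlow version than the one loaded at runtime.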

tashanzhishi commented 8 years ago

My environment is Red Hat 6.5 (Linux kernel version 2.6.32), Python 2.7.12, and GCC 5.3.0. I run TensorFlow 0.9 without a GPU.

snnn commented 8 years ago

Hi @tashanzhishi

try this patch: 1.txt

kopopt commented 8 years ago

@snnn Thanks a lot for your help!

tashanzhishi commented 8 years ago

@snnn Thank you very much. I resolved the problem using your idea.

tashanzhishi commented 8 years ago

@kopopt You trained on 36672494 training examples in 157 seconds in local mode. Is the dataset 1 TB in size? I would like the training process to run for a longer time (I have optimized TensorFlow's communication model and want to compare it with the original). Which parameters should I modify?

kopopt commented 8 years ago

@tashanzhishi I use the criteo dataset. 36672494 training examples is only about 7.7 GB

Actually, I did not quite understand your request about running for a longer time. The speed is obtained by using 10 threads. If you use one training thread, training will be much slower.
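As a rough back-of-the-envelope check, the figures quoted in this thread (36672494 examples, 157 seconds, about 7.7 GB, 10 threads) can be turned into throughput numbers. The sketch below assumes that "GB" means 10^9 bytes and that the work is split evenly across the threads; neither assumption is confirmed in the thread, so the per-thread figure in particular is only indicative.

```python
# Figures quoted in the thread.
examples = 36672494      # training examples
seconds = 157            # wall-clock training time
dataset_bytes = 7.7e9    # "about 7.7 GB"; decimal gigabytes assumed
threads = 10             # training threads used for the reported speed

# Aggregate throughput.
examples_per_sec = examples / float(seconds)
mb_per_sec = dataset_bytes / seconds / 1e6

# Average example size and a naive per-thread rate (assumes an even split).
bytes_per_example = dataset_bytes / examples
examples_per_sec_per_thread = examples_per_sec / threads

print("examples/s:            %.0f" % examples_per_sec)
print("MB/s:                  %.1f" % mb_per_sec)
print("bytes/example:         %.0f" % bytes_per_example)
print("examples/s per thread: %.0f" % examples_per_sec_per_thread)
```

At this rate a single pass over the data finishes in a few minutes, so a benchmark that needs a longer run would have to process more data in total or use fewer threads.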

tashanzhishi commented 7 years ago

@kopopt Thanks for your response. I am downloading the dataset.