xthan / polyvore

Code for ACM MM'17 paper "Learning Fashion Compatibility with Bidirectional LSTMs"
Apache License 2.0

TensorFlow version #2

Closed alekskorupa closed 6 years ago

alekskorupa commented 6 years ago

Hi, I ran into the following problem when I tried to use your code.

./train.sh

Traceback (most recent call last):
  File "polyvore/train.py", line 25, in <module>
    import polyvore_model_bi as polyvore_model
  File "/mnt/datasets/polyvore-lstm/polyvore/polyvore_model_bi.py", line 29, in <module>
    from ops import image_embedding
  File "/mnt/datasets/polyvore-lstm/polyvore/ops/image_embedding.py", line 25, in <module>
    from tensorflow.contrib.slim.python.slim.nets.inception_v3 import inception_v3_base
ImportError: No module named inception_v3 

It seems that this module is not available until TensorFlow version 0.11. Are you sure that 0.10 is the version that works? Thanks for your help.

Best, Aleksander

xthan commented 6 years ago

Hi Alek,

I built this using some version between r0.10 and r0.11, because I forked the code from the first version of TensorFlow's im2txt repo. However, the TF developers have since moved this repo into a new research folder, which makes it very hard to track down the initial version. (https://github.com/tensorflow/models/commit/f87a58cd96d45de73c9a8330a06b2ab56749a7fa#comments)

Checking my TF repo, the version I used is v0.10.0-1705-g6218ac2, but I imagine it would be very hard for you to find this version and install it from source. Could you check whether TensorFlow r0.11 works? If you can run the code, r0.10 and r0.11 should give very similar performance.
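For readers unfamiliar with the version string above: it follows the `git describe` format (tag, number of commits after the tag, abbreviated commit hash). The sketch below splits such a string into its parts; the function name `parse_describe` is made up for illustration and is not part of TensorFlow or this repo.

```python
import re

# Pattern for `git describe` output: <tag>-<commits since tag>-g<abbreviated hash>.
# A plain release string such as "0.11.0" is accepted as well.
_DESCRIBE_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(?:-(\d+)-g([0-9a-f]+))?$")


def parse_describe(version):
    """Split a git-describe style string into (release tuple, commits ahead, commit hash)."""
    match = _DESCRIBE_RE.match(version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    major, minor, patch, ahead, commit = match.groups()
    release = (int(major), int(minor), int(patch))
    return release, int(ahead or 0), commit


release, ahead, commit = parse_describe("v0.10.0-1705-g6218ac2")
# True when the build is past a 0.10.x tag but not yet at 0.11.0.
is_between = (0, 10, 0) <= release < (0, 11, 0) and ahead > 0
print(release, ahead, commit, is_between)
```

Read this way, the string describes a source build 1705 commits past the v0.10.0 tag, which is consistent with "some version between r0.10 and r0.11".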

Let me know if r0.11 works for you.

Thanks, Xintong

alekskorupa commented 6 years ago

Hi Xintong,

Thank you for your quick response. Yes, I suspected that r0.11 could work, but after trying it I get a new error, again due to a missing module:

. train.sh

I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] Couldn't open CUDA library libcudnn.so. LD_LIBRARY_PATH: 
I tensorflow/stream_executor/cuda/cuda_dnn.cc:3448] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
INFO:tensorflow:Prefetching values from 128 files matching data/tf_records/train-no-dup-?????-of-00128
Traceback (most recent call last):
  File "polyvore/train.py", line 111, in <module>
    tf.app.run()
  File "/home/oleks/anaconda/envs/biLSTM_old/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "polyvore/train.py", line 66, in main
    model.build()
  File "/mnt/datasets/polyvore-lstm/polyvore/polyvore_model_bi.py", line 676, in build
    self.build_inputs()
  File "/mnt/datasets/polyvore-lstm/polyvore/polyvore_model_bi.py", line 217, in build_inputs
    images.append(self.process_image(encoded_images[i],image_idx=i))
  File "/mnt/datasets/polyvore-lstm/polyvore/polyvore_model_bi.py", line 159, in process_image
    image_idx=image_idx)
  File "/mnt/datasets/polyvore-lstm/polyvore/ops/image_processing.py", line 82, in process_image
    image_summary("original_image/" + str(image_idx), image)
  File "/mnt/datasets/polyvore-lstm/polyvore/ops/image_processing.py", line 71, in image_summary
    tf.summary.image(name, tf.expand_dims(image, 0))
AttributeError: 'module' object has no attribute 'image'

I have also tried r0.12.1, with a similar error as a result:

.  train.sh

INFO:tensorflow:Prefetching values from 128 files matching data/tf_records/train-no-dup-?????-of-00128
Traceback (most recent call last):
  File "polyvore/train.py", line 111, in <module>
    tf.app.run()
  File "/home/oleks/anaconda/envs/biLSTM_old/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "polyvore/train.py", line 66, in main
    model.build()
  File "/mnt/datasets/polyvore-lstm/polyvore/polyvore_model_bi.py", line 679, in build
    self.build_model()
  File "/mnt/datasets/polyvore-lstm/polyvore/polyvore_model_bi.py", line 377, in build_model
    tf.losses.add_loss(emb_batch_loss * self.config.emb_loss_factor)
AttributeError: 'module' object has no attribute 'losses'

I have actually managed to train the model using a higher version (TensorFlow 1.3). Now that I have the model weights, I am missing the code for inference. In particular, I need to perform a multimodal (image + text) query for item retrieval. If you have a working version available, I would appreciate it if you could share it somewhere. Thanks again for your help.

Best regards,

Aleksander

xthan commented 6 years ago

Are you using the current version of my code? I ask because I am using tf.image_summary, not tf.summary.image, in image_processing.py.

I will check how to make it run under r0.11 by the end of this week and let you know how to do it.
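As a rough illustration of one way to cope with API names that moved between TensorFlow releases, the sketch below looks up the first attribute path that exists on a module. The helper `first_available` and the stand-in objects are hypothetical and exist only so the example runs without TensorFlow installed; it uses Python 3's `types.SimpleNamespace`.

```python
from types import SimpleNamespace


def first_available(module, dotted_names):
    """Return the first attribute path in dotted_names that resolves on module."""
    for dotted in dotted_names:
        obj = module
        try:
            for part in dotted.split("."):
                obj = getattr(obj, part)
        except AttributeError:
            continue  # This spelling is missing; try the next candidate.
        return obj
    raise AttributeError("none of %s found on %r" % (list(dotted_names), module))


# Stand-in for an older build that only has the flat name (hypothetical object).
old_tf = SimpleNamespace(image_summary=lambda name, image: ("old", name))
# Stand-in for a newer build that exposes the nested name (hypothetical object).
new_tf = SimpleNamespace(summary=SimpleNamespace(image=lambda name, image: ("new", name)))

candidates = ["summary.image", "image_summary"]
print(first_available(old_tf, candidates)("original_image/0", None))
print(first_available(new_tf, candidates)("original_image/0", None))
```

With a real installation one would pass the imported `tensorflow` module instead of the stand-ins. The same idea might cover the `tf.losses` error with candidates such as `"losses.add_loss"` and `"contrib.losses.add_loss"`, though the second path is an assumption about older releases and should be checked against the installed version.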

alekskorupa commented 6 years ago

Hi again,

First of all, sorry for not replying for so long; I had a busy week last week. Yes, it might well be that I was running a version of your code other than the current one. Anyway, after some modifications, I have managed to run the extract_features script using TensorFlow 1.3.

Now I would like to perform a multimodal query to generate an outfit, as in your paper, but first I guess I need to extract a semantic representation of the text query based on the trained embedding. Would you be able to tell me how to do that?

Best regards,

Aleksander

xthan commented 6 years ago

model.embedding_map contains the embedding of each word in the vocabulary.

[word_emb] = sess.run([model.embedding_map])

If you want to get the representation of a text query containing words a, b, c, you just need to feed the indices of a, b, c and average their embeddings:


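import numpy as np

def norm_row(a):
  # L2-normalize each row of a 2-D array, or the whole vector if a is 1-D.
  a = np.asarray(a, dtype=float)
  if a.ndim == 1:
    return a / np.linalg.norm(a)
  return a / np.linalg.norm(a, axis=1)[:, np.newaxis]

# One vocabulary word per line.
words = open('word_dict.txt').read().splitlines()

query = 'a b c'
query_words = query.split()
# Word i in word_dict.txt is looked up at row i+1 of the embedding matrix.
query_idx = [i + 1 for i, w in enumerate(words) if w in query_words]
# Sum the word embeddings and L2-normalize; this has the same direction as their average.
query_emb = norm_row(np.sum(word_emb[query_idx], axis=0))
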
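To make the snippet above concrete, here is a minimal self-contained sketch with toy data. The vocabulary, the random embedding matrix, and the item embeddings are all made-up stand-ins, and the cosine-similarity ranking at the end is only one plausible way to use the query vector, not necessarily the procedure used in the paper.

```python
import numpy as np

# Toy stand-ins (assumptions): a 4-word vocabulary and a random embedding matrix
# with one extra leading row, matching the i+1 index offset used above.
words = ["black", "dress", "shoes", "summer"]
rng = np.random.RandomState(0)
word_emb = rng.randn(len(words) + 1, 8)


def l2_normalize(v):
    """Scale a 1-D vector to unit L2 norm."""
    return v / np.linalg.norm(v)


query_words = "black dress".split()
query_idx = [i + 1 for i, w in enumerate(words) if w in query_words]

from_sum = l2_normalize(np.sum(word_emb[query_idx], axis=0))
from_mean = l2_normalize(np.mean(word_emb[query_idx], axis=0))

# After normalization, summing and averaging give the same unit vector.
print(np.allclose(from_sum, from_mean))

# Hypothetical unit-normalized item embeddings, ranked by cosine similarity to the query.
item_emb = rng.randn(5, 8)
item_emb /= np.linalg.norm(item_emb, axis=1, keepdims=True)
ranking = np.argsort(-item_emb.dot(from_sum))
print(ranking)
```

This also shows why the text can say "average" while the code sums: the normalization step removes the difference in scale.
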
alekskorupa commented 6 years ago

I will try that. Thanks a lot for all your help.

Best wishes,

Aleksander

liurui16 commented 5 years ago

Hi @alekskorupa, I'm trying to run the extract_features script using TensorFlow 1.13. I have modified the original code and trained the model, which gave me .meta, .index and .data checkpoint files. When I try to run extract_features (from the original code), I get a lot of NotFoundError messages like this:

Caused by op 'save/RestoreV2', defined at:
  File "E:/liurui/polyvore-master/polyvore/run_inference.py", line 93, in <module>
    tf.app.run()
  File "C:\dlfiles\Anaconda36\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "E:/liurui/polyvore-master/polyvore/run_inference.py", line 55, in main
    saver = tf.train.Saver()
  File "C:\dlfiles\Anaconda36\lib\site-packages\tensorflow\python\training\saver.py", line 832, in __init__
    self.build()
  File "C:\dlfiles\Anaconda36\lib\site-packages\tensorflow\python\training\saver.py", line 844, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "C:\dlfiles\Anaconda36\lib\site-packages\tensorflow\python\training\saver.py", line 881, in _build
    build_save=build_save, build_restore=build_restore)
  File "C:\dlfiles\Anaconda36\lib\site-packages\tensorflow\python\training\saver.py", line 513, in _build_internal
    restore_sequentially, reshape)
  File "C:\dlfiles\Anaconda36\lib\site-packages\tensorflow\python\training\saver.py", line 332, in _AddRestoreOps
    restore_sequentially)
  File "C:\dlfiles\Anaconda36\lib\site-packages\tensorflow\python\training\saver.py", line 580, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "C:\dlfiles\Anaconda36\lib\site-packages\tensorflow\python\ops\gen_io_ops.py", line 1572, in restore_v2
    name=name)
  File "C:\dlfiles\Anaconda36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "C:\dlfiles\Anaconda36\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "C:\dlfiles\Anaconda36\lib\site-packages\tensorflow\python\framework\ops.py", line 3300, in create_op
    op_def=op_def)
  File "C:\dlfiles\Anaconda36\lib\site-packages\tensorflow\python\framework\ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key lstm/BW/basic_lstm_cell/bias not found in checkpoint
     [[node save/RestoreV2 (defined at E:/liurui/polyvore-master/polyvore/run_inference.py:55) ]]

Could you point me to some reference materials on how to modify this code to run on the newer TensorFlow? Thank you very much, Rui Liu

pachongchong commented 5 years ago

Have you solved this problem? I ran into the same error when I ran the code.

pachongchong commented 5 years ago

I'm trying to run the extract_features script using TensorFlow 0.10, but I get the following error. Can you help me solve this problem? Thank you very much!

Traceback (most recent call last):
  File "/media/公共硬盘A/ZhangJ/polyvore-master/polyvore/run_inference.py", line 103, in <module>
    tf.app.run()
  File "/home/ZhangJ/.conda/envs/polyvore_test/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/media/公共硬盘A/ZhangJ/polyvore-master/polyvore/run_inference.py", line 69, in main
    saver.restore(sess, "model/model_final/model.ckpt-34865")
  File "/home/ZhangJ/.conda/envs/polyvore_test/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1560, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/ZhangJ/.conda/envs/polyvore_test/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/home/ZhangJ/.conda/envs/polyvore_test/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ZhangJ/.conda/envs/polyvore_test/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    options, run_metadata)
  File "/home/ZhangJ/.conda/envs/polyvore_test/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: <exception str() failed>

pachongchong commented 5 years ago

Could I have your contact information, please? I would like to talk with you about this problem. Thank you.

xthan commented 5 years ago

Hi, the names of the LSTM weights differ between the two versions. You can check the names by digging into the graph file. Hope this helps: https://github.com/KranthiGV/Pretrained-Show-and-Tell-model/issues/7
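A minimal sketch of how such a name mismatch could be diagnosed, in plain Python so it runs without TensorFlow. Only the key `lstm/BW/basic_lstm_cell/bias` comes from the error above; the other names, the alias table, and the helper `suggest_renames` are assumptions made for illustration.

```python
def suggest_renames(graph_names, checkpoint_names, suffix_aliases):
    """Map each graph variable missing from the checkpoint to a likely checkpoint key.

    suffix_aliases maps the last path component used by the graph to the
    alternative components the checkpoint may use instead.
    """
    available = set(checkpoint_names)
    renames, unresolved = {}, []
    for name in graph_names:
        if name in available:
            continue  # Present under the same name; restores as-is.
        scope, _, leaf = name.rpartition("/")
        candidates = [scope + "/" + alt if scope else alt
                      for alt in suffix_aliases.get(leaf, [])]
        hits = [c for c in candidates if c in available]
        if hits:
            renames[name] = hits[0]
        else:
            unresolved.append(name)
    return renames, unresolved


# Hypothetical name lists; real ones would come from the graph and the checkpoint.
graph_names = ["lstm/BW/basic_lstm_cell/bias", "lstm/BW/basic_lstm_cell/kernel", "global_step"]
checkpoint_names = ["lstm/BW/basic_lstm_cell/biases", "lstm/BW/basic_lstm_cell/weights", "global_step"]
# Assumed aliases: LSTM variables named "weights"/"biases" in one version and "kernel"/"bias" in another.
aliases = {"bias": ["biases"], "kernel": ["weights"], "biases": ["bias"], "weights": ["kernel"]}

renames, unresolved = suggest_renames(graph_names, checkpoint_names, aliases)
print(renames)
print(unresolved)
```

In TensorFlow 1.x, the checkpoint side of this comparison can probably be obtained with `tf.train.list_variables(checkpoint_path)`, and a mapping from checkpoint names to graph variables can be passed to `tf.train.Saver(var_list=...)`. Treat both as starting points to verify against the installed version rather than a confirmed fix for this repo.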

pachongchong commented 5 years ago

Thank you very much!! Have a nice weekend. ❤️