hanzhanggit / StackGAN

MIT License

[Solution] How to run this project with Python 3.x and TensorFlow 1.x #30

Open Lotayou opened 6 years ago

Lotayou commented 6 years ago

I spent 5 hours getting the program running, which is a great waste of time. Here I summarize all the changes needed for this project to run in a Python 3.x and TensorFlow r1.x environment.

I assume your working directory is ~/StackGAN/StageI.

1. Python 3.x compatibility issues

In addition to the minor changes mentioned in #2, there is still one major issue:

Pickle Issue: The original pickle files were created in Python 2.7, and opening them with Python 3 can raise the following error:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)

The solution can be found here: Unpickle Python 2 object in Python 3
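A minimal sketch of the usual fix, for readers who cannot follow the link: pass encoding="latin1" to pickle.load so that 8-bit Python 2 strings decode without error. The helper name load_py2_pickle and the hand-written sample bytes are illustrative assumptions, not part of the StackGAN code.

```python
import io
import pickle


def load_py2_pickle(fileobj):
    """Load a pickle written by Python 2 from a binary file object."""
    # encoding="latin1" maps every byte 0-255 to a code point, so 8-bit
    # Python 2 str objects load without raising UnicodeDecodeError.
    return pickle.load(fileobj, encoding="latin1")


# Hand-written protocol-2 pickle of the Python 2 str '\xe2\x80\x99'
# (illustrative data only, standing in for the dataset pickle files).
PY2_PICKLE = b"\x80\x02U\x03\xe2\x80\x99q\x00."

value = load_py2_pickle(io.BytesIO(PY2_PICKLE))
```

For a real file, open it in binary mode first, for example with open(path, "rb") as f, and pass f to the helper.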

2. TensorFlow r1.x compatibility issues

tf.concat() Issue #11: You may encounter an error message like this:

    TypeError: Expected int32, got <prettytensor.pretty_tensor_class.Layer object at 0x7f74d41abd90> of type 'Layer' instead.

In TensorFlow r0.12 the function signature is tf.concat(axis, value), while in TensorFlow r1.x the argument order has been changed to tf.concat(value, axis).

PrettyTensor Issue #27: This issue arises in the PrettyTensor module, with an error message like this:

    File ".../site-packages/prettytensor/pretty_tensor_class.py", line 1335, in _strip_unnecessary_contents_from_stack
      for f, line_no, method, _ in result._traceback:
    ValueError: too many values to unpack (expected 4)

This issue has nothing to do with the PrettyTensor package version. I use the latest 0.7.4, but 0.6.2 should also work.

The main cause of this problem is in _traceback format, in TensorFlow r1.3 the _traceback object is a list with each entry a 6-tuple like this: ('D:\\Anaconda3\\envs\\tensorflow\\lib\\site-packages\\spyder\\utils\\ipython\\start_kernel.py', 241, '<module>', {'__name__': '__main__', '__doc__': '\nFile used to start kernels for the IPython Console\n', '__package__': None, '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x0000021474E75CF8>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__file__': 'D:\\Anaconda3\\envs\\tensorflow\\lib\\site-packages\\spyder\\utils\\ipython\\start_kernel.py', '__cached__': None, 'os': <module 'os' from 'D:\\Anaconda3\\envs\\tensorflow\\lib\\os.py'>, 'osp': <module 'ntpath' from 'D:\\Anaconda3\\envs\\tensorflow\\lib\\ntpath.py'>, 'sys': <module 'sys' (built-in)>, 'IS_EXT_INTERPRETER': True, 'sympy_config': <function sympy_config at 0x00000214799891E0>, 'kernel_config': <function kernel_config at 0x0000021479989268>, 'varexp': <function varexp at 0x00000214799892F0>, 'main': <function main at 0x0000021479989378>}, 9, None)

I guess in TensorFlow r0.12 the entry only contains 4 elements. But anyway here's a quick workaround:

Change

    for f, line_no, method, _ in result._traceback:

to

    for f, line_no, method, *_ in result._traceback:

The starred target *_ absorbs however many items are left in the unpacked tuple. This syntax is Python 3 only.
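The workaround above can be sketched outside PrettyTensor as follows. The function name strip_traceback and the sample entries are hypothetical; only the loop header mirrors the patched line.

```python
def strip_traceback(traceback_entries):
    """Keep (file, line_no, method) from entries with three or more items."""
    stripped = []
    # *_ collects every remaining item, so 4-tuples and 6-tuples both unpack.
    for f, line_no, method, *_ in traceback_entries:
        stripped.append((f, line_no, method))
    return stripped


# Made-up entries: one 6-tuple (as seen in r1.3) and one 4-tuple.
entries = [
    ("start_kernel.py", 241, "<module>", {}, 9, None),
    ("model.py", 31, "__init__", "source line"),
]
result = strip_traceback(entries)
```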

3. Summary Issue: TensorFlow r1.3 uses the new tf.summary API, so a lot of code should be adapted like this:

    tf.merge_all_summaries() -> tf.summary.merge_all()
    tf.scalar_summary(k,v) -> tf.summary.scalar(k,v)
    summary_writer = tf.train.SummaryWriter(self.log_dir, sess.graph) -> summary_writer = tf.summary.FileWriter(self.log_dir, sess.graph)
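If many files need these renames, a plain textual substitution may be enough. The sketch below assumes the three renames listed above are the only ones required; the names SUMMARY_RENAMES and migrate_summary_calls are made up for illustration, and the result should be reviewed by hand.

```python
# Old (r0.12) name -> new (r1.x) name, taken from the list above.
SUMMARY_RENAMES = {
    "tf.merge_all_summaries": "tf.summary.merge_all",
    "tf.scalar_summary": "tf.summary.scalar",
    "tf.train.SummaryWriter": "tf.summary.FileWriter",
}


def migrate_summary_calls(source):
    """Return source text with the old summary names replaced."""
    for old, new in SUMMARY_RENAMES.items():
        source = source.replace(old, new)
    return source


old_line = "summary_writer = tf.train.SummaryWriter(self.log_dir, sess.graph)"
new_line = migrate_summary_calls(old_line)
```

This is a string replacement, not a parser, so it will also rewrite matches inside comments and strings.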

4. Slicing Index Issue: Slice indices must be integers, so in dataset.py (around line 80) the crop should be changed from

    cropped_image = images[i][w1: w1 + self._imsize, h1: h1 + self._imsize, :]

to

    original_image = images[i]
    cropped_image = original_image[int(w1): int(w1 + self._imsize),
                                   int(h1): int(h1 + self._imsize), :]
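The same rule can be seen with plain Python lists, which also reject float slice bounds. This is a simplified sketch on nested lists rather than the image arrays in dataset.py; the crop helper and the 4x4 grid are assumptions for illustration.

```python
def crop(image, w1, h1, imsize):
    """Crop an imsize x imsize window from a nested-list image."""
    # Offsets may arrive as floats (e.g. 1.0), so cast before slicing.
    w1, h1 = int(w1), int(h1)
    return [row[h1:h1 + imsize] for row in image[w1:w1 + imsize]]


# 4x4 grid holding the values 0..15, row by row.
grid = [[4 * r + c for c in range(4)] for r in range(4)]
patch = crop(grid, 1.0, 2.0, 2)

try:
    grid[1.0:3.0]  # float slice bounds raise TypeError
    float_slice_ok = True
except TypeError:
    float_slice_ok = False
```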

Those are all the major compatibility issues that must be fixed for training. Enjoy :)

Lotayou commented 6 years ago

@hanzhanggit Can you please mention this in readme.md? Thanks!

SpadesQ commented 6 years ago

@Lotayou

After changing

    for f, line_no, method, _ in result._traceback:

to

    for f, line_no, method, *_ in result._traceback:

(where *_ takes any number of arguments and absorbs whatever is left in the unpacked tuple), I got:

    for f, line_no, method, *_ in result._traceback:
                            ^
    SyntaxError: invalid syntax

I use Python 2.7. How can I solve this?

Lotayou commented 6 years ago

@SpadesQ I use Python 3.6 myself so I don't know much about Python 2.7.

Maybe you should check whether your _traceback has the same format as mine (by printing out its first entry like I did).

My _traceback contains 6 items per entry, but the for loop only expected 4 items, so I had to absorb the remaining items with *_. If *_ does not work for you, just use some throwaway variables to fill in the gap like this:

    for f, line_no, method, blah1, blah2, blah3 in result._traceback:

BTW, this TensorFlow version is a mess. I now use the StackGANv2 PyTorch version.

KelvinBull commented 6 years ago

Hello, why doesn't the PrettyTensor library include custom_fully_connected/custom_conv2d? Is this a bug in my version of TensorFlow/PrettyTensor?

ningning32 commented 6 years ago

I am encountering the same problem. @Lotayou, have you solved it? Thank you very much.

ningning32 commented 6 years ago

    for f, line_no, method, blah1, blah2, blah3 in result._traceback:
    ValueError: need more than 4 values to unpack

    for f, line_no, method, blah1, blah2, blah3, blah4 in result._traceback:
    ValueError: need more than 6 values to unpack

I used Python 2.7. @SpadesQ

KelvinBull commented 6 years ago

If you use Python 2.7, you can work around it by changing your code like this:

    for all in result._traceback:
        allist = list(all)[:3]
        f = allist[0]
        line_no = allist[1]
        method = allist[2]

Then it should run ...
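The two ValueError messages above suggest that entries of different lengths can appear in the same _traceback, which would explain why a fixed number of placeholder variables fails. A sketch of the slicing approach that runs on both Python 2.7 and Python 3; the function name first_three and the sample entries are hypothetical.

```python
def first_three(traceback_entries):
    """Return (file, line_no, method) for each entry, whatever its length."""
    out = []
    for entry in traceback_entries:
        # Slicing never fails on longer tuples, unlike fixed unpacking.
        f, line_no, method = tuple(entry)[:3]
        out.append((f, line_no, method))
    return out


# Made-up entries with mixed lengths (4 and 6 items).
mixed = [
    ("a.py", 1, "run", None),
    ("b.py", 2, "main", {}, 9, None),
]
trimmed = first_three(mixed)
```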

AnwarUllahKhan commented 5 years ago

@Lotayou Dear Sir, I am facing this problem

(base) C:\Users\anwar\Downloads\Programs\Text-to-Image-HighResolution>python run_exp.py --cfg cfg/birds.yml --gpu 0
Using config:
{'CONFIG_NAME': 'stageI', 'DATASET_NAME': 'birds', 'EMBEDDING_TYPE': 'cnn-rnn', 'GAN': {'DF_DIM': 64, 'EMBEDDING_DIM': 128, 'GF_DIM': 128, 'NETWORK_TYPE': 'default'}, 'GPU_ID': 0, 'TEST': {'BATCH_SIZE': 64, 'CAPTION_PATH': '', 'HR_IMSIZE': 256, 'LR_IMSIZE': 64, 'NUM_COPY': 16, 'PRETRAINED_MODEL': ''}, 'TRAIN': {'BATCH_SIZE': 64, 'B_WRONG': True, 'COEFF': {'KL': 2.0}, 'COND_AUGMENTATION': True, 'DISCRIMINATOR_LR': 0.0002, 'FINETUNE_LR': False, 'FLAG': True, 'FT_LR_RETIO': 0.1, 'GENERATOR_LR': 0.0002, 'LR_DECAY_EPOCH': 50, 'MAX_EPOCH': 600, 'NUM_COPY': 4, 'NUM_EMBEDDING': 4, 'PRETRAINED_EPOCH': 600, 'PRETRAINED_MODEL': '', 'SNAPSHOT_INTERVAL': 2000}, 'Z_DIM': 100}
images: (2933, 76, 76, 3)
embeddings: (2933, 10, 1024)
list_filenames: 2933 001.Black_footed_Albatross/Black_Footed_Albatross_0046_18
images: (8855, 76, 76, 3)
embeddings: (8855, 10, 1024)
list_filenames: 8855 002.Laysan_Albatross/Laysan_Albatross_0002_1027
Traceback (most recent call last):
  File "run_exp.py", line 59, in <module>
    image_shape=dataset.image_shape
  File "C:\Users\anwar\Downloads\Programs\Text-to-Image-HighResolution\model.py", line 31, in __init__
    self.d_encode_img_template = self.d_encode_image()
  File "C:\Users\anwar\Downloads\Programs\Text-to-Image-HighResolution\model.py", line 161, in d_encode_image
    custom_conv2d(self.df_dim, k_h=4, k_w=4).
  File "C:\ProgramData\Anaconda3\lib\site-packages\prettytensor\pretty_tensor_class.py", line 1965, in method
    with _method_scope(input_layer, scope_name) as (scope, _):
  File "C:\ProgramData\Anaconda3\lib\contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "C:\ProgramData\Anaconda3\lib\site-packages\prettytensor\pretty_tensor_class.py", line 1776, in _method_scope
    scopes.var_and_name_scope((name, None)) as (scope, var_scope):
  File "C:\ProgramData\Anaconda3\lib\contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "C:\ProgramData\Anaconda3\lib\site-packages\prettytensor\scopes.py", line 55, in var_and_name_scope
    vs_key = tf.get_collection_ref(variable_scope._VARSCOPE_KEY)
AttributeError: module 'tensorflow.python.ops.variable_scope' has no attribute '_VARSCOPE_KEY'

AnwarUllahKhan commented 5 years ago

@Lotayou @SpadesQ I was training the model, unplugged the charger, and went out. When I came back, my system had switched off. How can I resume training from the last checkpoint? Please help me. Thank you very much.

Lotayou commented 5 years ago

@AnwarUllahKhan I guess there must be a parameter in the config file where you can designate the ckpt file to be loaded for subsequent training. However, if you cannot find one, try converting your TensorFlow checkpoint to a PyTorch one, and go to the PyTorch implementation instead :)

AnwarUllahKhan commented 5 years ago

@Lotayou Thank you, I solved that. I can train successfully now, but how can I run the demo, which is a .sh file, when I am on Windows?

guwalgiya commented 5 years ago

saved me so much time! thanks!

ankit01ojha commented 5 years ago

@AnwarUllahKhan Could you please elaborate on how you fixed it? I am facing the same problem. If you have made this project work on Windows, could you also tell me how you ran the shell script? @Lotayou Could you also help me?

akhilvasvani commented 5 years ago

@Lotayou @AnwarUllahKhan @ankit01ojha, I am also facing the same problem with prettytensor for python3.6.

Problem:

python3 stageI/run_exp.py --cfg stageI/cfg/birds.yml --gpu 0
./misc/config.py:100: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  yaml_cfg = edict(yaml.load(f))
Using config:
{'CONFIG_NAME': 'stageI',
 'DATASET_NAME': 'birds',
 'EMBEDDING_TYPE': 'cnn-rnn',
 'GAN': {'DF_DIM': 64,
         'EMBEDDING_DIM': 128,
         'GF_DIM': 128,
         'NETWORK_TYPE': 'default'},
 'GPU_ID': 0,
 'TEST': {'BATCH_SIZE': 64,
          'CAPTION_PATH': '',
          'HR_IMSIZE': 256,
          'LR_IMSIZE': 64,
          'NUM_COPY': 16,
          'PRETRAINED_MODEL': ''},
 'TRAIN': {'BATCH_SIZE': 64,
           'B_WRONG': True,
           'COEFF': {'KL': 2.0},
           'COND_AUGMENTATION': True,
           'DISCRIMINATOR_LR': 0.0002,
           'FINETUNE_LR': False,
           'FLAG': True,
           'FT_LR_RETIO': 0.1,
           'GENERATOR_LR': 0.0002,
           'LR_DECAY_EPOCH': 50,
           'MAX_EPOCH': 600,
           'NUM_COPY': 4,
           'NUM_EMBEDDING': 4,
           'PRETRAINED_EPOCH': 600,
           'PRETRAINED_MODEL': '',
           'SNAPSHOT_INTERVAL': 2000},
 'Z_DIM': 100}
images:  (2933, 76, 76, 3)
embeddings:  (2933, 10, 1024)
list_filenames:  2933 001.Black_footed_Albatross/Black_Footed_Albatross_0046_18
images:  (8855, 76, 76, 3)
embeddings:  (8855, 10, 1024)
list_filenames:  8855 002.Laysan_Albatross/Laysan_Albatross_0002_1027
Traceback (most recent call last):
  File "stageI/run_exp.py", line 63, in <module>
    image_shape=dataset.image_shape
  File "/home/akhil/StackGAN/stageI/model.py", line 35, in __init__
    self.d_encode_img_template = self.d_encode_image()
  File "/home/akhil/StackGAN/stageI/model.py", line 165, in d_encode_image
    custom_conv2d(self.df_dim, k_h=4, k_w=4).
  File "/usr/local/lib/python3.6/dist-packages/prettytensor/pretty_tensor_class.py", line 1965, in method
    with _method_scope(input_layer, scope_name) as (scope, _):
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.6/dist-packages/prettytensor/pretty_tensor_class.py", line 1776, in _method_scope
    scopes.var_and_name_scope((name, None)) as (scope, var_scope):
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.6/dist-packages/prettytensor/scopes.py", line 55, in var_and_name_scope
    vs_key = tf.get_collection_ref(variable_scope._VARSCOPE_KEY)
AttributeError: module 'tensorflow.python.ops.variable_scope' has no attribute '_VARSCOPE_KEY'

How did you fix it?

AnwarUllahKhan commented 5 years ago

@akhilvasvani @ankit01ojha You are both using Python 3+, so follow the instructions in @Lotayou's first message...

akhilvasvani commented 5 years ago

@AnwarUllahKhan, @Lotayou does not mention how to solve the problem. Notice how my error and your error are exactly the same.

What did you do to solve your error?

AnwarUllahKhan commented 5 years ago

@akhilvasvani you can try this one too https://www.twblogs.net/a/5c713446bd9eee68dc3f25a0 or downgrade your tensorflow

AnwarUllahKhan commented 5 years ago

https://github.com/hanzhanggit/StackGAN/issues/51

akhilvasvani commented 5 years ago

Awesome. Thanks man. Much appreciated

akhilvasvani commented 5 years ago

Ok, so following the link you posted @AnwarUllahKhan, I changed: tf.get_collection_ref(variable_scope._VARSCOPE_KEY) to tf.get_collection_ref(variable_scope._VARSCOPESTORE_KEY)

However, I then hit another error:

File "/home/akhil/.local/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1341, in get_variable_scope
    return get_variable_scope_store().current_scope
AttributeError: 'VariableScope' object has no attribute 'current_scope' 

Following the solution from the link, get_variable_scope() and get_variable_scope_store() end up calling each other endlessly, which forces the main code to stop running. I didn't know how to add "current_scope" without messing up the rest of variable_scope.py, so this didn't work.

Then I went back to the original problem and changed: tf.get_collection_ref(variable_scope._VARSCOPE_KEY) to tf.get_collection_ref(variable_scope.__VARSTORE_KEY).

However, when I reach custom_fully_connected, I hit the ipdb debugger. Is this a similar path to the one you went down?

akhilvasvani commented 5 years ago

In the ipdb debugger, it finds an error in custom_ops.py in the class custom_fully_connected, specifically with the matrix and bias variables. I get the error:

 File "/home/akhil/StackGAN/stageI/model.py", line 48, in generate_condition
    conditions = (pt.wrap(c_var).flatten().custom_fully_connected(self.ef_dim * 2).
  File "/usr/local/lib/python3.6/dist-packages/prettytensor/pretty_tensor_class.py", line 1972, in method
    result = func(non_seq_layer, *args, **kwargs)
  File "./misc/custom_ops.py", line 158, in __call__
    init=tf.random_normal_initializer(stddev=stddev))
  File "/usr/local/lib/python3.6/dist-packages/prettytensor/pretty_tensor_class.py", line 1673, in variable
    collections=variable_collections)
  File "/home/akhil/.local/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1479, in get_variable
    aggregation=aggregation)
  File "/home/akhil/.local/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1220, in get_variable
    aggregation=aggregation)
TypeError: get_variable() missing 1 required positional argument: 'name'

This is what is written in the file:

        try:
            if len(shape) == 4:
                input_ = tf.reshape(input_, tf.stack([tf.shape(input_)[0], np.prod(shape[1:])]))
                input_.set_shape([None, np.prod(shape[1:])])
                shape = input_.get_shape().as_list()

            with tf.variable_scope(scope or "Linear"):
                matrix = self.variable("Matrix", [in_dim or shape[1], output_size],
                                       tf.random_normal_initializer(stddev=stddev))
                bias = self.variable("bias", [output_size], tf.constant_initializer(bias_start))
                return input_layer.with_tensor(tf.matmul(input_, matrix) + bias, parameters=self.vars)
        except Exception:
            import ipdb; ipdb.set_trace()

Is there a way around this problem?

akhilvasvani commented 5 years ago

#59 Solved it without using PrettyTensor

AllenGe666 commented 5 years ago

[image] How can I address this problem?

akhilvasvani commented 5 years ago

Oh, I did not train a model for the flower dataset, so you cannot use Han Zhang's pretrained model on my (flower) demo script.

Working on training that!

AllenGe666 commented 5 years ago

Could you please tell me where to find your pre-trained model?

akhilvasvani commented 5 years ago

Unfortunately, I have not posted my pretrained model yet. At the moment, my focus is on training the StackGAN model with skip_thoughts vectors for birds. Once I am done with that, I will get back to training the model for flowers.

rs2309 commented 5 years ago

Easy Solution to run on windows

  1. python 3.5

  2. prettytensor=0.7.1

  3. solve pickle issue : Unpickle Python 2 object in Python 3

  4. for tensorflow:
     cpu: pip install --upgrade https://storage.googleapis.com/tensorflow/windows/cpu/tensorflow-0.12.0rc0-cp35-cp35m-win_amd64.whl
     gpu: pip install --upgrade https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-0.12.0rc0-cp35-cp35m-win_amd64.whl

  5. install rest of the packages

ast1997 commented 4 years ago

@akhilvasvani I got an error while training with the updated StackGAN project on your GitHub: https://github.com/akhilvasvani/StackGAN. Can you please help me out?

ast1997 commented 4 years ago

I am trying to run demo/flowers_demo.sh with the command sh demo/flowers_demo.sh, and I get a "command not found" error:

    demo/flowers_demo.sh: line 10: th: command not found