mariolew / Deep-Alignment-Network-tensorflow

A re-implementation of Deep-Alignment-Network using TensorFlow

question about trainDAN.py - loading training data #4

Open td042 opened 6 years ago

td042 commented 6 years ago

I was wondering which .npz files I have to load in lines 15 and 16 in order to train the model. At the moment those files aren't in the "data" directory.

```
(14) datasetDir = "../data/"
(15) trainSet = ImageServer.Load(datasetDir + "dataset_nimgs=40_perturbations=[0.2, 0.2, 20, 0.25]_size=[112, 112].npz")
(16) validationSet = ImageServer.Load(datasetDir + "dataset_nimgs=9_perturbations=[]_size=[112, 112].npz")
```

mariolew commented 6 years ago

@td042 Hi, if you follow the steps in the README, you will get the .npz files. Running python training\testSetPreparation.py will create them, i.e. dataset_nimgs=x*.npz, where x is a number that depends on the size of your dataset, so it might not be 40 and 9.
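
To see which .npz files the preparation scripts actually produced (the exact nimgs value varies with the dataset), a quick listing like the sketch below helps; it is not part of the repo, and the ../data/ location is only assumed from datasetDir in trainDAN.py:

```python
import glob

# Assumption: the preparation scripts write their output into ../data/,
# matching datasetDir in trainDAN.py.
datasetDir = "../data/"
for path in sorted(glob.glob(datasetDir + "dataset_nimgs=*.npz")):
    print(path)  # paste the matching filenames into lines 15 and 16 of trainDAN.py
```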

td042 commented 6 years ago

After running TestSetPreparation.py I got the files w300Set.npz, commonSet.npz and challengingSet.npz. Are these the files you meant?

mariolew commented 6 years ago

@td042 Yes.

td042 commented 6 years ago

I have now chosen w300Set.npz as the training set and commonSet.npz as the validation set. When I run the training, I get the error message below. Do you have any idea what I did wrong? Also, I did not change the STAGE variable; what does it mean and how should I set it? Sorry for the many questions, I don't know deep learning all that well yet.


```
ValueError                                Traceback (most recent call last)
in ()
     62 initLandmarks = trainSet.initLandmarks[0].reshape((1, 136))
     63 
---> 64 dan = DAN(initLandmarks)
     65 
     66 STAGE = 2

/output/models.py in DAN(MeanShapeNumpy)
     91 
     92 S2_AffineParam = TransformParamsLayer(S1_Ret, MeanShape)
---> 93 S2_InputImage = AffineTransformLayer(InputImage, S2_AffineParam)
     94 S2_InputLandmark = LandmarkTransformLayer(S1_Ret, S2_AffineParam)
     95 S2_InputHeatmap = LandmarkImageLayer(S2_InputLandmark)

/output/layers.py in AffineTransformLayer(Image, Param)
     93     return tf.reshape(OutImage,[IMGSIZE,IMGSIZE,1])
     94 
---> 95     return tf.map_fn(lambda args: affine_transform(args[0], args[1], args[2]),(Image, A, T), dtype=tf.float32)
     96 
     97 

/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/functional_ops.py in map_fn(fn, elems, dtype, parallel_iterations, back_prop, swap_memory, infer_shape, name)
--> 409     swap_memory=swap_memory)

/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py in while_loop(cond, body, loop_vars, shape_invariants, parallel_iterations, back_prop, swap_memory, name, maximum_iterations)
-> 2934     result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)

/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py in BuildLoop(self, pred, body, loop_vars, shape_invariants)
-> 2720         pred, body, original_loop_vars, loop_vars, shape_invariants)

/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py in _BuildLoop(self, pred, body, original_loop_vars, loop_vars, shape_invariants)
-> 2662     body_result = body(*packed_vars_for_body)

/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/functional_ops.py in compute(i, tas)
--> 399     packed_fn_values = fn(packed_values)

/output/layers.py in <lambda>(args)
---> 95     return tf.map_fn(lambda args: affine_transform(args[0], args[1], args[2]),(Image, A, T), dtype=tf.float32)

/output/layers.py in affine_transform(I, A, T)
     75     I = tf.reshape(I, [IMGSIZE, IMGSIZE])
     76 
---> 77     SrcPixels = tf.matmul(tf.reshape(Pixels, [IMGSIZE * IMGSIZE,2]), A) + T
     78     SrcPixels = tf.clip_by_value(SrcPixels, 0, IMGSIZE - 2)

/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py in matmul(a, b, transpose_a, transpose_b, adjoint_a, adjoint_b, a_is_sparse, b_is_sparse, name)
-> 1962     with ops.name_scope(name, "MatMul", [a, b]) as name:

/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in __enter__(self)
-> 5382     g = _get_graph_from_inputs(self._values)

/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in _get_graph_from_inputs(op_input_list, graph)
-> 5055     _assert_same_graph(original_graph_element, graph_element)

/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in _assert_same_graph(original_item, item)
-> 4991     original_item))

ValueError: Tensor("Stage2/map_1/while/TensorArrayReadV3_1:0", shape=(2, 2), dtype=float32) must be from the same graph as Tensor("Reshape:0", shape=(12544, 2), dtype=float32).
```

mariolew commented 6 years ago

@td042 You should run python trainingSetPreparation.py to create the training set, and you should also set STAGE to 1. I tried git-cloning this repo and found no problem. The meaning of STAGE can be found in the paper. However, your issue is strange: did you use the same version of TensorFlow as I did? I just cannot reproduce your issue, so would you mind describing your environment and steps in detail?

td042 commented 6 years ago

I forgot to run trainingSetPreparation.py. Now I have swapped in the generated files "dataset_nimgs=100_perturbations=[]_size=[112, 112].npz" and "dataset_nimgs=40_perturbations=[0.2, 0.2, 20, 0.25]_size=[112, 112].npz" as the training and validation sets, and I have set the STAGE variable to 1, but I still get the same error message. I use FloydHub with TensorFlow 1.5 and a Jupyter notebook environment (floyd run --mode jupyter --gpu --env tensorflow-1.5).
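
For reference, pointing lines 15 and 16 at the newly generated files looks roughly like the sketch below; which file serves as training set and which as validation set is only an assumption here, mirroring the perturbed/unperturbed split of the original code:

```python
# trainDAN.py (fragment; ImageServer is already imported at the top of the script)
datasetDir = "../data/"

# Assumption: the perturbed set is used for training and the unperturbed one for
# validation, mirroring the original lines 15 and 16.
trainSet = ImageServer.Load(datasetDir + "dataset_nimgs=40_perturbations=[0.2, 0.2, 20, 0.25]_size=[112, 112].npz")
validationSet = ImageServer.Load(datasetDir + "dataset_nimgs=100_perturbations=[]_size=[112, 112].npz")

STAGE = 1  # train stage 1 first, as suggested above
```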

mariolew commented 6 years ago

@td042 According to the error message, it seems that you created two graphs. I don't know what's wrong with the code; I can run trainDAN.py with no errors. If you are using a Jupyter notebook, try running the script with python in the terminal instead.
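
As an aside for anyone hitting the same error in a notebook: a common TF 1.x workaround, not something this repo does itself, is to clear the default graph before rebuilding the model, e.g.:

```python
import tensorflow as tf

# If an earlier notebook cell already built part of the model, re-running a cell can mix
# tensors from two different graphs, which produces exactly this "must be from the same
# graph" ValueError. Clearing the default graph before rebuilding avoids that (TF 1.x API).
tf.reset_default_graph()

# ...then rebuild everything from scratch in the same run, e.g.:
# dan = DAN(initLandmarks)
```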

```
2018-03-23 23:12:32.497644: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.2 AVX AVX2 FMA
Starting training......
Epoch: 0 Batch: 0 TestErr: 0.16190773 BatchErr: 0.4041887
```