ImportError: No module named slim.deployment

muthiyanbhushan commented 6 years ago

Hello @vonclites,

When I try to run the train_squeezenet.py script, I get the following error for slim.deployment:

    Traceback (most recent call last):
      File "train_squeezenet.py", line 4, in <module>
        from slim.deployment import model_deploy
    ImportError: No module named slim.deployment

I have an Anaconda environment with Python 2.7 and TensorFlow 1.6.0, installed following the conda instructions in the TensorFlow installation guide.

I tried googling the error but could not find a detailed solution.

Please let me know.

Thanks.

muthiyanbhushan commented 6 years ago

I installed TensorFlow from source.

But I still get the same error:

    ~/squeezenet$ python train_squeezenet.py --model_dir output/ --train_tfrecord_filepaths ~/create_tfrecords/flowers/ --validation_tfrecord_filepaths ~/create_tfrecords/flowers --network squeezenet --num_classes 5 --batch_size 8 --shuffle_buffer 2
    /home/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
      from ._conv import register_converters as _register_converters
    Traceback (most recent call last):
      File "train_squeezenet.py", line 4, in <module>
        from slim.deployment import model_deploy
    ModuleNotFoundError: No module named 'slim'

Please let me know if I am doing something wrong.

The tensorflow and squeezenet directories are both in my /home folder. Do I need to run the command from the tensorflow/ folder?

Thanks.

vonclites commented 6 years ago

Hey,

The slim library is actually in another repository: https://github.com/tensorflow/models

Within the repository, the slim library is located under research/slim: https://github.com/tensorflow/models/tree/master/research/slim

That's the library that I'm using. Sorry about that.

muthiyanbhushan commented 6 years ago

Can you please explain where I should run the training script from?

Thanks.

vonclites commented 6 years ago

If you've cloned/forked/downloaded the tensorflow/models repo, then you need to make sure that the path to the directory containing the slim module is on your PYTHONPATH environment variable, e.g. PYTHONPATH=/home/username/models/research:<...paths to other modules...> (on Linux, entries are separated by colons).
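A minimal Python 3 sketch of that setup follows. It assumes the repository was cloned to ~/models (a hypothetical location; adjust to yours), and the helper names are illustrative, not part of slim or TensorFlow. The variable Python reads is PYTHONPATH, and its entries are joined with os.pathsep (':' on Linux and macOS, ';' on Windows), not with commas.

```python
import importlib.util
import os
import sys


def prepend_to_pythonpath(directory, current=None):
    """Return a PYTHONPATH value with `directory` first and no duplicate entries."""
    if current is None:
        current = os.environ.get("PYTHONPATH", "")
    # PYTHONPATH entries are separated by os.pathsep, never by commas.
    others = [p for p in current.split(os.pathsep) if p and p != directory]
    return os.pathsep.join([directory] + others)


def slim_is_importable(research_dir):
    """Add `research_dir` to sys.path for this process and check that `slim` resolves."""
    if research_dir not in sys.path:
        sys.path.insert(0, research_dir)
    importlib.invalidate_caches()
    return importlib.util.find_spec("slim") is not None


if __name__ == "__main__":
    # Hypothetical clone location of the tensorflow/models repository.
    research = os.path.expanduser("~/models/research")
    print("export PYTHONPATH=" + prepend_to_pythonpath(research))
    print("slim importable:", slim_is_importable(research))
```

The printed export line can be pasted into the shell (or ~/.bashrc) before running train_squeezenet.py; the plain shell equivalent is `export PYTHONPATH=/home/username/models/research:$PYTHONPATH`. Setting sys.path inside a script only affects that one process, which is why the environment variable is usually the better fix.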

muthiyanbhushan commented 6 years ago

Thank you. I added the path to PYTHONPATH, but when I run the training script, pointing it at the TFRecord files of the flowers dataset for training and validation, I get the following error:

    Traceback (most recent call last):
      File "train_squeezenet.py", line 188, in <module>
        run()
      File "train_squeezenet.py", line 184, in run
        _run(args)
      File "train_squeezenet.py", line 32, in _run
        pipeline = inputs.Pipeline(args, sess)
      File "/home/bhushan/tensorflow/models/official/squeezenet/inputs.py", line 23, in __init__
        target_image_size=target_image_size
      File "/home/bhushan/tensorflow/models/official/squeezenet/inputs.py", line 94, in _create_dataset
        dataset = input_processor.from_tfrecords(files)
      File "/home/bhushan/tensorflow/models/official/squeezenet/inputs.py", line 149, in from_tfrecords
        seed=self.seed
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 611, in shuffle
        return ShuffleDataset(self, buffer_size, seed, reshuffle_each_iteration)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 1284, in __init__
        buffer_size, dtype=dtypes.int64, name="buffer_size")
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 946, in convert_to_tensor
        as_ref=False)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1036, in internal_convert_to_tensor
        ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/constant_op.py", line 235, in _constant_tensor_conversion_function
        return constant(v, dtype=dtype, name=name)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/constant_op.py", line 214, in constant
        value, dtype=dtype, shape=shape, verify_shape=verify_shape))
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_util.py", line 421, in make_tensor_proto
        raise ValueError("None values not supported.")
    ValueError: None values not supported.

I tried running with Python 2.7 as well, but I get the same error. My training and validation TFRecord files are in the same location: /home/bhushan/flower/. Can you please help?

Thanks.
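The traceback shows that the buffer_size passed to Dataset.shuffle (called from from_tfrecords in inputs.py) was None, which TensorFlow rejects with "None values not supported."; one possible cause is that no shuffle-buffer value was supplied. A minimal sketch of catching that early with argparse follows. It is hypothetical: the flag names mirror the command line in the earlier comment, but the required/default settings are assumptions, not train_squeezenet.py's actual parser.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical parser; flag names follow the earlier command line."""
    parser = argparse.ArgumentParser(description="SqueezeNet training (sketch)")
    parser.add_argument("--train_tfrecord_filepaths", nargs="+", required=True)
    parser.add_argument("--validation_tfrecord_filepaths", nargs="+", required=True)
    parser.add_argument("--batch_size", type=int, default=8)
    # Deliberately no default, so a missing value is caught below.
    parser.add_argument("--shuffle_buffer", type=int)
    args = parser.parse_args(argv)
    # Dataset.shuffle(buffer_size=None) fails deep inside TensorFlow with
    # "ValueError: None values not supported."; report the real problem instead.
    if args.shuffle_buffer is None or args.shuffle_buffer < 1:
        parser.error("--shuffle_buffer must be a positive integer")
    return args
```

Called with no argument, parse_args() reads sys.argv[1:]; a run that omits --shuffle_buffer then exits immediately with a usage message instead of a TensorFlow traceback.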

muthiyanbhushan commented 6 years ago

I resolved the previous issue. Thanks for that.

But now I am facing another issue. I created one TFRecord file for training and am passing it as the default input argument. I have two GTX 1080 Ti GPUs, and my batch size is 16.

    :~/squeezenet$ python train_squeezenet.py --network=squeezenet
    2018-03-15 17:12:13.223110: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2018-03-15 17:12:13.507759: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
    pciBusID: 0000:02:00.0
    totalMemory: 10.91GiB freeMemory: 8.21GiB
    2018-03-15 17:12:13.790935: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 1 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
    pciBusID: 0000:81:00.0
    totalMemory: 10.91GiB freeMemory: 10.61GiB
    2018-03-15 17:12:13.791006: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1227] Device peer to peer matrix
    2018-03-15 17:12:13.791036: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1233] DMA: 0 1
    2018-03-15 17:12:13.791044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1243] 0: Y N
    2018-03-15 17:12:13.791050: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1243] 1: N Y
    2018-03-15 17:12:13.791060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0, 1
    2018-03-15 17:12:14.383777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8937 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
    2018-03-15 17:12:14.387409: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 8.73G (9372067328 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    2018-03-15 17:12:14.479798: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 8937 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:81:00.0, compute capability: 6.1)
    WARNING:tensorflow:From /home/bmuthiyan/models/research/slim/deployment/model_deploy.py:363: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please switch to tf.train.get_or_create_global_step
    2018-03-15 17:12:24.584707: W tensorflow/core/framework/op_kernel.cc:1202] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Feature: image (data type: string) is required but could not be found.
    2018-03-15 17:12:24.584801: W tensorflow/core/framework/op_kernel.cc:1202] OP_REQUIRES failed at iterator_ops.cc:870 : Invalid argument: Feature: image (data type: string) is required but could not be found.
         [[Node: ParseSingleExample/ParseSingleExample = ParseSingleExample[Tdense=[DT_STRING, DT_INT64], dense_keys=["image", "label"], dense_shapes=[[], []], num_sparse=0, sparse_keys=[], sparse_types=[]](arg0, ParseSingleExample/Const, ParseSingleExample/Const_1)]]
    Traceback (most recent call last):
      File "train_squeezenet.py", line 193, in <module>
        run()
      File "train_squeezenet.py", line 189, in run
        _run(args)
      File "train_squeezenet.py", line 116, in _run
        sess.run(train_op, feed_dict=pipeline.training_data)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 905, in run
        run_metadata_ptr)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1137, in _run
        feed_dict_tensor, options, run_metadata)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1355, in _do_run
        options, run_metadata)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1374, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Feature: image (data type: string) is required but could not be found.
         [[Node: ParseSingleExample/ParseSingleExample = ParseSingleExample[Tdense=[DT_STRING, DT_INT64], dense_keys=["image", "label"], dense_shapes=[[], []], num_sparse=0, sparse_keys=[], sparse_types=[]](arg0, ParseSingleExample/Const, ParseSingleExample/Const_1)]]
         [[Node: inputs/IteratorGetNext = IteratorGetNext[output_shapes=[[?,3,224,224], [?]], output_types=[DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]]
         [[Node: clone_0/squeezenet/fire4/expand/3x3/BatchNorm/cond/FusedBatchNorm_1/Switch_3/_613 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge2083...1/Switch_3", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

muthiyanbhushan commented 6 years ago

In the inputs.py file:

    def _preprocess_image(self, raw_image):
        image = tf.image.decode_jpeg(raw_image, channels=3)
        image = tf.image.resize_images(image, self.target_image_size)
        image = tf.image.convert_image_dtype(image, tf.float32)
        if self.distort_image:
            image = tf.image.random_flip_left_right(image)
        image = tf.transpose(image, [2,0,1])
        return image

there is an image transpose in the function above, from HWC to CHW.

In squeezenet.py, the input data format is assumed to be NCHW, per the definition below:

    def _arg_scope(is_training, weight_decay, bn_decay):
        with arg_scope([conv2d],
                       weights_regularizer=l2_regularizer(weight_decay),
                       normalizer_fn=batch_norm,
                       normalizer_params={'is_training': is_training,
                                          'fused': True,
                                          'decay': bn_decay}):
            with arg_scope([conv2d, avg_pool2d, max_pool2d, batch_norm],
                           data_format='NCHW') as sc:
                return sc

Can you please explain this in more detail?

I think the error above is caused by a dimension mismatch.

Please let me know.

Thanks.
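The transpose and the NCHW setting are meant to agree: tf.transpose(image, [2, 0, 1]) builds output axis i from input axis perm[i], so an HWC image of shape (224, 224, 3) becomes CHW (3, 224, 224), and batching adds N in front, which matches the [?,3,224,224] output shape reported for inputs/IteratorGetNext in the log above. The pure-Python sketch below (hypothetical helper names, no TensorFlow needed) checks both the shape rule and where each pixel value ends up.

```python
def permute_shape(shape, perm):
    """Shape after a transpose: output axis i takes the size of input axis perm[i]."""
    return tuple(shape[axis] for axis in perm)


def hwc_to_chw(image):
    """Nested-list equivalent of tf.transpose(image, [2, 0, 1]): out[c][h][w] = in[h][w][c]."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [[[image[h][w][c] for w in range(width)]
             for h in range(height)]
            for c in range(channels)]


HWC_TO_CHW = (2, 0, 1)

if __name__ == "__main__":
    print(permute_shape((224, 224, 3), HWC_TO_CHW))  # (3, 224, 224)
    # A 1x2 RGB image: two pixels with three channel values each.
    tiny = [[[10, 20, 30], [11, 21, 31]]]
    print(hwc_to_chw(tiny))  # [[[10, 11]], [[20, 21]], [[30, 31]]]
```

Each channel becomes its own H x W plane, which is the layout the NCHW arg_scope expects.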

vonclites commented 6 years ago

The error "Invalid argument: Feature: image (data type: string) is required but could not be found." indicates that the TFRecord examples don't have a feature called 'image'.

muthiyanbhushan commented 6 years ago

Hello @vonclites,

Can you guide me through the code for creating the TFRecord file? I created a TFRecord file that has image and label features, but I'm not sure it was generated correctly.

The image_to_tfexample function:

    def image_to_tfexample(image_data, image_format, height, width, class_id):
        return tf.train.Example(
            features=tf.train.Features(
                feature={
                    'image/encoded': bytes_feature(image_data),
                    'image/format': bytes_feature(image_format),
                    'image/class/label': int64_feature(class_id),
                    'image/height': int64_feature(height),
                    'image/width': int64_feature(width),
                }
            )
        )

I was able to train the ResNet model in tensorflow/models/official/resnet using the TFRecord files I generated earlier.

Can you please let me know? Thanks.

muthiyanbhushan commented 6 years ago

I was able to train the Squeezenet model.

I had to modify the script to read the image/encoded feature instead of the image feature in the train_squeezenet.py file.

But it worked. Thanks.

qysnn commented 6 years ago

Alternatively, you can change the feature names in _preprocess_example and _parse_serialized_example in inputs.py, starting at line 153:

    def _preprocess_example(self, serialized_example):
        parsed_example = self._parse_serialized_example(serialized_example)
        image = self._preprocess_image(parsed_example['image/encoded'])
        return {'image': image}, parsed_example['image/class/label']

    def _preprocess_image(self, raw_image):
        image = tf.image.decode_jpeg(raw_image, channels=3)
        image = tf.image.resize_images(image, self.target_image_size)
        image = tf.image.convert_image_dtype(image, tf.float32)
        if self.distort_image:
            image = tf.image.random_flip_left_right(image)
        image = tf.transpose(image, [2, 0, 1])
        return image

    @staticmethod
    def _parse_serialized_example(serialized_example):
        features = {
            'image/encoded': tf.FixedLenFeature([], tf.string),
            'image/format': tf.FixedLenFeature([], tf.string),
            'image/class/label': tf.FixedLenFeature([], tf.int64),
            'image/height': tf.FixedLenFeature([], tf.int64),
            'image/width': tf.FixedLenFeature([], tf.int64),
        }
        return tf.parse_single_example(serialized=serialized_example,
                                       features=features)

Like this. With these changes, you can use this script on TFRecord files generated by the official scripts.

Bhagyoday-Patil commented 5 years ago

Could you please share the script you used to train on a custom image dataset?

aboy2018 commented 5 years ago

I have the same problem. I get this error:

    Expected image (JPEG, PNG, or GIF), got unknown format starting with '\000\000\000\000\000\

Can you tell me how to change the inputs.py file?
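That message appears to come from TensorFlow's image decoding op when the bytes it receives do not begin with a JPEG, PNG, or GIF file signature; leading zero bytes suggest the records hold something other than an encoded image file, for example raw pixel values. A minimal stdlib sketch (hypothetical helper name) for checking the first bytes of the value stored under image/encoded follows. It only compares well-known file signatures and does not validate the rest of the image.

```python
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
GIF_MAGICS = (b"GIF87a", b"GIF89a")


def sniff_image_format(data):
    """Guess the encoded image format from its leading bytes; None if unrecognized."""
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(GIF_MAGICS):
        return "gif"
    return None
```

For instance, sniff_image_format(open("flower.jpg", "rb").read()), with a hypothetical file name, should return "jpeg" for an ordinary JPEG file. If the bytes stored in the records return None, one common fix is on the writing side: store each image file's bytes as read from disk rather than a decoded pixel array.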

vonclites / squeezenet

ImportError: No module named slim.deployment #6