Closed muthiyanbhushan closed 6 years ago
Installed Tensorflow from source packages.
But still I get the same error.
~/squeezenet$ python train_squeezenet.py --model_dir output/ --train_tfrecord_filepaths ~/create_tfrecords/flowers/ --validation_tfrecord_filepaths ~/create_tfrecords/flowers --network squeezenet --num_classes 5 --batch_size 8 --shuffle_buffer 2
/home/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
File "train_squeezenet.py", line 4, in
Please let me know if I am doing something wrong.
The tensorflow and squeezenet directories are in my /home folder. Do I need to run the command from the tensorflow/ folder?
Thanks.
Hey,
The slim library is actually in another repository: https://github.com/tensorflow/models
Within the repository, the slim library is located under research/slim: https://github.com/tensorflow/models/tree/master/research/slim
That's the library that I'm using. Sorry about that.
Can you please explain where I should run the training script from?
Thanks.
If you've cloned/forked/downloaded the tensorflow/models repo, then you need to make sure that the path to the directory containing the slim module is on your PYTHONPATH environment variable, with entries separated by colons on Linux: PYTHONPATH=/home/username/models/research:<...paths to other modules...>
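A quick way to confirm the setting took effect is to check whether any entry on the module search path actually contains slim/deployment. The following is a minimal sketch only; the helper names find_slim_parent and prepend_to_pythonpath are hypothetical and not part of the squeezenet or slim code.

```python
import os
import sys


def find_slim_parent(search_paths):
    """Return the first directory that contains slim/deployment, else None."""
    for path in search_paths:
        if os.path.isdir(os.path.join(path, "slim", "deployment")):
            return path
    return None


def prepend_to_pythonpath(directory, current=""):
    """Build a PYTHONPATH value with `directory` first (no duplicate entries)."""
    parts = [p for p in current.split(os.pathsep) if p]
    if directory not in parts:
        parts.insert(0, directory)
    return os.pathsep.join(parts)


if __name__ == "__main__":
    # sys.path already includes whatever PYTHONPATH contributed at startup.
    found = find_slim_parent(sys.path)
    if found is None:
        print("slim/deployment not found on sys.path; check PYTHONPATH")
    else:
        print("slim is importable from: " + found)
```

If the check prints the "not found" message, `from slim.deployment import model_deploy` will fail with an ImportError regardless of which directory the script is started from.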
Thank you. I added the Python path, but when I run the training script and point it to the tfrecord files of the flowers dataset for training and validation, I get the error below:
Traceback (most recent call last):
File "train_squeezenet.py", line 188, in
I tried running with Python 2.7 as well, but I have the same issue. My train and validation tfrecord files are in the same location: /home/bhushan/flower/ Can you please let me know what might be wrong?
Thanks.
I resolved the previous issue. Thanks for the help.
But now I am facing another issue. I have created one tfrecord file for training and am passing it as the default input argument. I have two GTX 1080 Ti GPUs. My batch size is 16.
:~/squeezenet$ python train_squeezenet.py --network=squeezenet
2018-03-15 17:12:13.223110: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-03-15 17:12:13.507759: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
pciBusID: 0000:02:00.0
totalMemory: 10.91GiB freeMemory: 8.21GiB
2018-03-15 17:12:13.790935: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 1 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
pciBusID: 0000:81:00.0
totalMemory: 10.91GiB freeMemory: 10.61GiB
2018-03-15 17:12:13.791006: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1227] Device peer to peer matrix
2018-03-15 17:12:13.791036: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1233] DMA: 0 1
2018-03-15 17:12:13.791044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1243] 0: Y N
2018-03-15 17:12:13.791050: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1243] 1: N Y
2018-03-15 17:12:13.791060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0, 1
2018-03-15 17:12:14.383777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8937 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
2018-03-15 17:12:14.387409: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 8.73G (9372067328 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-03-15 17:12:14.479798: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 8937 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:81:00.0, compute capability: 6.1)
WARNING:tensorflow:From /home/bmuthiyan/models/research/slim/deployment/model_deploy.py:363: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
2018-03-15 17:12:24.584707: W tensorflow/core/framework/op_kernel.cc:1202] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Feature: image (data type: string) is required but could not be found.
2018-03-15 17:12:24.584801: W tensorflow/core/framework/op_kernel.cc:1202] OP_REQUIRES failed at iterator_ops.cc:870 : Invalid argument: Feature: image (data type: string) is required but could not be found.
[[Node: ParseSingleExample/ParseSingleExample = ParseSingleExample[Tdense=[DT_STRING, DT_INT64], dense_keys=["image", "label"], dense_shapes=[[], []], num_sparse=0, sparse_keys=[], sparse_types=[]](arg0, ParseSingleExample/Const, ParseSingleExample/Const_1)]]
Traceback (most recent call last):
File "train_squeezenet.py", line 193, in
In the inputs.py file:
def _preprocess_image(self, raw_image):
    image = tf.image.decode_jpeg(raw_image, channels=3)
    image = tf.image.resize_images(image, self.target_image_size)
    image = tf.image.convert_image_dtype(image, tf.float32)
    if self.distort_image:
        image = tf.image.random_flip_left_right(image)
    image = tf.transpose(image, [2, 0, 1])
    return image
The function above transposes the image from HWC to CHW.
In squeezenet.py, the input data format is set to NCHW by the definition below:
def _arg_scope(is_training, weight_decay, bn_decay):
    with arg_scope([conv2d],
                   weights_regularizer=l2_regularizer(weight_decay),
                   normalizer_fn=batch_norm,
                   normalizer_params={'is_training': is_training,
                                      'fused': True,
                                      'decay': bn_decay}):
        with arg_scope([conv2d, avg_pool2d, max_pool2d, batch_norm],
                       data_format='NCHW') as sc:
            return sc
Can you please elaborate more on this.
I think my above error is because of dimension mismatch.
Please, let me know.
Thanks.
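For reference, the transpose with [2, 0, 1] and the NCHW setting look consistent with each other: output axis i of a transpose takes its size from input axis perm[i], so an HWC image becomes CHW, and batching then gives NCHW. Here is a small pure-Python sketch of that shape arithmetic. The 224x256 image size is an assumed example and permute_shape is a hypothetical helper, not a TensorFlow function.

```python
def permute_shape(shape, perm):
    """Return the shape of a tensor of `shape` after transposing with `perm`."""
    if sorted(perm) != list(range(len(shape))):
        raise ValueError("perm must be a permutation of the axis indices")
    # Output axis i takes its size from input axis perm[i].
    return tuple(shape[axis] for axis in perm)


hwc = (224, 256, 3)                  # assumed size: height, width, channels
chw = permute_shape(hwc, [2, 0, 1])  # same permutation as in _preprocess_image
batch_size = 16                      # batch size mentioned in this thread
nchw = (batch_size,) + chw

print(chw)   # (3, 224, 256)
print(nchw)  # (16, 3, 224, 256)
```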
The error "Invalid argument: Feature: image (data type: string) is required but could not be found." indicates that the TFRecord examples don't have a feature called 'image'.
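To see which feature names a TFRecord file really contains, it helps to list the keys of its first record. In TensorFlow 1.x this is usually done by iterating the file and parsing each record as a tf.train.Example. The sketch below avoids the TensorFlow dependency and reads the on-disk layout directly instead, assuming Python 3 and an uncompressed file. It skips CRC verification, and all function names in it are hypothetical.

```python
import struct


def _read_varint(buf, pos):
    """Decode one protobuf varint starting at buf[pos]; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def _length_delimited_fields(buf, field_number):
    """Yield payloads of length-delimited fields numbered `field_number`."""
    pos = 0
    while pos < len(buf):
        tag, pos = _read_varint(buf, pos)
        wire_type = tag & 0x7
        if wire_type == 0:        # varint
            _, pos = _read_varint(buf, pos)
        elif wire_type == 1:      # 64-bit fixed
            pos += 8
        elif wire_type == 5:      # 32-bit fixed
            pos += 4
        elif wire_type == 2:      # length-delimited
            length, pos = _read_varint(buf, pos)
            if tag >> 3 == field_number:
                yield buf[pos:pos + length]
            pos += length
        else:
            raise ValueError("unsupported wire type %d" % wire_type)


def example_feature_keys(serialized_example):
    """Return the sorted feature names stored in a serialized Example."""
    keys = []
    # Example.features is field 1; Features.feature is a map in field 1;
    # each map entry stores its key in field 1.
    for features in _length_delimited_fields(serialized_example, 1):
        for entry in _length_delimited_fields(features, 1):
            for key in _length_delimited_fields(entry, 1):
                keys.append(key.decode("utf-8"))
    return sorted(keys)


def first_record(path):
    """Read the first record of an uncompressed TFRecord file (CRCs not checked)."""
    with open(path, "rb") as f:
        header = f.read(12)       # uint64 length + uint32 CRC of the length
        if len(header) < 12:
            raise ValueError("file is empty or truncated")
        (length,) = struct.unpack("<Q", header[:8])
        return f.read(length)
```

Calling example_feature_keys(first_record(path)) on one of the training files shows whether a key named 'image' is present, or whether the records use other names.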
Hello @vonclites,
Can you guide me through the code for creating the tfrecords file? I have created a tfrecord file which has image and label attributes in it, but I am not sure it was generated correctly.
image_to_tfexample function:
def image_to_tfexample(image_data, image_format, height, width, class_id):
    return tf.train.Example(
        features=tf.train.Features(
            feature={
                'image/encoded': bytes_feature(image_data),
                'image/format': bytes_feature(image_format),
                'image/class/label': int64_feature(class_id),
                'image/height': int64_feature(height),
                'image/width': int64_feature(width),
            }
        )
    )
I was able to train the Resnet model which is available in tensorflow/models/official/resnet using the tf_record files which I have generated before.
Can you please let me know. Thanks.
I was able to train the Squeezenet model.
I had to modify the script to read the image/encoded data instead of the image attribute in the train_squeezenet.py file.
But, it worked. Thanks.
Actually, you can modify the feature names in _preprocess_example and _parse_serialized_example in inputs.py, from line 153:
def _preprocess_example(self, serialized_example):
    parsed_example = self._parse_serialized_example(serialized_example)
    image = self._preprocess_image(parsed_example['image/encoded'])
    return {'image': image}, parsed_example['image/class/label']

def _preprocess_image(self, raw_image):
    image = tf.image.decode_jpeg(raw_image, channels=3)
    image = tf.image.resize_images(image, self.target_image_size)
    image = tf.image.convert_image_dtype(image, tf.float32)
    if self.distort_image:
        image = tf.image.random_flip_left_right(image)
    image = tf.transpose(image, [2, 0, 1])
    return image

@staticmethod
def _parse_serialized_example(serialized_example):
    features = {
        'image/encoded': tf.FixedLenFeature([], tf.string),
        'image/format': tf.FixedLenFeature([], tf.string),
        'image/class/label': tf.FixedLenFeature([], tf.int64),
        'image/height': tf.FixedLenFeature([], tf.int64),
        'image/width': tf.FixedLenFeature([], tf.int64),
    }
    return tf.parse_single_example(serialized=serialized_example,
                                   features=features)
Like this. With these changes you can use this script on TFRecord files generated by the official scripts.
Could you please share the script you used to train on custom images dataset.
I have the same problem. I get: Expected image (JPEG, PNG, or GIF), got unknown format starting with '\000\000\000\000\000\ Can you tell me how to change the inputs.py file?
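That message suggests the bytes stored under the image feature are not an encoded image file. Leading zero bytes would be consistent with raw pixel data or some other non-encoded buffer having been written, though that is only a guess from the message. One way to check the bytes before writing them to a record is to look at their leading signature, as in this minimal sketch (sniff_image_format is a hypothetical helper):

```python
def sniff_image_format(data):
    """Guess the encoding of `data` from its leading bytes, or return None."""
    if data[:3] == b"\xff\xd8\xff":
        return "jpeg"
    if data[:8] == b"\x89PNG\r\n\x1a\n":
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    return None


if __name__ == "__main__":
    # Bytes that start with zeros match none of the known signatures.
    print(sniff_image_format(b"\x00\x00\x00\x00\x00"))
```

If the helper returns None for the bytes being stored, the decode step in inputs.py cannot succeed whatever feature name is used. The data would need to be written as the original encoded file contents instead.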
Hello @vonclites,
When I try to run the train_squeezenet.py script, I get the following error for slim.deployment.
Traceback (most recent call last):
File "train_squeezenet.py", line 4, in
from slim.deployment import model_deploy
ImportError: No module named slim.deployment
I have an Anaconda environment with Python 2.7 and TensorFlow 1.6.0, installed following TensorFlow's conda environment steps.
I tried to Google the error but could not find a detailed solution.
Please, let me know about it.
Thanks.