Should the current ImageDataGenerator be extended or is a separate class like Keras-FCN's SegDataGenerator clearer?
Depends; if you were to implement it as a subclass, which methods would be reused and which would have to be overridden?
Should there be a guide of some sort? Is something as clear as 30 seconds to keras segmentation possible?
Sure
What is needed to handle large datasets quickly and efficiently? (should this be out of scope?)
Reading images from disk with `ImageDataGenerator` using multiprocessing and several processes is already pretty quick and efficient. The `HDF5Matrix` can be made more efficient via use of multiprocessing (or at least threading) to avoid IO becoming a bottleneck.
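For reference, a minimal sketch of loading training data through `HDF5Matrix` (the file name, dataset keys, and `model` are placeholders):

```python
from keras.utils.io_utils import HDF5Matrix

# Slices are read lazily from disk rather than loaded up front.
x_train = HDF5Matrix('datasets.h5', 'images', start=0, end=50000)
y_train = HDF5Matrix('datasets.h5', 'labels', start=0, end=50000)

# HDF5 data must be read in order, so shuffle only within batches.
model.fit(x_train, y_train, batch_size=32, shuffle='batch')
```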
Really interested in helping you! Maybe we should have a dedicated slack channel so we could all discuss.
I had a Mean IOU implemented somewhere, I'll try to find it! There are a lot of formats though for how to specify bounding boxes / segmentation maps. SSD uses prior shapes, Faster R-CNN uses anchor boxes, YOLO v1 uses nothing. It could get quite crowded.
SSD Keras has some data augmentation for boxes. We could probably use it.
@Dref360 the semantic_segmentation slack channel would work. Bounding box design input would be great because I'm not currently using them.
I would say that predicting a bounding box is a significantly different task from segmentation, in particular you may need a complicated loss function to handle many boxes. I'm also not sure if best practices are well established enough for this.
For upscaling operations, popular choices include `tf.image.resize` (see also the discussion on distill.pub); the current UpSampling layer I think does nearest neighbor with integer upsampling factors. There is also `tf.depth_to_space`, and some Keras+Theano implementations exist. Also, pix2pix is a popular variant using adversarial training that would be nice to have as an example. There are several Keras implementations out there.
For FCNs I've found base Keras to be pretty useable but one sticking point is that it's not easy to replace a fixed size model or Input layer to one that has None size for all the spatial dimensions, which is all you really need to have a FCN that allows multiple scale inputs. I think the best way to do this now is to create a new instance of the same model except for the Input layer and use get_weights + set_weights. It would be nice if there was a convenient way to just resize the model's input spatial dimensions and have it propagate to all layers, raising an error if it's not possible e.g., if there's a Dense layer.
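A minimal sketch of that workaround (here `build_fcn_body` is a placeholder for whatever fully convolutional stack the model actually uses):

```python
from keras.models import Model
from keras.layers import Input

def make_fcn(input_shape):
    # The body must be fully convolutional (no Flatten/Dense over spatial dims)
    # for None spatial dimensions to work.
    inputs = Input(shape=input_shape)
    outputs = build_fcn_body(inputs)  # placeholder for the actual layer stack
    return Model(inputs, outputs)

# Train at a fixed size, then rebuild with unspecified spatial dims
# and copy the weights across.
fixed_model = make_fcn((256, 256, 3))
# fixed_model.fit(...)
flexible_model = make_fcn((None, None, 3))
flexible_model.set_weights(fixed_model.get_weights())
```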
I'd be interested in contributing as well! However, keep in mind that there are a few subtasks within the segmentation problem and that makes the task harder.
For example, all that semantic segmentation networks, such as FCN, SegNet, ENet, ICNet, etc., do is pixel classification. They cannot detect objects and therefore can't differentiate between distinct instances of the same class in an image.
Other works, such as DeepMask/SharpMask/FastMask, output mask proposals for each object they detect but they do not do classification. This means that in theory they can detect objects that belong in classes they have not seen before.
Finally, Instance Segmentation does both (e.g. Instance-FCN, FCIS, Mask R-CNN). It can tell where a person ends and another begins and also outputs a class label for each instance it detects.
Detection is an inherent part of the pipeline for two of the subtasks, so if we plan to cover all three cases, I don't think we can get away with not discussing it.
@PavlosMelissinos good points, training on varied tasks like instance recognition and mask proposals should also be considered, what are the best practices for that type of data? How are they typically formatted? Masks are also sometimes useful for segmentation, such as the pascal voc "ambiguous regions".
@Dref360 I thought about the bounding box issue some more and I agree with @allanzelener that the tools will be significantly different for bounding boxes. Unless there is a compelling reason I've missed to keep it here, I think bounding box algorithms should be considered out of scope for this issue and should be handled as a separate github issue.
For segmentation training it will be important to support loading data from a directory, and to support the most common dataset formats, which to my knowledge are the Pascal VOC format and the COCO json format. This post goes into loading from a directory in a reasonable way, including support for Pascal VOC.
Here is how SegDataGenerator works in Keras-FCN:
seg_aug_generator = SegDataGenerator(
featurewise_center=False,
samplewise_center=False,
featurewise_std_normalization=False,
samplewise_std_normalization=False,
channelwise_center=False,
rotation_range=0.,
width_shift_range=0.,
height_shift_range=0.,
shear_range=0.,
zoom_range=0.,
zoom_maintain_shape=True,
channel_shift_range=0.,
fill_mode='constant',
cval=0.,
label_cval=255,
crop_mode='none',
crop_size=(0, 0),
pad_size=None,
horizontal_flip=False,
vertical_flip=False,
rescale=None,
data_format='default')
generator = seg_aug_generator.flow_from_directory(
file_path, data_dir, data_suffix,
label_dir, label_suffix, classes,
ignore_label=255,
target_size=None, color_mode='rgb',
class_mode='sparse',
batch_size=32, shuffle=True, seed=None,
save_to_dir=None, save_prefix='', save_format='jpeg',
loss_shape=None)
model.fit_generator(generator=generator, ...)
# Some internal details for the directory iterator:
'''
Users need to ensure that all files exist.
Label images should be png images where pixel values represent the class number.
find images -name *.jpg > images.txt
find labels -name *.png > labels.txt
for a file name 2011_002920.jpg, each row should contain 2011_002920
file_path: location of train.txt, or val.txt in PASCAL VOC2012 format,
listing image file path components without extension
data_dir: location of image files referred to by file in file_path
label_dir: location of label files
data_suffix: image file extension, such as `.jpg` or `.png`
label_suffix: label file suffix, such as `.png`, or `.npy`
loss_shape: shape to use when applying loss function to the label data
'''
I think much of this functionality can be added directly to `ImageDataGenerator`.
`file_path` becomes `file_list`
`SegDataGenerator.flow_from_directory`'s `file_path` parameter should be replaced with a `file_list` parameter, and a separate function should be added to `ImageDataGenerator` that can load a Pascal VOC formatted `.txt` file listing filenames without extensions and return a list of them.
`class_mode` options
`class_mode` should add new options to specify that it is a dense prediction task. What should these be named: `pixel_categorical`, `pixel_binary`, `pixel_multilabel`, etc., or perhaps `dense` instead of `pixel`? Perhaps it should take a tuple or something else indicating the data dimensionality?
Does anyone have design suggestions for dealing with the dimension issue detailed by @allanzelener? The SegDataGenerator design is to simply pad images with mask pixels to the maximum expected image size. This seems to work okay, but can probably have significant computational cost.
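For illustration, padding to a fixed size with an ignore value for the label, as SegDataGenerator does with `cval`/`label_cval`; a rough sketch assuming the image is no larger than the target size:

```python
import numpy as np

def pad_to_size(image, label, target_size, label_cval=255):
    # Pad the image with zeros and the label with the ignore value so the
    # padded pixels can be masked out of the loss.
    target_h, target_w = target_size
    h, w = image.shape[:2]
    pad_h, pad_w = target_h - h, target_w - w
    image = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode='constant')
    label = np.pad(label, ((0, pad_h), (0, pad_w)), mode='constant',
                   constant_values=label_cval)
    return image, label
```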
I think most of the new augmentation options in SegDataGenerator also look good and can simply be added directly.
Supplementary Data is also likely necessary (definitely in my case). I think it may be wise to allow a second list of input files to be supplied in a different format, which could be simple vectors or images stored in a .mat or .npy, or some other format. However, perhaps this should be a separate class? If so, how would consistency of indexes be ensured? Can two different generators be chained together in a manner analogous to `zip()` for lists?
`loss_shape`
`loss_shape` is a workaround, because the output dimensions will vary based on the model, and we will want the loss function to operate on the output data as it is. Can it be avoided?
In the current ImageDataGenerator:
Any PNG, JPG or BMP images inside each of the subdirectories directory tree will be included in the generator.
I think the addition of `data_dir`, `data_suffix`, `label_dir`, and `label_suffix` is a good decision that does not need to conflict with this; they can simply default to `None`, which retains the current behavior.
The API could easily support arbitrary file formats with a function or object that opens the files in the directory and returns an appropriate numpy array. Should this exist? Which parameter should accept these? Perhaps instead of `*_suffix` the parameter could be `*_format`, which can take these classes and/or functions? Design suggestions welcome.
@ahundt as a longtime Keras user that is now figuring out my way through multiclass semantic segmentation with sample weighting plus data augmentation via the `ImageDataGenerator`, I fully support your initiative. I believe you've covered the majority of needs above, and can't think of anything smart to add.
What I can tell you is that the `ImageDataGenerator` API for my specific need (above) is a bit opaque. I have issues with the objects (for images and masks) built using `.fit` and `.flow` and then passed to `.fit_generator`, where an error is raised because it expects a tuple (when in fact a tuple is being presented). The problem appears to involve y but not X; I'm not sure that's the case, but I'm hacking/resolving it with reshapes. A subsequent roadblock is a mismatch between image and mask batches, although both are set at 32. My impression is these hardships are more due to the design not being for my use case, but given that it is one that seems quite popular, it would be beneficial to expose/redesign the API appropriately.
@mptorr if you're using ImageDataGenerator, I believe the mismatches are due to each object generating random numbers separately, and the workaround is to provide the same random seed to each so they access indices in the same order. SegDataGenerator resolves this by accepting image and label dirs in a single object.
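A rough sketch of that workaround with stock Keras (directory paths and augmentation parameters are placeholders, and `model` is assumed to be defined):

```python
from keras.preprocessing.image import ImageDataGenerator

data_gen_args = dict(rotation_range=10., width_shift_range=0.1,
                     height_shift_range=0.1, horizontal_flip=True)
image_datagen = ImageDataGenerator(**data_gen_args)
mask_datagen = ImageDataGenerator(**data_gen_args)

# The same seed makes both generators draw files and transforms in the same order.
seed = 1
image_generator = image_datagen.flow_from_directory(
    'data/images', class_mode=None, seed=seed)
mask_generator = mask_datagen.flow_from_directory(
    'data/masks', class_mode=None, seed=seed)

# Yield (image_batch, mask_batch) pairs for fit_generator.
train_generator = zip(image_generator, mask_generator)
model.fit_generator(train_generator, steps_per_epoch=2000, epochs=50)
```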
@ahundt thanks for the suggestion—in fact I am using a fixed and identical seed for both generators, but still get the error. Anyway, I don't want to hijack this thread with my travails... at some point I hope to figure this out.
I was going to try your SegDataGenerator however wanted to ask 2 things about it, as they may pertain to your request for features/suggestions:
[1] it appears it currently does not support pixelwise weighting to compensate for class imbalance. This would be an important feature to me, as most of my segmentation tasks will have disproportionately over/under-represented classes. Currently I balance classes using Keras' `sample_weight` as `temporal`, however without data augmentation due to my issues above. Since the sample weighting matches each pixel in the image/mask pair, it would need to be appropriately transformed to match augmented images/masks. Let me know if I'm overlooking this feature in SegDataGenerator.
[2] I'm a bit confused on how SegDataGenerator loads images. The comment in the class perhaps could be reworded (or have examples) for the most important arguments. I also didn't understand how to use this info: "for a file name 2011_002920.jpg, each row should contain 2011_002920". Of course, this lack of understanding may reflect my own limitations, but just thought it could help development.
I'll be glad to give it a spin, especially if there's an option for sample weighting. Glad to continue this conversation elsewhere if more appropriate than on this thread.
I'm sorry for taking this long to comment but I just found the time to do so and I think there's too much stuff to discuss here. Should we split the issue into multiple threads maybe?
I recognize the following parts of the pipeline as separate entities regarding standardization and support for different implementations:
- Dataset format
- Data preprocessing and augmentation
- Architecture IO: what kind of input/output should the actual computational graph expect? In other words, what should be the output of preprocessing/augmentation, and what should be the desired output of the network and therefore the input of evaluation?
- Evaluation: MS-COCO uses a large set of metrics (stricter variants of IoU) and in that way it's elaborate. I also think it's one of the only ones that reports AR@IoU 0.5-0.95, which is supposed to correlate well with real world performance. For that reason, I have created a script, based on the one found in the FastMask repo, that runs inference on multiple images and converts the results to the MS-COCO evaluation format, a json file that looks like the one used to store ground truth annotations.

Finally, should there also be support for auxiliary loss evaluation, such as the one used in many frameworks after detection? I mostly agree with you that this last one seems out of scope for this issue but maybe it deserves some discussion.
Imho, this is the stinkiest part of the pipeline and usually goes like this in most projects:
I believe semantic segmentation and anything that deals with bounding box should be considered separate tasks and be built upon gradually. Semantic segmentation is relatively simple, therefore maybe let's consider that first but acknowledge that it only covers a part of the wider task. Object detection networks are not yet standardized on keras, so we probably should take it one step at a time.
I'm taking the initiative to start with one term that is ambiguous, resizing. I'm not sure what the proper terminology is for some of this stuff, so please bear with me:
Resizing can either be achieved through stretching (with pixel interpolation), padding (e.g. with zeros) or cropping.
Padding gives the worst results as it messes up somewhat with the statistics of the image and wastes network capacity at the same time.
Cropping, on the other hand, may remove too much context from the image, which is also undesirable. Furthermore, on prediction, using crops means that only part of the image area will be seen by the network in each pass. Therefore multiple passes over the image are required in order to cover the whole area.
In general it seems obvious to me that some kind of stretching is necessary. However, it is problematic when used alone in the case of multi-label, one hot targets (most popular option in segmentation datasets, e.g. MS-COCO). An easy solution would be to convert each one hot vector to a class index vector, then to PIL.Image (or equivalent), do the resize there and then convert back to one hot and feed that into the network. This however forces the selection of a single label for each pixel. Is this an important issue or should we safely assume that it's due to labeling error (annotations are not exact)? Converting it back and forth is also slightly slow. scipy.ndimage.zoom can resize a numpy array natively but interpolation is done on all dimensions of the array, as far as I remember.
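A minimal sketch of that round trip (the function name and the single-label-per-pixel assumption are mine):

```python
import numpy as np
from PIL import Image

def resize_one_hot_mask(one_hot, size):
    # Collapse to class indices, resize with nearest-neighbor so no new labels
    # appear, then expand back to one-hot; assumes one label per pixel.
    num_classes = one_hot.shape[-1]
    class_idx = np.argmax(one_hot, axis=-1).astype(np.uint8)           # (H, W)
    resized = np.array(Image.fromarray(class_idx).resize(size, Image.NEAREST))
    return np.eye(num_classes, dtype=one_hot.dtype)[resized]           # (h, w, num_classes)
```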
This is also an important feature since CNNs are not completely scale invariant. YOLOv2 for instance, in order to be able to learn to detect objects at various scales changes the shape of the input every few batches. In keras, this is not exactly easy. I think tensorflow only allows one dimension of the input to be unspecified (None), so this might not be keras' fault. I have no idea whether it works with theano as a backend.
As far as data loading goes, I suggest that some variant of the MSCOCO class I have created for the enet-keras repository be used. It definitely needs quite some cleaning up and unit tests of course as it's a bit clumsy right now but I believe the set of operations is valid. Any kind of feedback is welcome obviously.
The logic of the class could be standardized (I have added a dummy Dataset class which I will populate as soon as I'm a little bit more confident about the layout) and easily extended in order to allow custom datasets and/or loading from disk.
For FCNs I've found base Keras to be pretty useable but one sticking point is that it's not easy to replace a fixed size model or Input layer to one that has None size for all the spatial dimensions, which is all you really need to have a FCN that allows multiple scale inputs. I think the best way to do this now is to create a new instance of the same model except for the Input layer and use get_weights + set_weights. It would be nice if there was a convenient way to just resize the model's input spatial dimensions and have it propagate to all layers, raising an error if it's not possible e.g., if there's a Dense layer.
Does anyone have design suggestions for dealing with the dimension issue detailed by @allanzelener? The SegDataGenerator design is to simply pad images with mask pixels to the maximum expected image size. This seems to work okay, but can probably have significant computational cost.
@allanzelener @ahundt I think there's a messy workaround for that using Permute and a TimeDistributed wrapper but it's not exactly a solution.
Supplementary Data is also likely necessary (definitely in my case), I think it may be wise to allow a second list of input files to be supplied in a different format, which can be simple vectors or images stored in a .mat or .npy, or some other format. However, perhaps this should be a separate class? If so, how would consistency of indexes be ensured? Can two different generators be chained together in a manner analogous to zip() for lists?
@ahundt Can you explain what you mean here by "supplementary data" and what the use case is? I don't quite get it. For the zipping part maybe you're looking for this? EDIT: It's not a big deal though, why not just write a function that calls next for both generators and yields the pairs in a tuple?
Options for input data to a `SegDataGenerator` style `ImageDataGenerator` could either:
I'm leaning towards option 1 because it would maximize compatibility with the existing `ImageDataGenerator`, would be easy to understand, and would work for many use cases. More complex use cases could reasonably write their own augmentation class and call the basic functions (translate, zoom, etc.) with reasonable ease.
@PavlosMelissinos thanks for the feedback, replies below.
@ahundt Can you explain what you mean here by "supplementary data" and what the use case is? I don't quite get it. For the zipping part maybe you're looking for this?
My use case is a vector that represents how a robot arm in the scene will move and an image of that robot. So the input data is an image and a vector, while the label is a 2D image containing scores for how successful the motions will be if they are performed relative to each x,y coordinate in the image.
Another example would be input text and an image. Ex: "the person on the right" and an image of two people side by side. The labeled data would be the same dimensions as the original image, with the right person's pixels labeled as 1 and all other pixels labeled as 0.
EDIT: It's not a big deal though, why not just write a function that calls next for both generators and yields the pairs in a tuple?
Sounds like a reasonable possibility. How would performing or not performing zoom/translation be specified for each input?
Padding gives the worst results as it messes up somewhat with the statistics of the image and wastes network capacity at the same time.
Padding definitely requires extra memory and processing power, but are the results really that bad? I think it might depend on the network design. Resnet specifies zero padding and is particularly effective, for example.
multi-label, one hot targets, [...] class index
We should support each of these modes because each makes sense for a variety of reasonable applications.
resize crops
How about the `SegDataGenerator` API definition above? It lets the user specify the range of crop, translation, and resizing they would prefer.
Just a thought.
If you want to handle every case on earth (multi-label, one hot targets, [...] class index), maybe keras is not the best place to do it? The same way that Tensorflow has tensorflow-transform, Keras could have a keras-transform that would be a dependency of Keras. Keras is a deep-learning library, not a preprocessing one.
Anyway, the Pytorch way to do data augmentation sounds pretty cool with transform.compose.
@Dref360 Duh, yes you're right, I got carried away a bit, sorry about that. :) Preprocessing is technically out of scope for keras. On the other hand, segmentation is a popular task and standardizing preprocessing (like ImageDataGenerator does for classification) by adding support for some basic operations would be useful. The basic problem is that images in MS-COCO do not have a constant size, like ImageNet. The purpose is to decide on a design for a class like SegDataGenerator that might fix some of the shortcomings of ImageDataGenerator.
@ahundt re: supplementary data - Ah, I see. I don't think it's possible to formulate that in a way that is relevant to keras. A custom implementation, depending on the case seems like a cleaner approach. Covering every single combination of inputs does not seem feasible.
Padding definitely requires extra memory and processing power, but are the results really that bad? I think it might depend on the network design. Resnet specifies zero padding and is particularly effective, for example.
I was referring to padding in the context of preprocessing (where it takes up a sizable portion of the input image), it's my mistake for not making that clear. Zero padding within a CNN is not that bad (still skews statistics but it's not so big a deal and we don't really have a viable alternative).
How about the SegDataGenerator API definition above? It lets the user specify the range of crop, translation, and resizing they would prefer.
Say the user has an image/label pair that is originally 486px in height and 220 in width; the shape of the input tensor is (None, 256, 256, 3) for the image and (None, 65536, 81) for the label. How does SegDataGenerator deal with the conversion? Labels (one-hot) are tricky to resize in this case because numpy arrays do not properly support the operation (scipy has ndimage.zoom though, might be worth a shot); label 'bleeding' among non-spatial dimensions should not be allowed, and NEAREST interpolation mode returns very weird and pixelated ground truth masks.
I think I'm in favor of using some presets (e.g. instance segmentation needs each sample to be a pair; a crop within a ground truth bounding box and the binary mask of that object) and leaving the rest up to the user.
If you want to handle every case on earth (multi-label, one hot targets, [...] class index), maybe keras is not the best place to do it?
I'd hardly suggest every case on earth, haha. It is very reasonable to let a user select from both the sets {single label, multi label} and {single class, multi class} for dense prediction tasks as they require. That means the following four options:
multi label (multi label, multi class)
Keras already supports those cases listed above for simple label prediction.
Anyway, the Pytorch way to do data augmentation sounds pretty cool with transform.compose
That led me to an interesting idea: rather than the sequential-model style of pytorch's transform.compose, perhaps this could work like the functional API? That could potentially make arbitrary application of augmentation much simpler! It could eventually also make it possible to use the TF backend image augmentation APIs, but I'll keep backends out of scope for now.
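Purely as a thought experiment, such a pipeline might look something like this; none of these classes exist in Keras, and all of the names here are made up for illustration:

```python
# Hypothetical functional-style augmentation graph: geometric ops are shared
# between image and label, photometric ops touch only the image.
image_in = AugmentationInput(name='image')
label_in = AugmentationInput(name='label')
x, y = RandomRotation(20., interpolation=('bilinear', 'nearest'))([image_in, label_in])
x, y = RandomCrop((320, 320))([x, y])
x = RandomChannelShift(0.1)(x)
augmenter = AugmentationPipeline(inputs=[image_in, label_in], outputs=[x, y])
```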
That said, selling a major API change is much more difficult than a minor extension of `ImageDataGenerator`, so I'll stick with the minor extension option here, and I created https://github.com/fchollet/keras/issues/6655 where preprocessing layers can be discussed.
@allanzelener @ahundt I think there's a messy workaround for that using Permute and a TimeDistributed wrapper but it's not exactly a solution.
@PavlosMelissinos Could you elaborate on this?
Say the user has an image/label pair that is originally 486px in height and 220 in width; the shape of the input tensor is (None, 256, 256, 3) for the image and (None, 65536, 81) for the label. How does SegDataGenerator deal with the conversion?
This is one of the key changes I'm hoping we can make, where 2D labels are directly supported, in other words the label would be the same dimensions as the input data.
SegDataGenerator image/label transform code:
x = apply_transform(x, transform_matrix, img_channel_index,
fill_mode=self.fill_mode, cval=self.cval)
y = apply_transform(y, transform_matrix, img_channel_index,
fill_mode='constant', cval=self.label_cval)
Remember that labels cannot and should not be interpolated! Average of labels 1 and 3 is not the label 2. :-) You have to pick from 1 or 3 so while it isn't as smooth you've got to use an algorithm like nearest.
Could you elaborate on this?
From my experience, the problem in the arbitrary input shape scenario in a Fully Convolutional Network (no Dense layers) is at the end of the network, when you need to Flatten the output and compare it to the targets. I'm not confident that hack would work (it was actually suggested by a colleague as a temporary workaround), so I'll reproduce it tomorrow at work and get back to you.
This is one of the key changes I'm hoping we can make, where 2D labels are directly supported, in other words the label would be the same dimensions as the input data.
That's not a problem, after all reshaping is trivial.
Remember that labels cannot and should not be interpolated! Average of labels 1 and 3 is not the label 2. :-) You have to pick from 1 or 3 so while it isn't as smooth you've got to use an algorithm like nearest.
That's the actual problem (it's noticeably less smooth with nearest neighbor). Maybe there is a better solution?
EDIT: In semantic segmentation there is a direct association between a rgb pixel and the ground truth label pixel at the same position. If the annotation is done in a specific size and then that image is resized, there is information distortion because the pixels are moved and some unseen values might appear (especially in the case of bilinear, bicubic or lanczos antialiasing). I guess what I'm saying is that the pixel values of the resized target labels should be dependent on the values of the pixels in the rgb image and more specifically on the way the value of each pixel in the resized rgb image was produced from the original. Does that make sense?
Remember that labels cannot and should not be interpolated! Average of labels 1 and 3 is not the label 2. :-) You have to pick from 1 or 3 so while it isn't as smooth you've got to use an algorithm like nearest.
I can think of two sensible ways to handle this.
The first approach is O(unique labels in image) and the second approach is O(connected components).
@allanzelener Both nice ideas, especially the second one!
@allanzelener That's what I do, I rescale the polygons and then use OpenCV to draw the rescaled polygons. Works great and fast.
Here is my idea for a `generate_samples_from_disk` API to replace ImageDataGenerator.flow_from_directory that should still be clear but now work for a wider cross-section of applications:
def generate_samples_from_disk(sample_sets, callbacks=load_image, batch_size=1, data_dirs=None):
"""Generate numpy arrays from files on disk in groups, such as single images or pairs of images.
# Arguments
sample_sets: A list of lists, each containing the data's filenames such as [['img1.jpg', 'img2.jpg'], ['label1.png', 'label2.png']].
Also supports a list of txt files, each containing the list of filenames in each set such as ['images.txt', 'labels.txt'].
If None, all images in the folders specified in data_dirs are loaded in lexicographic order.
callbacks: One callback that loads data from the specified file path into a numpy array, `load_image` by default.
Either a single callback should be specified or a callback must be provided for each sample set, and must be the same length as sample_sets.
data_dirs: Directory or list of directories to load.
Default None means each entry in sample_sets contains the full path to each file.
Specifying a directory means filenames sample_sets can be found in that directory.
Specifying a list of directories means each sample set is in that separate directory, and must be the same length as sample_sets.
batch_size: number of samples in a batch
# Returns
Yields batch_size data points in each list provided.
"""
To do that I believe the python unpack mechanism would be the thing to use, but otherwise the implementation shouldn't be too complicated. It should also be set up so it can work with PASCAL VOC easily and cleanly.
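A rough implementation sketch under those assumptions (no shuffling, all sample sets the same length, and the "load everything in a directory" default omitted):

```python
import os
import numpy as np

def generate_samples_from_disk(sample_sets, callbacks, batch_size=1, data_dirs=None):
    # Broadcast a single callback / directory to every sample set.
    if callable(callbacks):
        callbacks = [callbacks] * len(sample_sets)
    if data_dirs is None or isinstance(data_dirs, str):
        data_dirs = [data_dirs] * len(sample_sets)

    # Expand .txt imageset files into lists of filename stems.
    file_lists = []
    for entry in sample_sets:
        if isinstance(entry, str) and entry.endswith('.txt'):
            with open(entry) as f:
                file_lists.append([line.strip() for line in f if line.strip()])
        else:
            file_lists.append(list(entry))

    while True:  # loop forever so fit_generator can run multiple epochs
        for start in range(0, len(file_lists[0]), batch_size):
            batch = []
            for names, load, directory in zip(file_lists, callbacks, data_dirs):
                paths = names[start:start + batch_size]
                if directory is not None:
                    paths = [os.path.join(directory, name) for name in paths]
                # np.stack assumes each loader returns equally-sized arrays.
                batch.append(np.stack([load(p) for p in paths]))
            yield tuple(batch)
```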
Example usage with layout as downloaded by https://github.com/fchollet/keras/pull/6665:
# pascal voc + berkeley semantic contours annotations
train_file_path = os.path.expanduser('~/.keras/datasets/VOC2012/combined_imageset_train.txt') #Data/VOClarge/VOC2012/ImageSets/Segmentation
val_file_path = os.path.expanduser('~/.keras/datasets/VOC2012/combined_imageset_val.txt')
data_dir = os.path.expanduser('~/.keras/datasets/VOC2012/VOCdevkit/VOC2012/JPEGImages')
label_dir = os.path.expanduser('~/.keras/datasets/VOC2012/combined_annotations')
def open_png(path):
path = path + '.png'
# ... open and return 1 channel uint8 numpy array ...
def open_jpg(path):
path = path + '.jpg'
# ... open and return 3 channel uint8 numpy array ...
seg_gen = generate_samples_from_disk([train_file_path, train_file_path],
callbacks=[open_jpg, open_png],
data_dirs=[data_dir, label_dir])
# now apply augmentation then fit
Any thoughts or details that are missing, perhaps how it would work with multiple input and label files per sample?
https://github.com/fchollet/keras/issues/6538#issuecomment-302180674 @allanzelener sounds like a nice approach, could you suggest an API design or have any reference code?
@allanzelener That's what I do, I rescale the polygons and then use OpenCV to draw the rescaled polygons. Works great and fast.
@Dref360 Do you have a link or is that private? I'm guessing OpenCV won't be permitted as a new dependency, there is a lot of baggage and dramatic version differences across OSes, and I haven't seen an API that's clean the way Keras is.
Okay, it looks like dealing with `sample_weight` and `class_weight` shouldn't be too difficult to update for segmentation. The various training.py `_standardize*()` functions will need to be updated so they accept 2d (or more) data, which means replacing functions like `len()` with ones that go over each entry in the size/shape instead.
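For reference, the current workaround for per-pixel weighting uses `sample_weight_mode='temporal'` with labels flattened to (batch, height * width, num_classes); `model`, `images`, `labels_flat`, and `pixel_weights` are placeholders here:

```python
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              sample_weight_mode='temporal')

# pixel_weights has shape (num_samples, height * width): one weight per pixel,
# which is how class imbalance or ignore regions can be handled today.
model.fit(images, labels_flat, sample_weight=pixel_weights, batch_size=8)
```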
However, some indicator, member variable, or parameter may need to be carried so the difference between one_hot data and dense segmentation labels can be accounted for. Additional investigation needed on that front.
What about adding a parameter to all the relevant layers and other APIs which is either:
- `class_dimensions` or `channel_dimensions`, which defaults to 1, where 2 would indicate 2D and so on
- `class_shape` or `channel_shape`, which explicitly specifies the shape of the data that should be operated on by operations like loss functions

This could disambiguate the purpose of each data segment, and it could work in a manner analogous to `channels_first` and `channels_last`, but on a per-layer basis that could be inherited from previous layers by default, and specify which of the dimensions are class/channel dimensions. Thoughts?
Here is iteration 3.0 of this idea. I think this generalizes better to other non-segmentation problems. This is in addition to the extended segmentation data generator/augmentation, not instead of it. Comments are welcome!
What do you think of a `data_spec` list parameter for `Layer`, which is essentially an improved (and local) `image_data_format`, to resolve data ambiguity? For example, 2D classes with dense prediction vs depth in a 3D CNN with single class prediction.
`data_spec` supported entries: `['height', 'width', 'channel', 'length', 'time', 'class', 'depth', 'category']`
Example `data_spec` values:
dense_prediction_input = ['batch', 'height', 'width', 'channel']
dense_prediction_output = ['batch', 'height', 'width', 'class']
imagenet_prediction_channel = ['batch', 'height', 'width', 'channel']
imagenet_prediction_output = ['batch', 'class']
label_3d_input = ['batch', 'height', 'width', 'depth', 'channel']
label_3d_output = ['batch', 'height', 'width', 'depth', 'channel']
# Usage:
dense_prediction_input = ['batch', 'height', 'width', 'channel']
# stored internally as a configuration setting of the input
x_input = Input(data_spec=dense_prediction_input)
# rest of cnn here...
x_output = Dense(data_spec=dense_prediction_output)(x)
categorical_output = K.to_categorical(x_output)
# Automatically known:
# categorical_dense_prediction_output = ['batch', 'height', 'width', 'category', 'class']
K.categorical_crossentropy(categorical_output)
If a Layer has this spec, an implementation like `categorical_crossentropy` can automatically reshape the data, run the algorithm correctly, then reshape it back to the original shape.
There might need to be `input_data_spec` and `output_data_spec` to handle the changes in dimension a layer might cause.
TBD: 'class' might instead be 'label' or 'target', and 'string' outputs could work too.
- `'device'` trivially extends this to GPUs
- `'host'` might extend this to distributed training
`input_shape` exists and could be extended to accept `[(10, 'batch'), (None, 'height'), (None, 'width'), (10, 'classes')]`.
Backwards compatibility should be very achievable with each of these options! Just default to the current behavior if no `data_spec` is supplied.
Hi,
I'm experimenting with Keras implementations of Yolo and SSD (https://github.com/lhk/object_detection). So far my code is very much just a toy project. But there is one feature that doesn't seem to be used so far:
For augmentation, the papers on object detection use variations of crops and color changes. I haven't seen the usual range of rotations/zooms/shifts so far. This is probably because you have to update the bounding box annotations to keep them in sync with the image.
I've implemented a basic prototype for automatic augmentation of images with bounding boxes: https://github.com/lhk/bbox_augmentations/blob/master/showcase.ipynb
This integrates nicely with Keras, I've actually used parts of your image preprocessing pipeline. Since this seems to be the issue for general discussion about API design in the direction of object detection / segmentation, I would like to propose this feature:
Reimplementing the current `flow_from` functionality to work on images annotated with bounding boxes.
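As a toy example of keeping annotations in sync with a transform, a horizontal flip might be handled like this (hypothetical helper; boxes are assumed to be pixel-coordinate (x_min, y_min, x_max, y_max) rows):

```python
import numpy as np

def flip_horizontal(image, boxes):
    # Mirror the image along its width and remap each box's x coordinates.
    width = image.shape[1]
    flipped_boxes = boxes.copy()
    flipped_boxes[:, 0] = width - boxes[:, 2]
    flipped_boxes[:, 2] = width - boxes[:, 0]
    return image[:, ::-1], flipped_boxes
```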
I would very much like to work on this. Could you point me in the right direction to get started? How can I productively contribute to this?
For example, I could try to recreate the current infrastructure of generators for the new annotated data type. Would that be useful?
It would be an awesome addition. Algorithm-wise, zooms and shifts are straightforward, but the way you do rotations is wrong in principle. For example, if you rotate a circle around its center, the bounding box doesn't change, while with your approach it does change. Although for small rotations, and if bounding boxes weren't all that tight to begin with, it wouldn't matter. In this case, bounding boxes should be jittered anyway, and the sampling can take the original+rotated into account?
@lhk I'd suggest starting with the SegDataGenerator class in Keras-FCN, and create a pull request for the official keras-contrib repository that trains on pascal voc, a dataset already in keras-contrib. If you want to go that route you should also be aware of this PR which has some first steps (but also bugs in the example at the time of writing): https://github.com/farizrahman4u/keras-contrib/pull/152
To add some other resources:
Thanks! I've been slowly integrating some functionality into github.com/keras-team/keras-contrib as well, there are several open pull requests.
@ahundt, I am interested to help you in reinforcement learning with OpenAI gym. Please let me know, how should I proceed.
@luffy1996 this issue is about image segmentation rather than RL so I'll message separately.
@ahundt Check also https://github.com/bonlime/keras-deeplab-v3-plus
I'll close this issue for now since this thread didn't have any updates for quite a while. Please open another one if necessary.
Dense Prediction API Design, Including Segmentation and Fully Convolutional Networks
This issue is to develop an API design for dense prediction tasks such as Segmentation, which includes Fully Convolutional Networks (FCN), and was based on the discussion at https://github.com/fchollet/keras/pull/5228#issuecomment-299611150. The goal is to ensure Keras incorporates best practices by default for this sort of problem. Community input, volunteers, and implementations will be very welcome. #6655 is where preprocessing layers can be discussed.
Motivating Tasks and Datasets
Reference Materials
Feature Requests
These are ideas rather than a finalized proposal so input is welcome!
Existing Keras Utilities with compatible license
Questions