tensorflow / swift-models

Models and examples built with Swift for TensorFlow
Apache License 2.0

Why is ResNet fixed to only two datasets? #253

Closed. rickwierenga closed this issue 4 years ago.

rickwierenga commented 4 years ago

It seems that you can only train a ResNet on these two datasets:

public enum DataKind {
    case cifar
    case imagenet
}

Without any useful customization options, this seems unusable.

I think DataKind should have associated values that can be passed to the initializer to make it more flexible.
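For illustration, a minimal sketch of what that could look like (the case names and associated values here are hypothetical, not an existing API):

public enum DataKind {
    // Each case carries the values the initializer currently hardcodes.
    case cifar(classCount: Int)
    case imagenet(classCount: Int)
}

let kind = DataKind.cifar(classCount: 10)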

saeta commented 4 years ago

I think it's not that you can only train on those two datasets, but rather that there are two variations of ResNet: one optimized for CIFAR-sized images, and one optimized for larger ImageNet images. Does that make sense? (I agree that this is a little confusing, and a comment providing context would be a very welcome PR!)

brettkoonce commented 4 years ago

This is some old code, from when there were only a few models in the repo, that needs to be rethought. Loosely, there are two problems it was trying to solve (see #156):

1) setting up the input filters for the network (non-obvious)
2) setting the number of output nodes (trivial)

(2) is handled better, IMO, by having an explicit classCount parameter, which is how the latest models work, so we should consider moving to that approach; see also #251. For (1), once we get to more complicated CV models, an enum parameter name of some form is useful because many networks have multiple combined settings (e.g. https://github.com/tensorflow/tpu/blob/d06f0af0e48bb5e815b4db48c2d1e7168066064a/models/official/efficientnet/efficientnet_builder.py#L37); see the sketch below.
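As a sketch of that pattern (the enum name is made up for illustration; the per-stage block counts are the standard published ResNet configurations):

// One name bundles several coupled settings, such as per-stage block counts.
public enum ResNetVariant {
    case resNet18, resNet34, resNet50

    var layerBlockCounts: (Int, Int, Int, Int) {
        switch self {
        case .resNet18: return (2, 2, 2, 2)
        case .resNet34: return (3, 4, 6, 3)
        case .resNet50: return (3, 4, 6, 3)
        }
    }
}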

Another route we could take is to explicitly split the ImageNet and CIFAR models into separate subtypes of image classification, remove the CIFAR-specific logic from the current ResNet variants, and then provide the resnet20/(+N*12) variants again for CIFAR.

BradLarson commented 4 years ago

I think Brett and Brennan provide the right context here. The ResNet model in the repository was originally designed to test that the framework could train a model of that complexity end-to-end. The published ResNet structure was designed against the ImageNet dataset, but we lacked the pipeline to pull in and train against ImageNet, so a modified version was used that could be run against CIFAR10 (there is a Python TensorFlow implementation of this that served as a reference).

The big difference isn't the dataset, it's the input image size. For CIFAR10 that's 32x32, whereas the published model trained against ImageNet used a 224x224 input image size. Appropriate downsampling has to occur in the first few layers to get the tensor shapes right the rest of the way through the network, thus the enum. You can see that in the code, where ImageNet-sized images are immediately downsampled by a stride of 2, then another stride of 2, and the final output of the network is a set of 7x7 patches that are averaged. For the smaller CIFAR10 images, the image isn't downsampled in the first stages, and the result is 4x4 patches to average.
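To make the shape arithmetic concrete, here is a rough trace of the spatial dimensions (the per-stage strides are assumed from the published ResNet-50 architecture):

// ImageNet-style input: 224x224
// conv1, 7x7, stride 2     -> 112x112
// max pool, 3x3, stride 2  -> 56x56
// stage 2, stride 1        -> 56x56
// stage 3, stride 2        -> 28x28
// stage 4, stride 2        -> 14x14
// stage 5, stride 2        -> 7x7, then a 7x7 average pool
//
// CIFAR10-style input: 32x32
// conv1, 3x3, stride 1     -> 32x32 (no initial downsampling)
// three stride-2 stages    -> 16x16 -> 8x8 -> 4x4, then a 4x4 average pool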

As Brett said, we should make classCount an input parameter to allow for more flexibility in datasets, and we could possibly replace the hardcoded average size with a global averaging operation. Most image classification datasets can be downsampled to ImageNet-sized images (and usually are), so that would provide broad compatibility with image classification datasets. We'll still need a switch for the downsampling in the early layers, but we can certainly be clearer in the parameter naming; a sketch of that direction follows below.
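A rough sketch of that idea, assuming swift-apis' GlobalAvgPool2D (this is not the repository's code; FlexibleClassifierHead and its parameters are made up for illustration):

import TensorFlow

// Hypothetical sketch: an explicit classCount plus global average pooling,
// so the classifier no longer depends on a dataset-specific patch size.
public struct FlexibleClassifierHead: Layer {
    public var globalPool = GlobalAvgPool2D<Float>()
    public var classifier: Dense<Float>

    public init(featureCount: Int, classCount: Int) {
        classifier = Dense<Float>(inputSize: featureCount, outputSize: classCount)
    }

    @differentiable
    public func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        return classifier(globalPool(input))
    }
}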

Organizing and updating the model examples and tests is an ongoing process, so thanks for bringing this up.

rickwierenga commented 4 years ago

Thanks! Can I close the issue, since my question is answered, or would you like to keep it open until it's fixed?

8bitmp3 commented 4 years ago

Thanks for asking @rickwierenga, and thanks for your explanations @saeta @BradLarson @brettkoonce. I am also going through the ResNet "v1" example and noticed this earlier. More flexibility in datasets would be awesome in the long run. The ResNet code is very clean; thanks @marcrasi @BradLarson and the team.

In case someone reading this thread is a bit confused: in the "main" ResNet model struct in ResNet50.swift (under Models in the tensorflow/swift-models repo), you'll notice a DataKind enum with two dataset options, and for each dataset's image input size there is a switch case configuring l1 ("layer 1"), a Conv2D with a batch norm defined in a separate ConvBN struct:

public struct ResNet: Layer {
    public var l1: ConvBN // A Conv2D with a batch norm.
    public var maxPool: MaxPool2D<Float>
    public var avgPool: AvgPool2D<Float>
    public var classifier: Dense<Float>

    public var l2a = ...
    ...
    public init(dataKind: DataKind, layerBlockCounts: (Int, Int, Int, Int)) {
        switch dataKind {
        case .imagenet: // ImageNet-sized (224x224) images as input to ConvBN.
            l1 = ConvBN(filterShape: (7, 7, 3, 64), strides: (2, 2), padding: .same)
            maxPool = MaxPool2D(poolSize: (3, 3), strides: (2, 2))
            avgPool = AvgPool2D(poolSize: (7, 7), strides: (7, 7))
            classifier = Dense(inputSize: 2048, outputSize: 1000)
        case .cifar: // CIFAR-10-sized (32x32) images as input to ConvBN.
            l1 = ConvBN(filterShape: (3, 3, 3, 64), padding: .same)
            maxPool = MaxPool2D(poolSize: (1, 1), strides: (1, 1)) // Effectively a no-op.
            avgPool = AvgPool2D(poolSize: (4, 4), strides: (4, 4))
            classifier = Dense(inputSize: 2048, outputSize: 10)
        }
        // ... remaining layers configured from layerBlockCounts ...
    }
}

Source: https://github.com/tensorflow/swift-models/blob/master/Models/ImageClassification/ResNet50.swift

rickwierenga commented 4 years ago

Thanks for the additional explanation @8bitmp3.

@BradLarson, do you have any update on making this struct generic? Here is the Keras reference: https://github.com/keras-team/keras-applications/blob/master/keras_applications/resnet50.py.


Apart from this particular issue, what is the key purpose of this repository? I'm a little confused as to whether it's supposed to be a model garden or a dataset repository. It might be a good idea, in the long run, to split it into two repos (I'm thinking SwiftData and SwiftApplications), creating a designated repository focused on data loading. We might try to implement lazy loading in order to support larger datasets; a rough sketch of the idea is below. I have contributed to tensorflow/datasets as a Code-in task, quite liked the way it's organized, and think Swift for TensorFlow would benefit from having something similar.
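As a very rough sketch of the lazy-loading idea (nothing here is an existing swift-models API; the names are made up): file paths are listed eagerly, but bytes are only read when an example is actually consumed.

import Foundation

// Hypothetical: enumerate paths up front, defer I/O until iteration.
struct LazyFileDataset {
    let paths: [URL]

    // Nothing is read from disk until the sequence is iterated.
    var examples: LazyMapSequence<[URL], Data> {
        paths.lazy.map { url in
            // A real loader would decode these bytes into a tensor here.
            try! Data(contentsOf: url)
        }
    }
}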

I would be glad to help implement some of this.

cc @saeta @dynamicwebpaige

BradLarson commented 4 years ago

@rickwierenga - A little while ago, I put together this document to describe some of the goals of the repository and some near-term work that we'd be doing within it. There were four goals I listed there, in roughly descending priority.

Our first and foremost goal in the next few months is to build up robust testing and benchmarking for Swift for TensorFlow, so that we can be confident in the correctness, and measure the performance, of automatic differentiation, whatever backend we are using, and the higher-level APIs built on top. For that, we want sufficient coverage of use cases to be able to thoroughly test things from the bottom up.

Beyond that, the repository functions as a great resource for seeing how to work with Swift for TensorFlow, and we want it to continue to ease adoption of the language and frameworks. That includes providing some reusable components that could help build models quickly.

The "datasets" that we have in the repository are more like wrappers around the actual datasets, and right now they provide the minimal functionality needed to feed data into the (largely image classificaion) models we use for testing. The ones that have been set up in here were chosen for their ability to exercise common network architectures.

I agree, there's a lot to like about the tensorflow/datasets library, and we've had a number of conversations with that team. We'll be talking more about the dataset design soon, but for right now these exist in the service of the models in the repository, which is why they remain here.

We plan to use this repository as a staging ground to experiment with concepts that might work their way into the core API, and the datasets could spin off from that as well.

saeta commented 4 years ago

@rickwierenga I entirely agree with @BradLarson and think that this repository should have multiple libraries within it: some focused on data loading, some on pre-built models, and some that depend on both to provide higher-level abstractions and tools (and, of course, to do integration testing too). I really like https://github.com/tensorflow/datasets and think a Swift-flavored variation of it would fit very well in this (swift-models) repository. (Concretely, having fewer repositories makes it easier to see how things fit together, because a single example program can combine a pre-built model architecture with a pre-built dataset.)

In terms of the software stack & architecture, I see a few major layers:

  1. At the "bottom" are extremely broadly applicable concepts, such as automatic differentiation. This is done more at the "language" level than the library level, but it has a number of important library components. Other things that belong here include our metaprogramming explorations, as well as things like KeyPathIterable, etc.
  2. The next, more specific, abstraction is a high-performance tensor library. This was the motivating work behind Layering Swift APIs. Also at this level are generic parallelism abstractions, as well as some other goodies, such as XLA.
  3. The next level above that holds "small" deep-learning-specific building blocks, such as layers and optimizers.
  4. Building on all of the previous levels are models and pre-built input pipelines.
  5. I think there's space for even higher levels of abstraction here, such as a pre-built image processing toolkit.
  6. Finally, at the top, are the end-to-end applications (which build on either level 4 or level 5).

In general, the way development has worked well so far is for new code and features to start right next to the application. This lets everyone iterate on the design and see it in context; then, over time, it "bubbles up" to the right level in the stack as we determine how broadly applicable it is. This repository covers levels 4, 5, and 6; swift-apis covers levels 2 and 3; and the core Swift repository is level 1.

Hope that helps!

rickwierenga commented 4 years ago

I see, thank you! By the way, I have made two comments on the doc Brad shared.

BradLarson commented 4 years ago

With PR #275, I think the remaining issues discussed here have been addressed, so I'll close this out. Thanks for the good discussion, everyone.