keras-team / keras-cv

Industry-strength Computer Vision workflows with Keras

ConvNeXt V2 #1233

Open DavidLandup0 opened 1 year ago

DavidLandup0 commented 1 year ago

Short Description Just released - ConvNeXt V2: ConvNeXt with a new internal layer, Global Response Normalization (GRN); a rough sketch of the layer is included below.

Papers https://arxiv.org/abs/2301.00808

Existing Implementations https://github.com/facebookresearch/ConvNeXt-V2

Other Information If accepted, sign me up for the PR!
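For context, the paper's main architectural change is the Global Response Normalization (GRN) layer added inside each ConvNeXt block. A minimal, unofficial Keras sketch of the idea (layer/weight names and the channels-last layout are my assumptions, not the eventual KCV implementation):

```python
import tensorflow as tf
from tensorflow import keras


class GRN(keras.layers.Layer):
    """Global Response Normalization (sketch); channels-last inputs assumed."""

    def build(self, input_shape):
        channels = input_shape[-1]
        self.gamma = self.add_weight(
            name="gamma", shape=(1, 1, 1, channels), initializer="zeros"
        )
        self.beta = self.add_weight(
            name="beta", shape=(1, 1, 1, channels), initializer="zeros"
        )

    def call(self, x):
        # Per-channel L2 norm aggregated over the spatial dimensions.
        gx = tf.norm(x, ord="euclidean", axis=(1, 2), keepdims=True)
        # Divisive normalization across channels.
        nx = gx / (tf.reduce_mean(gx, axis=-1, keepdims=True) + 1e-6)
        # Learnable affine transform plus residual connection.
        return self.gamma * (x * nx) + self.beta + x
```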

bhack commented 1 year ago

@tanzhenyu What is our (sustainability) policy about brand new papers?

bhack commented 1 year ago

We had quite a long discussion with @LukeWood and @innat at https://github.com/keras-team/keras-cv/discussions/52#discussioncomment-2058663

DavidLandup0 commented 1 year ago

Thanks for linking the discussion! My two cents: if an architecture can be useful for end users, we should consider it based on how modular it can be (a big factor for KCV), how reproducible it is (simple vs. difficult training pipeline, big or small models, official public repo or not, etc.), and how widely accepted it is (how many implementations and usages exist in the wild).

A computer vision library dedicated to auto-driving, robotics, and on-device applications.

While the vision may change, it'll conceivably stay on the path of "bringing CV to production". If a model is aligned with that and shows clear advances, it should be added, IMO.

With brand-new archs, it's hard to test whether they show clear advances without further peer review and usage, but we already have ConvNeXt in both keras.applications and keras_cv.models, which validates the structure and usefulness of ConvNeXt V2.

Also, the acceptance criteria might differ between backbones and narrow architectures - backbones are generally useful, though not all of them improve downstream tasks. IMO, if a backbone reports improved downstream-task performance, we should give it more weight.

bhack commented 1 year ago

I think that, especially for a backbone, the test of time is how many papers use that specific backbone (in this specific case, the paper was published just a few days ago).

Then, IMHO, we have an open sustainability topic around accumulating components, as we still don't have clear code ownership of the library/modules and we are missing the related MIA handling (https://github.com/keras-team/keras-cv/discussions/1184 and https://github.com/keras-team/keras-cv/discussions/950#discussioncomment-3926423). If we rely only on the few internal team members, I suppose we will hit a maintainership boundary/bottleneck sooner or later, as in any process working with quite limited resources.

I agree that just porting reference weights is less time-consuming and less risky, but we still haven't clarified whether we also aim to train backbones from scratch, and what our opinionated position is on releasing weights that start with a "gap" on downstream tasks when we do train backbones from scratch.

I think that in production, when working on downstream tasks, many users want to start from top-performing pre-trained weights. So sometimes, with limited resources, I prefer downstream reproducibility of our training scripts starting from well-known, well-performing weights (this is still quite confusing: https://github.com/keras-team/keras-cv/issues/495).

On the other hand, as I have seen in many PRs, our training process is still quite "artisanal" (https://github.com/keras-team/keras-cv/discussions/954), and it is often hard to get on the same page about the reproducibility process. My impression is also that we need to target only free resources (Colab) if we want a pre-CI training proxy check on the contributor side, so that the required hardware resources don't become a contribution wall.

/cc @tanzhenyu @LukeWood @ianstenbit @martin-gorner

tanzhenyu commented 1 year ago

#52 (reply in thread)

Thanks @DavidLandup0 @bhack for the discussion. There's no golden threshold here, but there's a core value we need to stick to, which is "KCV is for production and applied ML". So below are a couple of factors:

  1. yeah it's widely cited
  2. it's a general improvement to many tasks, not just a special trick that boosts the performance of one particular task
  3. many users request a model for a special scenario that matches our vision -- for example, MobileNet is quite important for mobile models but less so in other cases, and we still want to include it because it matches our vision
  4. it fits naturally with our existing API, especially in terms of input and output, because we want our components to be modular
  5. it's reproducible, i.e., training from scratch.
  6. it gives users a good balance between latency and quality, for example, transformer models.

The list can go on and we don't have immediate plans to write down what should be the standard, but we try to be agile and answer questions such as "should we include XXX model".

As for this model, I think it's OK to include it, but it's not prioritized at this moment.

DavidLandup0 commented 1 year ago

I think that in production, when working on downstream tasks, many users want to start from top-performing pre-trained weights.

Agreed. Many production use cases boil down to replacing a backbone with a slightly better one. This includes Kaggle competitions, which IMO can benefit a lot from KCV. For tricky-to-train models, like ViTs and ConvNeXt, we might want to focus on serving them primarily for downstream tasks (fine-tuning, segmentation, object detection, etc.).
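To illustrate the "replace the backbone" workflow, here's a rough sketch using the ConvNeXt V1 weights that already ship with keras.applications (the input shape and 10-class head are just assumptions for illustration); a V2 backbone would slot in the same way:

```python
import tensorflow as tf
from tensorflow import keras

# Frozen pre-trained backbone; a ConvNeXt V2 backbone would be a drop-in swap.
backbone = keras.applications.ConvNeXtTiny(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3)
)
backbone.trainable = False  # fine-tune only the new head at first

# Hypothetical 10-class downstream classification head.
inputs = keras.Input(shape=(224, 224, 3))
x = backbone(inputs, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(10, activation="softmax")(x)
model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```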

We could then separate "trainable" and "tricky-to-train" architectures, producing the weights with public scripts here for the former, and porting them for the latter.

I also agree with @tanzhenyu that it might be tricky to create a list and check whether a model fits the criteria. Maybe sometime down the line we make a list of, say, 10 bullet points and accept an arch if it ticks 6+ of them, for example. For now at least, while there's still lots of ground left to cover, we can probably make sensible decisions on the fly?

bhack commented 1 year ago

Yes, I think my points are general enough and not tied to an "algorithm" for inclusion.

My points are more focused on the general sustainability of the library (code ownership) and on not raising the contribution barrier too much in terms of hardware resources (devinfra/CI).

So I think these points can still be handled in their respective tickets/discussions.

I have already linked the relevant tickets/discussions for all of these points, so I don't think we need to discuss them here, as long as we can make some progress in those threads in 2023.