tensorflow / addons

Useful extra functionality for TensorFlow 2.x maintained by SIG-addons
Apache License 2.0
1.69k stars 611 forks source link

Formalize process for code merging into tf.image, tfa.image, and keras-preprocessing #1780

Closed seanpmorgan closed 4 years ago

seanpmorgan commented 4 years ago

Currently there is no communication or criteria for what is being merged into the different repositories. This creates duplicated code within the ecosystem and is wasting contributor time/effort.

As an example, we've tried to extend ImageProjectiveTransformV2 without realizing that TF-core had already implemented the same functionality: https://github.com/tensorflow/tensorflow/commit/26992815bf7ee136c49f658596a9fbde7881d425

@tanzhenyu Could you give us some clarification on what the internal process is for adding components to tf.image? Can we publish a roadmap of features that are planned to be added and can we make verifying against Addons part of the process for adding new things going forward?

@dynamicwebpaige Could you help clarify whats being contributed to Keras in this GSoC project: https://summerofcode.withgoogle.com/projects/#4863446367076352

The auto-augment issue caused us to receive a lot of PRs implementing functionality into Addons which looks as though it may now be going directly to Keras.

CC @karmel for visibility and see if you had any ideas for how we can formalize this process.

Related issues (To be expanded upon because there are many):

1779

1126

1275

bhack commented 4 years ago

From GSOC project desc:

Recently, TensorFlow has introduced the Preprocessing Layers API that allows preprocessing layers to be serialized with the model itself.

What will happen if users will start train models and serialize overlapping ops in Addons that are or will appear soon in other places (developed from scratch or ported from google AutoAugmentation repos)?

Need we to maintain these overlapping ops in Addons for Models retrocompatibility?

Need we to stop to process image ops PR by now?

What about morphology custom ops that we have in the repo?

seanpmorgan commented 4 years ago

What will happen if users will start train models and serialize overlapping ops in Addons that are or will appear soon in other places (developed from scratch or ported from google AutoAugmentation repos)?

Need we to maintain these overlapping ops in Addons for Models retrocompatibility?

Need we to stop to process image ops PR by now?

What about morphology custom ops that we have in the repo?

This is a good topic so want to make sure we fully cover it... does the below information cover the issues you're seeing?

When a pre-processing layer is serialized it will contain package information so that the correct layer is re-loaded upon de-serialization: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/layers/preprocessing/image_preprocessing.py#L106

For example a TFA preprocessing layer would be marked as pakage Addons: https://github.com/tensorflow/addons/blob/master/tensorflow_addons/layers/netvlad.py#L22

If there were underlying custom-ops being called by that python layer those too would have package information: https://github.com/tensorflow/addons/blob/master/tensorflow_addons/custom_ops/image/cc/ops/image_ops.cc#L132 The ops themselves are not serialized but rather the api endpoint.

Our worst case for compatibility would be that a custom-op gets called from a preprocessing layer and the exact version of TFA may need to be used upon re-loading. But then we already don't provide backwards compatibility for custom-ops: https://github.com/tensorflow/addons#c-custom-op-compatibility

If the preprocessing layer were to reference a python op from addons that had been migrated then it should alias the new location of that python op (tf-core most likely).

bhack commented 4 years ago

If the preprocessing layer were to reference a python op from addons that had been migrated then it should alias the new location of that python op (tf-core most likely).

I suppose that part of the @gabrieldemarmiesse comment at https://github.com/tensorflow/addons/issues/1779#issuecomment-623638831 Is valid also for non custom ops. The risk is lower or at least under control with upstreaming ops from Addons but there could be problem when creating an alias on semi overlapping ops that we are not aware about its API design (missing parameters, different implementation or approximation that mismatch the output, etc.). I think that for these kind of problems it could be quite hard to be handled specially if we have not upstreaming operators supereseeded or semi-superseed (API mismatch) somewhere at distance of few months.

EDIT: some points touched in this ticket was also confusing for students as you can see at https://github.com/tensorflow/tensorflow/issues/37274

tanzhenyu commented 4 years ago

Currently there is no communication or criteria for what is being merged into the different repositories. This creates duplicated code within the ecosystem and is wasting contributor time/effort.

As an example, we've tried to extend ImageProjectiveTransformV2 without realizing that TF-core had already implemented the same functionality: tensorflow/tensorflow@2699281

@tanzhenyu Could you give us some clarification on what the internal process is for adding components to tf.image? Can we publish a roadmap of features that are planned to be added and can we make verifying against Addons part of the process for adding new things going forward?

@dynamicwebpaige Could you help clarify whats being contributed to Keras in this GSoC project: https://summerofcode.withgoogle.com/projects/#4863446367076352

The auto-augment issue caused us to receive a lot of PRs implementing functionality into Addons which looks as though it may now be going directly to Keras.

CC @karmel for visibility and see if you had any ideas for how we can formalize this process.

Related issues (To be expanded upon because there are many):

1779

1126

1275

Sorry @seanpmorgan @bhack , apparently I missed this comment!

Let me try to answer your question in several perspectives: 1) (I think) TFA is the default place to start experimenting ops/layers. When new algorithms come up and many people would like to use it / contribute it with TF ecosystem, we'd always recommend TFA as a starting point. Whether TFA decides to accept this or not is completely up to community leads, i.e., mainly you (and owners of tfa.image in this case). When this becomes widely used, we need sponsorship from inside TF to a) migrate to core, b) maintain it for a long time. 2) In this particular case of ImageProjectiveTransformV2 op, preprocessing layers (that will live in core) relies on it and I decided I will be sponsoring this in core, hence the commit (and also mentioned that we need to deprecated in commit message). What could have been done better is I should probably raise an issue right after this is committed (it slipped my mind apparently). But a sync would mean someone that is very up-to-date with both core and TFA, AFAIK there isn't a formal process, what I would propose is let's ask TF for some sponsorship in each area, i.e., have someone from TF that attends your monthly meetings on image related, loss/metrics related, text related, optimizer related, etc -- that can be a single person, or several of them. Let's discuss this. 3) Because of 2), things need to be discussed case-by-case. For basic image ops like translate and rotate, they should exist in core. For auto-augment and rand-augment, if I'm not wrong (correct me if I am) it's mostly used in EfficientNet and maybe transferrable to other models, and I'd like to see them live in TFA (at least for a while) -- they are preprocessing for sure, but it wouldn't show up in tf.keras.layers.preprocessing before this is proven to be a general technique for improving accuracy.

Cheers,

bhack commented 4 years ago
  1. (I think) TFA is the default place to start experimenting ops/layers. When new algorithms come up and many people would like to use it / contribute it with TF ecosystem, we'd always recommend TFA as a starting point. Whether TFA decides to accept this or not is completely up to community leads, i.e., mainly you (and owners of tfa.image in this case). When this becomes widely used, we need sponsorship from inside TF to a) migrate to core, b) maintain it for a long time.
  2. In this particular case of ImageProjectiveTransformV2 op, preprocessing layers (that will live in core) relies on it and I decided I will be sponsoring this in core, hence the commit (and also mentioned that we need to deprecated in commit message). What could have been done better is I should probably raise an issue right after this is committed (it slipped my mind apparently). But a sync would mean someone that is very up-to-date with both core and TFA, AFAIK there isn't a formal process, what I would propose is let's ask TF for some sponsorship in each area, i.e., have someone from TF that attends your monthly meetings on image related, loss/metrics related, text related, optimizer related, etc -- that can be a single person, or several of them. Let's discuss this.

We are trying to improve the process at: https://github.com/tensorflow/community/pull/241 (upstreaming) https://github.com/tensorflow/community/issues/242 (downstreaming)

I think we could solve 1. and 2. if we find a common vision in these templates.

On the TF side I think you just need to add, to the internal and public PR/review triage checklist an extra check to verify if the code is already in SIGs.

The main issue about 3 is that if we don't have a minimal short range roadmap about ops planned in keras preprocessing by TF team developing activities and Gsoc student activities we are at risk to have PR here in addons by contributors that could conflict/duplicate/overlap.

So I think nobody here want to waste free contributors time and so the best solution is to have a minimal overview of the plan in this area.

E.g. If we know that the TF team is not going to add new image operators and we have a public list of ops in the Gsoc roadmap it could be clear to potential contributors on what kind of image ops PR we are looking for here in Addons.

tanzhenyu commented 4 years ago
  1. (I think) TFA is the default place to start experimenting ops/layers. When new algorithms come up and many people would like to use it / contribute it with TF ecosystem, we'd always recommend TFA as a starting point. Whether TFA decides to accept this or not is completely up to community leads, i.e., mainly you (and owners of tfa.image in this case). When this becomes widely used, we need sponsorship from inside TF to a) migrate to core, b) maintain it for a long time.
  2. In this particular case of ImageProjectiveTransformV2 op, preprocessing layers (that will live in core) relies on it and I decided I will be sponsoring this in core, hence the commit (and also mentioned that we need to deprecated in commit message). What could have been done better is I should probably raise an issue right after this is committed (it slipped my mind apparently). But a sync would mean someone that is very up-to-date with both core and TFA, AFAIK there isn't a formal process, what I would propose is let's ask TF for some sponsorship in each area, i.e., have someone from TF that attends your monthly meetings on image related, loss/metrics related, text related, optimizer related, etc -- that can be a single person, or several of them. Let's discuss this.

We are trying to improve the process at: tensorflow/community#241 (upstreaming) tensorflow/community#242 (downstreaming)

I think we could solve 1. and 2. if we find a common vision in these templates.

On the TF side I think you just need to add, to the internal and public PR/review triage checklist an extra check to verify if the code is already in SIGs.

The main issue about 3 is that if we don't have a minimal short range roadmap about ops planned in keras preprocessing by TF team developing activities and Gsoc student activities we are at risk to have PR here in addons about contributors that could conflict/duplicate/overlap.

So I think nobody here want to waste free contributors time and so the best solution is to have a minimal overview of the plan in this area.

E.g. If we know that the TF team is not going to add new image operators and we have a public list of ops in the Gsoc roadmap it could be clear to potential contributors on what kind of image ops PR we are looking for here in Addons.

We don't have any short-term plans to add ops. For keras preprocessing, they are described here and already implemented. What AutoAugment requires (such as solarize, sample pairing) will not be made into core. Francois and I are discussing the roadmap for keras_image which might include some of these.

bhack commented 4 years ago

So where the Gsoc student is going to create PRs? On keras_image?

bhack commented 4 years ago

Francois and I are discussing the roadmap for keras_image which might include some of these.

As we have now Keras_cv what is the roadmap? /cc @WindQAQ

tanzhenyu commented 4 years ago

Francois and I are discussing the roadmap for keras_image which might include some of these.

As we have now Keras_cv what is the roadmap? /cc @WindQAQ

Currently we will mostly cover object detection models, and will expand for segmentation / keypoint estimation tasks when it's done.

bhack commented 4 years ago

@tanzhenyu Thank you for the update but it is a little bit unclear for me. Are vision models already under refactoring in the model Garden Vision section and related TF hub activities? So what we will have in keras-cv?

tanzhenyu commented 4 years ago

@tanzhenyu Thank you for the update but it is a little bit unclear for me. Are vision models already under refactoring in the model Garden Vision section and related TF hub activities? So what we will have in keras-cv?

We'd believe it's best to have each repo do its own thing and do it really well. So Model Garden (IMHO) should handle user facing end-to-end modeling workflow that can quickly get deployed, while Keras-CV should handle low level research needs that can build models with reuse-able components. At the end of day, model garden should re-use Keras-CV components to build their models. And Model Garden is too large today, it's quite easy to confuse users between research vs. official, recommendation v.s. vision v.s. nlp, colab v.s. modeling. v1 v.s. v2. Having a lightweight repo that is specialized on vision related tasks and built on top of TF 2 (or possibly just Keras instead) is what we're trying to achieve.

bhack commented 4 years ago

We'd believe it's best to have each repo do its own thing and do it really well. So Model Garden (IMHO) should handle user facing end-to-end modeling workflow that can quickly get deployed, while Keras-CV should handle low level research needs that can build models with reuse-able components.

OK this part is clear and it is what I hope to have reuse-able components not embedded every time in the mode itself (model garden).

And Model Garden is too large today, it's quite easy to confuse users between research vs. official, recommendation v.s. vision v.s. nlp, colab v.s. modeling. v1 v.s. v2. Having a lightweight repo that is specialized on vision related tasks and built on top of TF 2 (or possibly just Keras instead) is what we're trying to achieve.

I'm not so sure about this second part cause it could go to confuse users and conflict with the multi storage source of models. I've already seen in past many ambiguous entries between TPU repo, tensorflow Garden, Coral, tensorflow lite and so on on model storage/reference.

If you start to store overlapped models in Keras and in other places you risk to confusing people a lot.

tanzhenyu commented 4 years ago

I'm not so sure about this second part cause it could go to confuse users and conflict with the multi storage source of models. I've already seen in past many ambiguous entries between TPU repo, tensorflow Garden, Coral, tensorflow lite and so on on model storage/reference.

If you start to store overlapped models in Keras and in other places you risk to confusing people a lot.

We don't intend to have overlapped models (in Keras-CV), and frankly having many models in different places is exactly what is confusing users today, I think?

bhack commented 4 years ago

and frankly having many models in different places is exactly what is confusing users today, I think?

Is it a question? If it is a question for me yes was on of the confusing topics. At least the scope of TF.hub was to masquerade these multiple sources to have "reusable parts of machine learning models" and Keras support was one of the first feature requests in TF.hub when was launched (https://github.com/tensorflow/hub/issues/13).

tanzhenyu commented 4 years ago

and frankly having many models in different places is exactly what is confusing users today, I think?

Is it a question? If it is a question for me yes was on of the confusing topics. At least the scope of TF.hub was to masquerade these multiple sources to have "reusable parts of machine learning models" and Keras support was one of the first feature requests in TF.hub when was launched (tensorflow/hub#13).

You clearly understand our plans well :-) Yeah but we cannot have every magical trained (hub) module as a single Keras layer, for cases when a certain layer inside the module needs to be inspected / updated. For it to act as a single glob (like in transfer learning) yes that is what TF Hub was built for (IMHO).

bhack commented 4 years ago

@tanzhenyu Yes I was talking about transfer learning but then you need to reuse that utils and stuff but generally many of these are embedded in the mode code itself. So this was the main issue, all the preprocessing, postprocessing and glue code was not reusable in transfer learning or "model forks".

tanzhenyu commented 4 years ago

@tanzhenyu Yes I was talking about transfer learning but then you need to reuse that utils and stuff but generally many of these are embedded in the mode code itself. So this was the main issue, all the preprocessing, postprocessing and glue code was not reusable in transfer learning or "model forks".

Sorry that I don't follow -- can you elaborate the "main issue" part?

bhack commented 4 years ago

That other than TF hub common interface for trasfer learning users what to reuse model components that generally are embedded in the model itself.

So I hope that if we have a new model like e.g Efficientnet we can use its augmentation as a general components also for other model designs and that we have one reliable model soruce instead of Keras.application, TPU repository, automl repository, model garden then keras-cv and so on.

tanzhenyu commented 4 years ago

That other than TF hub common interface for trasfer learning users what to reuse model components that generally are embedded in the model itself.

So I hope that if we have a new model like e.g Efficientnet we can use its augmentation as a general components also for other model designs and that we have one reliable model soruce instead of Keras.application, TPU repository, automl repository, model garden then keras-cv and so on.

IIUC -- as of today Keras-application is still the only one that provides a full suite of backbones to be used for image classification tasks which doesn't have the issue you mentioned

bhack commented 4 years ago

I guess it doesn't matter too much where it worked and where not. What is important is that it works as a whole in the ecosystem for what we offer/expose to users. I hope we are not going to confict or creating too much duplicates with the model request/contribution policy that we already expose now for Model garden: https://github.com/tensorflow/models/wiki/How-to-contribution.

As also for the reusable components part of the topic my position Is at https://github.com/tensorflow/community/issues/223 (you can substitute Addons target with any SIG).

I think it is in a way a collective effort of the whole ecosystem to not expose too much teams fragmentation (every group Is working on its own stuffs) to the end users.

seanpmorgan commented 4 years ago

Per https://github.com/tensorflow/addons/pull/2156 migrations will now be handled by core TF and Keras team members to make the migration process more managable.