tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Apache License 2.0
15.5k stars · 3.49k forks

Support multi-task training #487

Closed rsepassi closed 5 years ago

rsepassi commented 6 years ago

v1.4.0 dropped support for multi-task training. This is a tracking issue to discuss how we may be able to add that support back.

Initial thinking is to have a MultiProblem subclass of Problem and a MultiModel subclass of T2TModel. MultiProblem would override input_fn to do the multiple inputs, and MultiModel would override estimator_model_fn to do the multiple models.
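The proposed split can be sketched as a pair of subclass skeletons. This is only an illustration of the override points named above; `Problem` and `T2TModel` are stand-ins for tensor2tensor's real base classes, and the method bodies are hypothetical:

```python
# Hypothetical sketch of the MultiProblem / MultiModel split.
# Problem and T2TModel stand in for tensor2tensor's real base classes.

class Problem:
    def input_fn(self, mode, hparams):
        raise NotImplementedError

class T2TModel:
    def estimator_model_fn(self, features, mode):
        raise NotImplementedError

class MultiProblem(Problem):
    """Wraps several Problems and multiplexes their inputs."""

    def __init__(self, problems):
        self.problems = problems

    def input_fn(self, mode, hparams):
        # One input pipeline per sub-problem; a real implementation
        # would interleave or weight the resulting batches.
        return [p.input_fn(mode, hparams) for p in self.problems]

class MultiModel(T2TModel):
    """Dispatches each sub-problem's features to its own sub-model."""

    def __init__(self, models):
        self.models = models

    def estimator_model_fn(self, features, mode):
        # `features` is a list with one entry per sub-problem.
        return [m.estimator_model_fn(f, mode)
                for m, f in zip(self.models, features)]
```

The real classes would of course build `tf.estimator` specs rather than plain lists; the sketch only shows where the multiplexing lives.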

cbockman commented 6 years ago

@rsepassi we were a little surprised/saddened to see this, after we made our move to 1.4!

1) any feedback on why this was dropped?

2) any forecast on when this might return?--i.e., is this on the near-term roadmap?

Trying to figure out if we should hack a home-grown solution to this, if we can somehow contribute to the formal development here, or if we should just wait--this capability was one of the several core reasons we jumped onto t2t in the first place!

rsepassi commented 6 years ago

Yeah, it's always sad to lose good functionality like that. We decided, however, that it would be in the best interest of the project to prioritize getting the code into a cleaner state, and MultiModel was the chief culprit behind a lot of the added complexity.

It's not on our immediate roadmap, so we'd be very happy to work with you to contribute it back into t2t. My initial thoughts on design would be to create a MultiInput Problem overriding the input function and a MultiModel T2TModel overriding the model function. You can look back at older versions of the codebase to see what's necessary functionally.

cbockman commented 6 years ago

Gotcha, really appreciate the quick response. Will circle back if we make any progress on this front.

On Thu, Jan 18, 2018 at 6:18 PM, Ryan Sepassi notifications@github.com wrote:

Yeah it's always sad to lose good functionality like that. We decided however that it would be in the best interest of the project to prioritize getting the code into a cleaner state and MultiModel was the chief culprit for a lot of added complexity.

It's not on our immediate roadmap so we'd be very happy to work with you to contribute it back into t2t. My initial thoughts on design would be to create a MultiInput Problem overriding the input function and a MultiModel T2TModel overriding the model function. You can look back at older versions of the codebase to see what's necessary functionally. On Thu, Jan 18, 2018 at 6:08 PM cbockman notifications@github.com wrote:

@rsepassi https://github.com/rsepassi we were a little surprised/saddened to see this, after we made our move to 1.4!

1.

any feedback on why this was dropped? 2.

any forecast on when this might return?--i.e., is this on near-term roadmap?

Trying to figure out if we should hack a home-grown solution to this, if we can somehow contribute to the formal development here, or if we should just wait--this capability was one of the several core reasons we jumped onto t2t in the first place!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tensorflow/tensor2tensor/issues/487#issuecomment- 358844927, or mute the thread https://github.com/notifications/unsubscribe-auth/ ABEGW9sAPVtv3gYMfE1hm6RJ26rt9CsCks5tL_kNgaJpZM4RLQO8 .

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/tensorflow/tensor2tensor/issues/487#issuecomment-358846497, or mute the thread https://github.com/notifications/unsubscribe-auth/AEc6EvnIgAPayCTH9XlQPNt8OIE4wwSSks5tL_tsgaJpZM4RLQO8 .

agemagician commented 6 years ago

@rsepassi Any update on bringing multi-task integration back, rather than using the old t2t versions (v1.2.1)? Do you have a timeline, since it has been almost a year since the drop?

rsepassi commented 6 years ago

We have no plans currently.

I think the general design of how one might do this still stands: a MultiInputProblem that returns, from its input_fn, batched Tensors that round-robin (or otherwise weight) several datasets; and a T2TModel that overrides bottom, top, and loss to deal with the conditional execution of different entry and exit flows. You would possibly also override some of the eval-metrics logic so that eval metrics are only conditionally executed on a batch, which can be done in the estimator_spec_eval fn.
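The dataset-mixing part of that design can be sketched without TensorFlow. In t2t this logic would live inside the input_fn, but the round-robin and weighted scheduling ideas are the same; the function names below are illustrative, not part of any t2t API:

```python
import itertools

def roundrobin(*datasets):
    """Yield one example from each dataset in turn until all are exhausted."""
    iterators = [iter(d) for d in datasets]
    while iterators:
        alive = []
        for it in iterators:
            try:
                yield next(it)
                alive.append(it)
            except StopIteration:
                pass
        iterators = alive

def weighted_mix(datasets, weights):
    """Cycle through datasets, drawing from each proportionally to its weight."""
    schedule = []
    for d, w in zip(datasets, weights):
        schedule.extend([d] * w)
    iterators = {id(d): itertools.cycle(d) for d in datasets}
    for d in itertools.cycle(schedule):
        yield next(iterators[id(d)])
```

In a TF 1.x input_fn, the same effect could be had by interleaving `tf.data.Dataset` objects (e.g. via sampling between them), with the task id carried along in each batch so the model's bottom/top/loss can dispatch conditionally.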

estathop commented 6 years ago

So it's still dropped, right? Is it too difficult to tweak?

rsepassi commented 6 years ago

There are now a couple groups internally interested in adding support for multi-task training. It may be a quarter or so before it comes out, but wanted to update for those generally interested.

cbockman commented 6 years ago

Great news!


agemagician commented 6 years ago

Hi @rsepassi ,

We have also started to add support for multi-task training back. I think we will finish in 2-3 months.

Which option do you think is better:
1. We stop, since you are already working on it.
2. We share our development progress here in this thread.
3. We work together on adding it back.

cbockman commented 6 years ago

I'll add onto the above--we have an internal implementation which has a number of limitations, but:

1) we could potentially help contribute writ large, if there were a good way to do so;
2) we have a very specific stand-alone component that might be of help (depending on how the rest of the code is structured...). Specifically, given a trained multi-task model, we have code that will do the network surgery and remove the "irrelevant" portions.

E.g., if you are multi-task training an NLI task with a sentiment task, and only care about sentiment, then we have a post-processor (similar to https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/bin/t2t_avg_all.py) which will cut away the NLI portion afterwards (for ease/speed/reduced mem requirements of applying it in inference, largely).
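The "network surgery" idea above amounts to filtering a checkpoint's variable map by name prefix: keep the shared trunk plus the head you care about, drop the rest. A minimal sketch, assuming hypothetical variable-name prefixes (t2t's actual names depend on the model):

```python
def prune_checkpoint(variables, keep_prefixes):
    """Keep only variables whose names start with one of keep_prefixes.

    `variables` maps variable name -> weight array, e.g. as read from a
    checkpoint with tf.train.load_checkpoint. Shared-trunk variables
    survive because their prefix is listed alongside the desired head.
    """
    return {name: value
            for name, value in variables.items()
            if any(name.startswith(p) for p in keep_prefixes)}

# Hypothetical variable layout for a two-task (NLI + sentiment) model:
ckpt = {
    "body/encoder/w": 1.0,
    "nli_head/w": 2.0,
    "sentiment_head/w": 3.0,
}
pruned = prune_checkpoint(ckpt, keep_prefixes=("body/", "sentiment_head/"))
```

A real post-processor would then write `pruned` back out as a new checkpoint, much as t2t_avg_all.py rewrites averaged checkpoints.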


vergilus commented 6 years ago

Looking forward to it. I'm also trying to implement my own multi-task training.

JaysonAlbert commented 5 years ago

It's been a couple of months, is this feature available now?

lukaszkaiser commented 5 years ago

Yes: just train on one of the MultiProblem classes, for example with:

--problem=languagemodel_multi_wiki_translate --model=transformer --hparams_set=transformer_tall_pretrain_lm_tpu_adafactor_large

We have pre-trained checkpoints with a 10-problem multi-model here: gs://tensor2tensor-checkpoints/transformer_multi_2jan19/

You can translate English to German with this model (after copying it down) like this:

t2t_decoder --problem=languagemodel_multi_wiki_translate --model=transformer --hparams_set=transformer_tall_pretrain_lm_tpu_adafactor_large --decode_hparams='batch_size=1,multiproblem_task_id=64510' --hparams="" --output_dir ~/t2t_train/transformer_multi_2jan19 --decode_from_file ~/newstest2014.en

This does English-German; for other tasks and directions, just change the multiproblem_task_id, e.g., 64511 is English-French, 64512 is English-Romanian, and the other tasks are in order as listed here: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/wiki_multi_problems.py#L135

cwbeitel commented 5 years ago

See also https://github.com/tensorflow/tensor2tensor/blob/master/docs/multi_problem.md