linkedin/detext

DeText: A Deep Neural Text Understanding Framework for Ranking and Classification Tasks
BSD 2-Clause "Simplified" License

Added the option to run multitask training for ranking tasks. #13

Closed · zhoutong-fu closed this 4 years ago

zhoutong-fu commented 4 years ago

Description

DeText clients have asked for multitask learning on ranking tasks so that the model can benefit from different subtasks. After reviewing surveys of the multitask learning literature, we decided to add the basic yet very popular hard parameter sharing neural network structure to the DeText framework.

This implementation assumes that all subtasks share the same deep features while wide features and labels are task-specific, which means the MLP and LTR layers are task-specific and all other layers are shared across subtasks. It also assumes that individual task losses are optimized separately and that each training record contains features/labels for exactly one subtask. Training records that cover more than one subtask should be split into multiple records, one per subtask.
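
To make the wiring concrete, here is a minimal hard parameter sharing sketch. It is illustrative only, not the actual DeText code: a single dense layer stands in for the shared deep layers (embedding/CNN), and each subtask owns its MLP (hidden projection) and LTR (scoring) head, mirroring the `task_{0,1}_*` variables in the training log below.

```python
import tensorflow as tf

# Minimal hard parameter sharing sketch -- illustrative only, NOT the
# actual DeText code. Shared layers feed task-specific heads.
batch, deep_dim, wide_dim, hidden, num_tasks = 4, 2, 10, 100, 2
deep_ftrs = tf.random.normal([batch, deep_dim])  # shared deep (text) features
wide_ftrs = tf.random.normal([batch, wide_dim])  # task-specific wide features
task_id = tf.constant([0, 1, 0, 1])              # exactly one subtask per record

shared_layer = tf.keras.layers.Dense(deep_dim, activation="tanh")
heads = [
    tf.keras.Sequential([
        tf.keras.layers.Dense(hidden, activation="tanh"),  # task-t hidden projection
        tf.keras.layers.Dense(1),                          # task-t scoring (LTR)
    ])
    for _ in range(num_tasks)
]

x = tf.concat([shared_layer(deep_ftrs), wide_ftrs], axis=-1)   # 12-dim head input
all_scores = tf.stack([head(x) for head in heads], axis=1)     # (batch, num_tasks, 1)
# Route each record through its own task's head; other heads see no gradient.
scores = tf.gather(all_scores, task_id, axis=1, batch_dims=1)  # (batch, 1)
```

Because every record flows through exactly one head, gradients from a task-0 record update only the shared layers and task 0's head, never task 1's.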

In summary, this implementation supports multitask training when:

- all subtasks are ranking tasks that share the same deep features;
- wide features and labels are task-specific; and
- each training record contains features/labels for exactly one subtask.

This implementation does NOT support:

- task-specific loss functions: although each subtask's loss is computed separately, all of them use the same loss_fn;
- mixing ranking tasks with classification tasks.

How to run multitask training?
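
The body of this section is not captured in this excerpt. As a rough illustration of the data contract the description above implies (field names below are hypothetical, not the actual DeText input schema), each training record carries one task id together with that task's own wide features and labels:

```python
# Hypothetical record layout -- field names are assumptions for illustration,
# not the real DeText schema. Each record belongs to exactly one subtask; a
# record covering two subtasks must be split into two records.
record_task0 = {
    "task_id": 0,                              # which subtask this record trains
    "query": "software engineer",
    "doc_titles": ["senior swe", "ml engineer"],
    "wide_ftrs": [[0.3] * 10, [0.1] * 10],     # task-0 wide features, per doc
    "labels": [1, 0],                          # task-0 relevance labels
}
record_task1 = {
    "task_id": 1,
    "query": "software engineer",
    "doc_titles": ["senior swe", "ml engineer"],
    "wide_ftrs": [[0.7] * 10, [0.2] * 10],     # task-1 wide features, per doc
    "labels": [0, 1],                          # task-1 relevance labels
}
```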

Type of change

List all changes

Please list all changes in the commit.

Testing

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

Trainable variables

INFO:tensorflow:# Trainable variables
INFO:tensorflow:  w_embedding:0, (100, 32), /device:CPU:0
INFO:tensorflow:  cnn/query_cnn_3/kernel:0, (3, 32, 1, 50), /device:CPU:0
INFO:tensorflow:  cnn/query_cnn_3/bias:0, (50,), /device:CPU:0
INFO:tensorflow:  cnn/doc_cnn_0_3/kernel:0, (3, 32, 1, 50), /device:CPU:0
INFO:tensorflow:  cnn/doc_cnn_0_3/bias:0, (50,), /device:CPU:0
INFO:tensorflow:  cnn/doc_cnn_1_3/kernel:0, (3, 32, 1, 50), /device:CPU:0
INFO:tensorflow:  cnn/doc_cnn_1_3/bias:0, (50,), /device:CPU:0
INFO:tensorflow:  wide_ftr_norm_w:0, (10,), /device:CPU:0
INFO:tensorflow:  wide_ftr_norm_b:0, (10,), /device:CPU:0
INFO:tensorflow:  task_0_hidden_projection_0/kernel:0, (12, 100), /device:CPU:0
INFO:tensorflow:  task_0_hidden_projection_0/bias:0, (100,), /device:CPU:0
INFO:tensorflow:  task_0_scoring/kernel:0, (100, 1), /device:CPU:0
INFO:tensorflow:  task_0_scoring/bias:0, (1,), /device:CPU:0
INFO:tensorflow:  task_1_hidden_projection_0/kernel:0, (12, 100), /device:CPU:0
INFO:tensorflow:  task_1_hidden_projection_0/bias:0, (100,), /device:CPU:0
INFO:tensorflow:  task_1_scoring/kernel:0, (100, 1), /device:CPU:0
INFO:tensorflow:  task_1_scoring/bias:0, (1,), /device:CPU:0
INFO:tensorflow:total bert parameters:
INFO:tensorflow:0

Eval results of best model on test data

INFO:tensorflow:global_step = 50
INFO:tensorflow:loss = 0.992338
INFO:tensorflow:metric/ndcg@10 = 0.99576235

Test Configuration:

Checklist

xwli-chelsea commented 4 years ago

Wide_ftrs also need to match among tasks right?

zhoutong-fu commented 4 years ago

> Comments left. Thanks for the implementation of multitask training!
>
> Let's add more details on what this PR is not implementing. I.e., although different losses are computed, these losses use the same loss_fn. Also, we do not support mixing ranking tasks with classification tasks.

Updated in the PR description. Let me know if it looks good to you.
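
For concreteness, a minimal sketch of that constraint (a hypothetical helper, not the actual implementation): every subtask's loss is computed with the same loss_fn, and each record contributes only to its own task's loss:

```python
import tensorflow as tf

def multitask_losses(loss_fn, scores, labels, task_id, num_tasks=2):
    # Hypothetical sketch: all subtasks share one loss_fn (per-task loss
    # functions are out of scope here), but per-task losses stay separate
    # so each can be optimized on its own records.
    per_record = loss_fn(labels, scores)          # shape: (batch,)
    losses = []
    for t in range(num_tasks):
        mask = tf.cast(tf.equal(task_id, t), per_record.dtype)
        losses.append(tf.math.divide_no_nan(
            tf.reduce_sum(mask * per_record), tf.reduce_sum(mask)))
    return losses                                 # one scalar per subtask
```

For example, `loss_fn = tf.keras.losses.MeanSquaredError(reduction=tf.keras.losses.Reduction.NONE)` yields per-record losses that the mask then assigns to the right task.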

zhoutong-fu commented 4 years ago

> Wide_ftrs also need to match among tasks right?

Updated the PR and added a section on how to run multitask training.