linkedin/detext

DeText: A Deep Neural Text Understanding Framework for Ranking and Classification Tasks
BSD 2-Clause "Simplified" License

Added the option to run multitask training for ranking tasks. #13

Closed · zhoutong-fu closed this 4 years ago

zhoutong-fu commented 4 years ago

Description

DeText clients have asked for multitask learning on ranking tasks so that the model can benefit from different subtasks. After reviewing surveys of the multitask learning literature, we decided to add the basic yet very popular hard parameter sharing neural network structure to the DeText framework.

This implementation assumes that all subtasks share the same deep features while wide features and labels are task-specific, which means the MLP and LTR layers are task-specific and all other layers are shared across subtasks. It also assumes that individual task losses are optimized separately and that each training record contains features/labels for exactly one subtask. Training records that cover more than one subtask should be split into multiple records, one per subtask.
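
To make the wiring concrete, here is a minimal hard parameter sharing sketch. It is illustrative only, not the actual DeText code: a single dense layer stands in for the shared deep layers (embedding/CNN), and each subtask owns its MLP (hidden projection) and LTR (scoring) head, mirroring the `task_{0,1}_*` variables in the training log below.

```python
import tensorflow as tf

# Minimal hard parameter sharing sketch -- illustrative only, NOT the
# actual DeText code. Shared layers feed task-specific heads.
batch, deep_dim, wide_dim, hidden, num_tasks = 4, 2, 10, 100, 2
deep_ftrs = tf.random.normal([batch, deep_dim])  # shared deep (text) features
wide_ftrs = tf.random.normal([batch, wide_dim])  # task-specific wide features
task_id = tf.constant([0, 1, 0, 1])              # exactly one subtask per record

shared_layer = tf.keras.layers.Dense(deep_dim, activation="tanh")
heads = [
    tf.keras.Sequential([
        tf.keras.layers.Dense(hidden, activation="tanh"),  # task-t hidden projection
        tf.keras.layers.Dense(1),                          # task-t scoring (LTR)
    ])
    for _ in range(num_tasks)
]

x = tf.concat([shared_layer(deep_ftrs), wide_ftrs], axis=-1)   # 12-dim head input
all_scores = tf.stack([head(x) for head in heads], axis=1)     # (batch, num_tasks, 1)
# Route each record through its own task's head; other heads see no gradient.
scores = tf.gather(all_scores, task_id, axis=1, batch_dims=1)  # (batch, 1)
```

Because every record flows through exactly one head, gradients from a task-0 record update only the shared layers and task 0's head, never task 1's.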

In summary, this implementation supports multitask training when:

- all subtasks are ranking tasks that share the same deep features;
- wide features and labels are task-specific; and
- each training record contains features/labels for exactly one subtask.

This implementation does NOT support:

- task-specific loss functions: although each subtask's loss is computed separately, all of them use the same loss_fn;
- mixing ranking tasks with classification tasks.

How to run multitask training?
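
The body of this section is not captured in this excerpt. As a rough illustration of the data contract the description above implies (field names below are hypothetical, not the actual DeText input schema), each training record carries one task id together with that task's own wide features and labels:

```python
# Hypothetical record layout -- field names are assumptions for illustration,
# not the real DeText schema. Each record belongs to exactly one subtask; a
# record covering two subtasks must be split into two records.
record_task0 = {
    "task_id": 0,                              # which subtask this record trains
    "query": "software engineer",
    "doc_titles": ["senior swe", "ml engineer"],
    "wide_ftrs": [[0.3] * 10, [0.1] * 10],     # task-0 wide features, per doc
    "labels": [1, 0],                          # task-0 relevance labels
}
record_task1 = {
    "task_id": 1,
    "query": "software engineer",
    "doc_titles": ["senior swe", "ml engineer"],
    "wide_ftrs": [[0.7] * 10, [0.2] * 10],     # task-1 wide features, per doc
    "labels": [0, 1],                          # task-1 relevance labels
}
```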

Type of change

List all changes

Please list all changes in the commit.

Testing

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

Trainable variables

INFO:tensorflow:# Trainable variables
INFO:tensorflow:  w_embedding:0, (100, 32), /device:CPU:0
INFO:tensorflow:  cnn/query_cnn_3/kernel:0, (3, 32, 1, 50), /device:CPU:0
INFO:tensorflow:  cnn/query_cnn_3/bias:0, (50,), /device:CPU:0
INFO:tensorflow:  cnn/doc_cnn_0_3/kernel:0, (3, 32, 1, 50), /device:CPU:0
INFO:tensorflow:  cnn/doc_cnn_0_3/bias:0, (50,), /device:CPU:0
INFO:tensorflow:  cnn/doc_cnn_1_3/kernel:0, (3, 32, 1, 50), /device:CPU:0
INFO:tensorflow:  cnn/doc_cnn_1_3/bias:0, (50,), /device:CPU:0
INFO:tensorflow:  wide_ftr_norm_w:0, (10,), /device:CPU:0
INFO:tensorflow:  wide_ftr_norm_b:0, (10,), /device:CPU:0
INFO:tensorflow:  task_0_hidden_projection_0/kernel:0, (12, 100), /device:CPU:0
INFO:tensorflow:  task_0_hidden_projection_0/bias:0, (100,), /device:CPU:0
INFO:tensorflow:  task_0_scoring/kernel:0, (100, 1), /device:CPU:0
INFO:tensorflow:  task_0_scoring/bias:0, (1,), /device:CPU:0
INFO:tensorflow:  task_1_hidden_projection_0/kernel:0, (12, 100), /device:CPU:0
INFO:tensorflow:  task_1_hidden_projection_0/bias:0, (100,), /device:CPU:0
INFO:tensorflow:  task_1_scoring/kernel:0, (100, 1), /device:CPU:0
INFO:tensorflow:  task_1_scoring/bias:0, (1,), /device:CPU:0
INFO:tensorflow:total bert parameters:
INFO:tensorflow:0

Eval results of best model on test data

INFO:tensorflow:global_step = 50
INFO:tensorflow:loss = 0.992338
INFO:tensorflow:metric/ndcg@10 = 0.99576235

Test Configuration:

Checklist

xwli-chelsea commented 4 years ago

Wide_ftrs also need to match among tasks right?

zhoutong-fu commented 4 years ago

> Comments left. Thanks for the implementation of multitask training!
>
> Let's add more details on what this PR is not implementing. I.e., although different losses are computed, these losses use the same loss_fn. Also, we do not support mixing ranking tasks with classification tasks.

Updated in the PR description. Let me know if it looks good to you.
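
For concreteness, a minimal sketch of that constraint (a hypothetical helper, not the actual implementation): every subtask's loss is computed with the same loss_fn, and each record contributes only to its own task's loss:

```python
import tensorflow as tf

def multitask_losses(loss_fn, scores, labels, task_id, num_tasks=2):
    # Hypothetical sketch: all subtasks share one loss_fn (per-task loss
    # functions are out of scope here), but per-task losses stay separate
    # so each can be optimized on its own records.
    per_record = loss_fn(labels, scores)          # shape: (batch,)
    losses = []
    for t in range(num_tasks):
        mask = tf.cast(tf.equal(task_id, t), per_record.dtype)
        losses.append(tf.math.divide_no_nan(
            tf.reduce_sum(mask * per_record), tf.reduce_sum(mask)))
    return losses                                 # one scalar per subtask
```

For example, `loss_fn = tf.keras.losses.MeanSquaredError(reduction=tf.keras.losses.Reduction.NONE)` yields per-record losses that the mask then assigns to the right task.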

zhoutong-fu commented 4 years ago

> Wide_ftrs also need to match among tasks right?

Updated the PR and added a section on how to run multitask training.