sileod / tasknet

Easy multi-task learning with HuggingFace Datasets and Trainer
GNU General Public License v3.0

Why expect Z in Adapter? #8

Open niedakh opened 4 months ago

niedakh commented 4 months ago

The Adapter class expects Z in its constructor:

import torch
import transformers

class Adapter(transformers.PreTrainedModel):
    config_class = transformers.PretrainedConfig

    def __init__(self, config, classifiers=None, Z=None, labels_list=[]):
        super().__init__(config)
        # Task embeddings: one vector per task, freshly initialized unless Z is passed in
        self.Z = torch.nn.Embedding(len(config.classifiers_size), config.hidden_size, max_norm=1.0).weight if Z is None else Z
        # One linear classification head per task
        self.classifiers = torch.nn.ModuleList(
            [torch.nn.Linear(config.hidden_size, size) for size in config.classifiers_size]
        ) if classifiers is None else classifiers
        self.config = self.config.from_dict(
            {**self.config.to_dict(), 'labels_list': labels_list}
        )

    def adapt_model_to_task(self, model, task_name):
        task_index = self.config.tasks.index(task_name)
        # setattr(model, search_module(model, 'linear', mode='class')[-1], self.classifiers[task_index])
        model.classifier = self.classifiers[task_index]
        return model

    def _init_weights(*args):
        pass  # no-op: weights come from the tensors created above

but doesn't seem to use it at all when adapting the model to a task?

sileod commented 4 months ago

Hi, great question

It is used here: https://github.com/sileod/tasknet/blob/c9f43604c2c8bc7e33e3301f7c05b5dc8f77c0eb/src/tasknet/utils.py#L210
But it would actually be cleaner to have it in adapt_model_to_task; I'll try to do that for the next release.
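As a rough sketch of what that could look like (this is not the current tasknet API; the task_embedding attribute name below is made up for illustration), adapt_model_to_task would attach both the classifier and the task embedding:

import torch

def adapt_model_to_task(self, model, task_name):
    # Hypothetical variant: attach both per-task components in one place.
    task_index = self.config.tasks.index(task_name)
    model.classifier = self.classifiers[task_index]
    # Illustrative attribute name; tasknet currently wires Z up in utils.py instead.
    model.task_embedding = torch.nn.Parameter(self.Z[task_index].detach().clone())
    return model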

The general idea is to have a shared encoder, one classifier per task (unless some tasks share all their labels), and one task embedding per task. The task embedding is randomly dropped at a 10% rate so that the model can also work without it, but it lets the model "see" which task it should perform, which improves results, so it is best to add it alongside the classifier. It's actually the core of the Adapter.
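For intuition, here is a minimal sketch of that idea (not tasknet's actual forward pass; the function and argument names are made up): the per-task embedding is added to the input embeddings, and dropped at random about 10% of the time so the model also learns to work without it.

import torch

def forward_with_task_embedding(encoder, classifier, Z, task_index, inputs_embeds, drop_rate=0.1):
    # Z: (num_tasks, hidden_size) task-embedding matrix shared across tasks
    z = Z[task_index]                        # embedding of the current task
    if torch.rand(1).item() > drop_rate:     # keep the task embedding ~90% of the time
        inputs_embeds = inputs_embeds + z    # broadcast over batch and sequence dims
    hidden = encoder(inputs_embeds=inputs_embeds).last_hidden_state
    return classifier(hidden[:, 0])          # classify from the first ([CLS]) position

Randomly dropping the task embedding acts like dropout on the task conditioning, so the shared encoder stays usable even when no task information is supplied.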