mims-harvard / UniTS

A unified multi-task time series model.
https://zitniklab.hms.harvard.edu/projects/UniTS/
MIT License

IncompatibleKeys error when loading the pre-trained model for fine-tuning #14

Open websitefingerprinting opened 2 months ago

websitefingerprinting commented 2 months ago

Nice work! I have a question:

I am trying to pre-train and fine-tune a model on my own datasets. However, some warnings were raised when loading the pre-trained model during fine-tuning:

loading pretrained model: checkpoints/ALL_task_UniTS_pretrain_x64_bs1024_UniTS_All_dm64_el3_Exp_0/pretrain_checkpoint.pth
_IncompatibleKeys(missing_keys=['category_tokens.CLS_dataset1', 'category_tokens.CLS_dataset2'], unexpected_keys=['pretrain_head.proj_in.weight', 'pretrain_head.proj_in.bias', 'pretrain_head.mlp.fc1.weight', 'pretrain_head.mlp.fc1.bias', 'pretrain_head.mlp.fc2.weight', 'pretrain_head.mlp.fc2.bias', 'pretrain_head.proj_out.weight', 'pretrain_head.proj_out.bias', 'pretrain_head.pos_proj.weights', 'pretrain_head.pos_proj.bias'])

Is everything correct here?

Thank you for your help!

gasvn commented 2 months ago

It's fine. There is an extra head used during pretraining, which is not used for fine-tuning.
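
For example, if you load the checkpoint yourself with `strict=False`, you can drop the pretraining-head keys first so that only genuinely new parameters are reported (a rough sketch, assuming `model` is your fine-tuning UniTS model instance; the exact loading code in the repo may differ):

```python
# Sketch: load a pretraining checkpoint into a fine-tuning model.
# Keys that exist only in the pretraining model (the pretrain head) show up as
# unexpected_keys; newly created task/CLS tokens show up as missing_keys.
import torch

state_dict = torch.load(
    "checkpoints/ALL_task_UniTS_pretrain_x64_bs1024_UniTS_All_dm64_el3_Exp_0/pretrain_checkpoint.pth",
    map_location="cpu",
)

# Drop the pretraining-only head before loading, so the warning only lists
# keys that are genuinely new in the fine-tuning model.
state_dict = {k: v for k, v in state_dict.items()
              if not k.startswith("pretrain_head.")}

msg = model.load_state_dict(state_dict, strict=False)
print("missing:", msg.missing_keys)        # e.g. new dataset CLS tokens
print("unexpected:", msg.unexpected_keys)  # should now be empty
```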

websitefingerprinting commented 2 months ago

Thank you for your prompt response. One more question, in case you have any idea about it:

Few-shot fine-tuning didn't yield good results for my classification task. I pre-trained and fine-tuned using my own datasets. With only 5% of the data, accuracy was very low, but it improved with 100% of the data. However, pretraining didn't offer any advantage over supervised training.

I attach my pretraining loss curve below.

[figure: pretraining loss curve]

Does the pretraining loss look correct?

Many thanks! (It's fine if you're not sure; I may have made a mistake somewhere in this process, or my problem may simply not be a good fit.)

gasvn commented 2 months ago

I am not so sure. The loss looks pretty large. Have you tried doing only prompt tuning with the pretrained model? If that doesn't reach reasonable performance, it means the pretraining is not working well.
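
By prompt tuning I mean something like the sketch below: freeze the pretrained backbone and train only the learnable prompt/task tokens (this assumes a PyTorch UniTS-style model whose token parameters contain "prompt" or "tokens" in their names; check `named_parameters()` for the exact names in your model):

```python
# Sketch: prompt tuning by freezing the backbone and training only prompt/task tokens.
import torch

for name, param in model.named_parameters():
    # Keep only prompt/task token parameters trainable; freeze everything else.
    param.requires_grad = ("prompt" in name) or ("tokens" in name)

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)  # learning rate is illustrative
```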

websitefingerprinting commented 2 months ago

Thank you again for your suggestion.

My data are very different from the datasets in your paper. My input is a sequence of unnormalized integers (i.e., $x_i \in \mathbb{N}^{Length \times Dim}$). I guess that may be why the reconstruction loss is large. Should I normalize the data before feeding it to the transformer?
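
For concreteness, this is the kind of per-instance (per-sequence) z-score normalization I have in mind; it's only a sketch of my assumption about preprocessing, not the repository's own pipeline:

```python
# Sketch: per-instance z-score normalization of a [Length, Dim] integer series
# before passing it to the model.
import torch

def instance_normalize(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Normalize each variable of a [Length, Dim] sequence to zero mean, unit std."""
    x = x.float()                           # cast raw integer values to float
    mean = x.mean(dim=0, keepdim=True)      # per-variable mean over time
    std = x.std(dim=0, keepdim=True)        # per-variable std over time
    return (x - mean) / (std + eps)

x_raw = torch.randint(0, 1000, (96, 3))     # example unnormalized integer series
x_norm = instance_normalize(x_raw)
```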