microsoft / Industrial-Foundation-Models

Dedicated to building industrial foundation models for universal data intelligence across industries.
MIT License

Few shot setting in pre-train #9

Open wyclike opened 2 weeks ago

wyclike commented 2 weeks ago

I would like to ask how the few-shot setting is incorporated into your pretraining process. The paper mentions six few-shot scenarios with 0, 4, 8, 16, 32, and 64 shots. Does each entry in your pretraining dataset include few-shot content? If so, does that imply six sets of model parameters, and is the selection of few-shot instances fixed or random? If not, is it correct to understand that you train a single set of zero-shot pretrained weights and then apply the various few-shot conditions only at test time?

xumwen commented 4 days ago

In our approach, we do not have separate sets of model parameters for each few-shot scenario. Instead, we use a single set of model parameters that is pretrained on a mixture of data across six different few-shot settings (0, 4, 8, 16, 32, and 64 shots).

Here's how we construct the data for pretraining:

  1. For each dataset, we select a number of test samples.
  2. Context samples are selected randomly.
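
As a rough illustration of the two steps above (not the repository's actual code), the sketch below builds a mixed few-shot training set from one tabular dataset. The helper name `build_few_shot_examples`, the row format, drawing the shot count uniformly per example, and excluding the query row from its own context are all assumptions made for this example; the thread only states that test samples are selected per dataset, that context samples are chosen randomly, and that the six shot settings are mixed.

```python
import random

# Shot counts named in this thread.
SHOT_SETTINGS = (0, 4, 8, 16, 32, 64)


def build_few_shot_examples(rows, num_queries, rng):
    """Hypothetical helper: pair each query row with randomly drawn context rows."""
    # Step 1: select a number of test (query) samples from the dataset.
    query_indices = rng.sample(range(len(rows)), num_queries)

    examples = []
    for q in query_indices:
        # Assumption: the shot count is drawn uniformly per example so that
        # all six settings end up mixed in a single training set.
        shots = rng.choice(SHOT_SETTINGS)

        # Step 2: select context samples randomly.
        # Assumption: the query row is excluded from its own context.
        pool = [i for i in range(len(rows)) if i != q]
        context_indices = rng.sample(pool, shots)

        examples.append(
            {
                "shots": shots,
                "context": [rows[i] for i in context_indices],
                "query": rows[q],
            }
        )
    return examples


# Toy dataset standing in for one real tabular dataset.
rng = random.Random(0)
rows = [{"id": i, "feature": i * 0.5, "label": i % 2} for i in range(200)]
examples = build_few_shot_examples(rows, num_queries=10, rng=rng)
```

Because every example carries its own shot count, one model can be trained on the combined list, which matches the single-parameter-set setup described above.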

The data construction is guided by several strategies to optimize learning: