Open wyclike opened 2 weeks ago
In our approach, we do not have separate sets of model parameters for each few-shot scenario. Instead, we use a single set of model parameters that is pretrained on a mixture of data across six different few-shot settings (0, 4, 8, 16, 32, and 64 shots).
Here is how we construct the pretraining data. The construction is guided by the following strategies:
Avoiding Duplication: Across the different few-shot scenarios, we use disjoint sets of test samples. This prevents the model from merely memorizing specific samples.
Fixed Group of Context Samples: Within a single few-shot scenario, we use a fixed group of context samples for all test samples. This strategy helps the model focus on identifying differences between the test samples given the same context.
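The two strategies above can be sketched in code. This is a minimal illustration, not the paper's actual pipeline: the function and field names (`build_pretraining_mixture`, `samples_per_setting`, the entry dict keys) are hypothetical, and it assumes the pool is a flat list of labeled examples. It draws one fixed context group per shot setting and assigns each setting a disjoint slice of test samples.

```python
import random

# The six few-shot settings mentioned in the paper.
SHOT_SETTINGS = [0, 4, 8, 16, 32, 64]

def build_pretraining_mixture(pool, samples_per_setting, seed=0):
    """Build one mixed pretraining dataset spanning all shot settings.

    Hypothetical sketch: `pool` is a list of labeled examples.
    Each k-shot scenario gets one FIXED context group shared by all of
    its test samples, and the test samples are DISJOINT across scenarios.
    """
    rng = random.Random(seed)
    shuffled = pool[:]
    rng.shuffle(shuffled)

    mixture = []
    cursor = 0
    for k in SHOT_SETTINGS:
        # Fixed group of k context samples, reused for every test
        # sample in this scenario (strategy 2).
        context = shuffled[cursor:cursor + k]
        cursor += k
        # Disjoint slice of test samples for this scenario, so no
        # test sample appears in two scenarios (strategy 1).
        tests = shuffled[cursor:cursor + samples_per_setting]
        cursor += samples_per_setting
        for test in tests:
            mixture.append({"shots": k, "context": context, "test": test})

    # Interleave the scenarios so a single set of model parameters
    # sees all shot counts during pretraining.
    rng.shuffle(mixture)
    return mixture
```

A single model is then pretrained on `mixture` as one stream, rather than training six separate sets of parameters.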
I would like to ask how the few-shot approach is specifically incorporated into your pretraining process. For instance, the paper mentions six few-shot scenarios with 0, 4, 8, 16, 32, and 64 shots.

- Does each entry in your pretraining dataset include few-shot content? If so, would that imply there are six sets of model parameters?
- If few-shot instances are included, how is their selection determined — fixed or random?
- If they are not included, would it be correct to understand that you train a single set of zero-shot pretrained weights and only consider the various few-shot conditions at test time?