microsoft / deep-language-networks

We view Large Language Models as stochastic language layers in a network, where the learnable parameters are the natural language prompts at each layer. We stack two such layers, feeding the output of one layer to the next. We call the stacked architecture a Deep Language Network - DLN
MIT License
87 stars 12 forks source link

Make sure we deal all options for each output class. #24

Closed MarcCote closed 8 months ago

MarcCote commented 8 months ago

For instance in data_understanding we have "a|A", "b|B", etc. We need to shift logits for all the possible options.

MarcCote commented 8 months ago

@sordonia, thoughts?

MarcCote commented 8 months ago

Let's merge it. Then