## Overview
This adds a text encoder for ECD that wraps a pretrained LLM and passes the model's hidden state downstream to the combiner. We reuse large portions of the `LLM` model type's utilities, refactored as utility functions rather than `LLM` methods.
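For orientation, here is a rough sketch of how a user might select the new encoder in an ECD config. The encoder `type` name and the `base_model`/`adapter` fields are assumptions for illustration; the exact config schema is not pinned down in this description.

```python
from ludwig.api import LudwigModel

config = {
    "model_type": "ecd",
    "input_features": [
        {
            "name": "review",
            "type": "text",
            "encoder": {
                "type": "llm",  # assumed encoder registry name
                "base_model": "meta-llama/Llama-2-7b-hf",
                "adapter": {"type": "lora"},
            },
        }
    ],
    "output_features": [{"name": "sentiment", "type": "category"}],
}

model = LudwigModel(config)
```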
## LLM Encoder
The encoder subclasses `SequenceEncoder` and implements custom behavior for working with LLMs, with pieces borrowed from `LLM`. Whereas the `LLM` model type is focused on text generation, the encoder is focused on 1) passing hidden state downstream for predictive tasks and 2) packaging the adapter with the rest of the ECD architecture. Major methods include the following (a simplified sketch follows the list):

- `__init__`: Loads the pretrained model and adapter from config
- `forward`: Takes tokenized text as input and returns the hidden state of the last layer
- `_save_to_state_dict`: Normally, adapter weights are saved with `PEFTModel.save_pretrained`, which writes the weights to file. In ECD, however, additional parameters need to be recorded, particularly the combiner and decoder weights. Since these weights will have been trained alongside a particular adapter, we want to package them together. This method adds custom logic to extract the adapter weights as part of the dictionary returned by `model.state_dict()`.
- `load_from_state_dict`: Under the hood, PEFT alters the names of the adapter parameters, which leads to errors when loading the state dict back in. This method adds logic to load the state dict with a PEFT utility, then updates the `load_state_dict` args to reflect that the adapter state was loaded in.
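A minimal sketch of the encoder's shape, assuming standard `transformers`/`peft` APIs. The class name, constructor arguments, and the `SequenceEncoder` import path are illustrative; the actual implementation has more bookkeeping (e.g., loading the backbone through the refactored `llm_utils` helpers described below).

```python
import torch
from peft import LoraConfig, get_peft_model, get_peft_model_state_dict
from transformers import AutoModel

# Import path assumed; the PR only states that the encoder subclasses SequenceEncoder.
from ludwig.encoders.sequence_encoders import SequenceEncoder


class LLMTextEncoder(SequenceEncoder):  # hypothetical class name
    def __init__(self, base_model: str):
        super().__init__()
        # Load the pretrained backbone and wrap it with a PEFT adapter.
        # LoRA with explicit target modules is used only as a concrete example.
        backbone = AutoModel.from_pretrained(base_model)
        self.model = get_peft_model(backbone, LoraConfig(target_modules=["q_proj", "v_proj"]))

    def forward(self, inputs: torch.Tensor, mask=None):
        # Return the last layer's hidden state for the downstream combiner.
        outputs = self.model(input_ids=inputs, output_hidden_states=True)
        return {"encoder_output": outputs.hidden_states[-1]}

    def _save_to_state_dict(self, destination, prefix, keep_vars):
        # Fold the adapter weights into the regular state dict so they are
        # packaged with the combiner/decoder weights trained alongside them.
        for name, tensor in get_peft_model_state_dict(self.model).items():
            destination[prefix + name] = tensor
```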
## Refactored LLM Methods

The following methods have been moved from the `LLM` class into `ludwig.utils.llm_utils` (illustrative signatures follow the list):
- `initialize_adapter`: This now takes a `PretrainedModel` as an argument and returns a `PEFTModel`
- `load_pretrained_from_config`: This was moved from `ludwig.models.llm` with no changes
- `to_device`: This now takes a model and device as arguments and returns the model on device and the destination device
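Illustrative signatures for the refactored helpers, assuming they follow the descriptions above; the bodies are simplified sketches rather than the actual code, and `config.base_model` is an assumed field name.

```python
from typing import Tuple

import torch
from peft import PeftModel, get_peft_model
from transformers import AutoModelForCausalLM, PreTrainedModel


def initialize_adapter(model: PreTrainedModel, peft_config) -> PeftModel:
    # Now takes the pretrained backbone as an argument instead of reading
    # it from the LLM instance, and returns the wrapped PEFT model.
    return get_peft_model(model, peft_config)


def load_pretrained_from_config(config) -> PreTrainedModel:
    # Moved from ludwig.models.llm with no changes; shown only to indicate its role.
    return AutoModelForCausalLM.from_pretrained(config.base_model)


def to_device(model: torch.nn.Module, device) -> Tuple[torch.nn.Module, torch.device]:
    # Returns both the moved model and the destination device.
    device = torch.device(device)
    return model.to(device), device
```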