## Overview
This adds a text encoder for ECD that wraps a pretrained LLM and passes the model's hidden state downstream to the combiner. We reuse large portions of the `LLM` model type's utilities, refactored as utility functions rather than `LLM` methods.
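For orientation, here is a rough sketch of how a user might select the new encoder in an ECD config. The encoder `type` name and the `base_model`/`adapter` fields are assumptions for illustration; the exact config schema is not pinned down in this description.

```python
from ludwig.api import LudwigModel

config = {
    "model_type": "ecd",
    "input_features": [
        {
            "name": "review",
            "type": "text",
            "encoder": {
                "type": "llm",  # assumed encoder registry name
                "base_model": "meta-llama/Llama-2-7b-hf",
                "adapter": {"type": "lora"},
            },
        }
    ],
    "output_features": [{"name": "sentiment", "type": "category"}],
}

model = LudwigModel(config)
```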
## LLM Encoder
The encoder subclasses `SequenceEncoder` and implements custom behavior for working with LLMs, with pieces borrowed from `LLM`. Whereas the `LLM` model type is focused on text generation, the encoder is focused on 1) passing hidden state downstream for predictive tasks and 2) packaging the adapter with the rest of the ECD architecture. Major methods include the following (a simplified sketch follows the list):

- `__init__`: Loads the pretrained model and adapter from config
- `forward`: Takes tokenized text as input and returns the hidden state of the last layer
- `_save_to_state_dict`: Normally, adapter weights are saved with `PEFTModel.save_pretrained`, which writes the weights to file. In ECD, however, additional parameters need to be recorded, particularly the combiner and decoder weights. Since these weights will have been trained alongside a particular adapter, we want to package them together. This method adds custom logic to extract the adapter weights as part of the dictionary returned by `model.state_dict()`.
- `load_from_state_dict`: Under the hood, PEFT alters the names of the adapter parameters, which leads to errors when loading the state dict back in. This method adds logic to load the state dict with a PEFT utility, then updates the `load_state_dict` args to reflect that the adapter state was loaded in.
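A minimal sketch of the encoder's shape, assuming standard `transformers`/`peft` APIs. The class name, constructor arguments, and the `SequenceEncoder` import path are illustrative; the actual implementation has more bookkeeping (e.g., loading the backbone through the refactored `llm_utils` helpers described below).

```python
import torch
from peft import LoraConfig, get_peft_model, get_peft_model_state_dict
from transformers import AutoModel

# Import path assumed; the PR only states that the encoder subclasses SequenceEncoder.
from ludwig.encoders.sequence_encoders import SequenceEncoder


class LLMTextEncoder(SequenceEncoder):  # hypothetical class name
    def __init__(self, base_model: str):
        super().__init__()
        # Load the pretrained backbone and wrap it with a PEFT adapter.
        # LoRA with explicit target modules is used only as a concrete example.
        backbone = AutoModel.from_pretrained(base_model)
        self.model = get_peft_model(backbone, LoraConfig(target_modules=["q_proj", "v_proj"]))

    def forward(self, inputs: torch.Tensor, mask=None):
        # Return the last layer's hidden state for the downstream combiner.
        outputs = self.model(input_ids=inputs, output_hidden_states=True)
        return {"encoder_output": outputs.hidden_states[-1]}

    def _save_to_state_dict(self, destination, prefix, keep_vars):
        # Fold the adapter weights into the regular state dict so they are
        # packaged with the combiner/decoder weights trained alongside them.
        for name, tensor in get_peft_model_state_dict(self.model).items():
            destination[prefix + name] = tensor
```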
## Refactored LLM Methods

The following methods have been moved from the `LLM` class into `ludwig.utils.llm_utils` (illustrative signatures follow the list):
- `initialize_adapter`: This now takes a `PretrainedModel` as an argument and returns a `PEFTModel`
- `load_pretrained_from_config`: This was moved from `ludwig.models.llm` with no changes
- `to_device`: This now takes a model and device as arguments and returns the model on device and the destination device
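Illustrative signatures for the refactored helpers, assuming they follow the descriptions above; the bodies are simplified sketches rather than the actual code, and `config.base_model` is an assumed field name.

```python
from typing import Tuple

import torch
from peft import PeftModel, get_peft_model
from transformers import AutoModelForCausalLM, PreTrainedModel


def initialize_adapter(model: PreTrainedModel, peft_config) -> PeftModel:
    # Now takes the pretrained backbone as an argument instead of reading
    # it from the LLM instance, and returns the wrapped PEFT model.
    return get_peft_model(model, peft_config)


def load_pretrained_from_config(config) -> PreTrainedModel:
    # Moved from ludwig.models.llm with no changes; shown only to indicate its role.
    return AutoModelForCausalLM.from_pretrained(config.base_model)


def to_device(model: torch.nn.Module, device) -> Tuple[torch.nn.Module, torch.device]:
    # Returns both the moved model and the destination device.
    device = torch.device(device)
    return model.to(device), device
```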