stanfordnlp / pyvene

Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions
http://pyvene.ai
Apache License 2.0

[P1] Dynamic Intervention Scheduler #111

Open frankaging opened 5 months ago

frankaging commented 5 months ago

Description: Currently, interventions during model generation are limited to the same basic form as in a model forward call. This is not ideal. During generation, we want to support more free-form interventions (e.g., intervening based on the decoding step or other decoding parameters, not just on unit locations as in an intervened forward pass).

The current infrastructure cannot support this, and the same applies to other existing intervention libraries. For instance, it cannot intervene at a specific decoding step without more invasive code changes. To support such complex cases, we plan to introduce a new notion: the Intervention Scheduler.

At a high level, the scheduler is responsible for scheduling interventions dynamically at inference time, and it is customizable. For instance, we can intervene on (1) all decoded punctuation tokens, (2) all verbs that get decoded, or (3) the last token of each decoded entity that belongs to a specific entity set.
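To make the idea concrete, here is a minimal, library-independent sketch of what such a scheduler could look like. It is not pyvene's API: `DecodingState`, `InterventionScheduler`, `at_step`, `last_token_is_punctuation`, and `toy_generate` are all hypothetical names, the "hidden state" is a plain list of floats, and the generation loop is a stand-in that replays a fixed token list instead of calling a real model. The assumption is that a rule pairs a condition on the decoding state with an intervention function, and that the scheduler is consulted once per decoding step.

```python
import string
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class DecodingState:
    """Hypothetical snapshot of the decoding loop at one step."""
    step: int            # 0-based index of the decoding step about to run
    tokens: List[str]    # tokens decoded so far (before this step)


# A condition decides whether to intervene; an intervention edits activations.
Condition = Callable[[DecodingState], bool]
Intervention = Callable[[List[float]], List[float]]


@dataclass
class InterventionScheduler:
    """Hypothetical scheduler: applies every rule whose condition holds."""
    rules: List[Tuple[Condition, Intervention]] = field(default_factory=list)

    def register(self, condition: Condition, intervention: Intervention) -> None:
        self.rules.append((condition, intervention))

    def apply(self, state: DecodingState, hidden: List[float]) -> List[float]:
        for condition, intervention in self.rules:
            if condition(state):
                hidden = intervention(hidden)
        return hidden


def at_step(n: int) -> Condition:
    """Fire only at decoding step n."""
    return lambda state: state.step == n


def last_token_is_punctuation(state: DecodingState) -> bool:
    """Fire when the most recently decoded token consists only of punctuation."""
    if not state.tokens:
        return False
    last = state.tokens[-1]
    return last != "" and all(ch in string.punctuation for ch in last)


def toy_generate(tokens_to_emit: List[str], scheduler: InterventionScheduler) -> List[int]:
    """Stand-in for a generation loop; returns the steps where activations changed."""
    decoded: List[str] = []
    fired: List[int] = []
    for step, token in enumerate(tokens_to_emit):
        state = DecodingState(step=step, tokens=list(decoded))
        hidden = [1.0, 1.0]  # placeholder for a real activation
        if scheduler.apply(state, hidden) != hidden:
            fired.append(step)
        decoded.append(token)  # a real loop would sample this from the model
    return fired


scheduler = InterventionScheduler()
scheduler.register(at_step(0), lambda h: [2.0 * x for x in h])             # step-based rule
scheduler.register(last_token_is_punctuation, lambda h: [0.0 for _ in h])  # token-based rule

fired = toy_generate(["Hello", ",", "world", "!", "Bye"], scheduler)
print(fired)  # [0, 2, 4]
```

In this sketch a token-based rule fires on the step after the matching token is decoded, since that is the first step at which the token is part of the decoding state. Verb or entity-set conditions would follow the same pattern, with the condition calling a tagger or a lookup instead of a punctuation check.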

This enables a wide spectrum of ways to steer model behavior with interventions. This ticket may require multiple changes.