mega002 / lm-debugger

The official code of LM-Debugger, an interactive tool for inspection and intervention in transformer-based language models.
Apache License 2.0
171 stars 17 forks

Would this work on NLU tasks? #5

Closed aneof closed 2 years ago

aneof commented 2 years ago

Hi there, this is not so much a GitHub issue as a question about whether this approach would work on any downstream NLU task, for example named entity recognition, where each token's contextual vector matters a lot for its particular predicted label. If I understood correctly, the intervention part depends solely on the input data and the FFN layers. Additionally, would it be plausible to adapt the method to focus on specific keywords when conducting sentiment analysis, even in prompt-based settings?

mega002 commented 2 years ago

Hi @aneof ,

Thank you for the great questions.

Regarding applying this method to models trained for NLU tasks -- technically, this is possible. While the interface of LM-Debugger was designed to support next-token prediction, you could analyze and intervene in the FFN operation at any position. Also, please note that changing the weights of FFN sub-updates is actually independent of the input (i.e. you could find value vectors that encode specific concepts and just set their weights, and this can be done without a particular input in mind). That being said, I am not sure what the empirical effect of such interventions would be in the context of NLU tasks, or whether they would be meaningful for classification tasks such as NER.
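To make the "setting weights of FFN sub-updates" idea concrete, here is a minimal toy sketch (not the LM-Debugger implementation; the function and variable names are made up for illustration). A transformer FFN output can be viewed as a weighted sum of value vectors (rows of the output matrix), where each coefficient comes from the input; an intervention simply overrides chosen coefficients, independently of the input:

```python
import numpy as np

def ffn_with_intervention(x, W_in, W_out, interventions=None):
    """Toy FFN viewed as a sum of sub-updates m_i * v_i.

    x: hidden state at one position, shape (d,)
    W_in: key matrix, shape (d, d_ffn); W_out: value matrix, shape (d_ffn, d)
    interventions: dict mapping FFN dimension -> fixed coefficient
    """
    # coefficients of the FFN sub-updates (ReLU instead of GELU, for simplicity)
    m = np.maximum(x @ W_in, 0.0)
    if interventions:
        for dim, value in interventions.items():
            # override the coefficient regardless of what the input produced
            m[dim] = value
    # output = sum_i m_i * v_i, where v_i is row i of W_out
    return m @ W_out
```

With identity matrices and `x = [2, -3]`, the plain call returns `[2, 0]` (the second neuron is inactive), while an intervention `{1: 5.0}` forces the second value vector in with coefficient 5, yielding `[2, 5]` -- the same mechanism, at any position, whatever the task.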

Regarding your second question -- I'm not sure I understand what you mean by focusing on specific keywords; could you please explain? If you would like to apply interventions only for specific input tokens, then I guess this is possible; the only question is how to decide when to intervene. In general, our method is "soft" in the sense that it doesn't necessarily always change the output: it works internally, and the model still produces coherent outputs as long as the configured interventions are not too aggressive/artificial. If your question is about pushing the model to output specific keywords, then I guess that depends on whether it's possible to identify value vectors that promote those specific words and to configure interventions accordingly.
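One way the "intervene only at specific tokens" idea could look, as a hedged toy sketch (again with made-up names, not LM-Debugger code): gate the same coefficient override on whether the token at each position matches a keyword list, leaving all other positions untouched:

```python
import numpy as np

def ffn(x, W_in, W_out, overrides=None):
    # simple ReLU FFN; overrides maps FFN neuron index -> fixed coefficient
    m = np.maximum(x @ W_in, 0.0)
    if overrides:
        for i, v in overrides.items():
            m[i] = v
    return m @ W_out

def run_layer(hidden_states, tokens, keywords, overrides, W_in, W_out):
    """Apply the intervention only at positions whose token is a keyword."""
    outputs = []
    for tok, x in zip(tokens, hidden_states):
        ov = overrides if tok in keywords else None
        outputs.append(ffn(x, W_in, W_out, ov))
    return np.stack(outputs)
```

The decision rule here is a simple surface-level token match; a real setup would have to decide when to intervene by whatever criterion suits the task.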

Please feel free to close this issue if this answers your questions.

Thanks, Mor

aneof commented 2 years ago

Hi Mor,

Thank you for the detailed analysis. To clarify the second question: I was essentially wondering whether we could edit vectors that correspond to specific keywords followed by hard-to-classify entities. An example that comes to mind is "watching (movie title) online" vs. "watching (movie title) on Netflix". I was wondering whether, in cases where a model struggles to classify the movie as a Streaming Service Movie vs. a Classic Cinema Movie (or other such fine-grained classes), it would be possible to edit some vectors to further sharpen the classification boundaries.

Closing this issue as the explanation was completely adequate.