inseq-team / inseq

Interpretability for sequence generation models 🐛 🔍
https://inseq.org
Apache License 2.0

Feature/reagent branch for ReAGent #250

Closed: casszhao closed this 7 months ago

casszhao commented 9 months ago

ReAGent: A Model-agnostic Feature Attribution Method for Generative Language Models

Paper link: https://arxiv.org/abs/2402.00794

Type of Change

gsarti commented 9 months ago

Hey there, thanks a lot for the PR, @casszhao @xuan25!

I had a look at the code structure you pushed and I have some comments on the current implementation:

I will keep looking at it over the next week. If you could address some of the issues described above, it would be a great help!

gsarti commented 9 months ago

I've had a look at the changes and we're much closer to getting it merged, @xuan25, thanks! 🤗

Will have a second look to fix minor details. Just a quick Q:

xuan25 commented 9 months ago

> Just a quick Q:
>
> * I've seen that you currently raise an exception when an encoder-decoder model is used with `attribute_target`. In principle that would be fine, but would there be a way to support ReAGent perturbations on target-side prefixes? Intuitively, this shouldn't be too problematic to adapt, right? I think the only potential pain point would be sampling tokens from the correct vocabulary, since encoder-decoder models may have two (see e.g. https://huggingface.co/Helsinki-NLP/opus-mt-en-zh/tree/main).

Thanks for the feedback. Yes, I have implemented target-side attribution (`attribute_target`) in the latest commits, but without restricting the sampled tokens to the same language (vocabulary set). Still, it should at least do the job of perturbing the inputs.
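The difference between restricted and unrestricted sampling discussed here can be sketched in plain Python. This is a hypothetical helper (`perturb_tokens` and `allowed_ids` are illustrative names, not inseq's API), and it samples replacements uniformly at random, which may differ from the strategy used by the actual ReAGent token sampler. Passing the id subset of a target-side vocabulary restricts replacements to it; omitting it gives the unrestricted case.

```python
import random
from typing import List, Optional, Sequence


def perturb_tokens(
    token_ids: Sequence[int],
    positions: Sequence[int],
    vocab_size: int,
    allowed_ids: Optional[Sequence[int]] = None,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Return a copy of `token_ids` with the tokens at `positions` replaced.

    Replacements are drawn uniformly from `allowed_ids` when given (e.g. the ids
    of an encoder-decoder model's target-side vocabulary, an assumption here),
    otherwise from the whole id range [0, vocab_size).
    """
    rng = rng or random.Random()
    candidates = list(allowed_ids) if allowed_ids is not None else range(vocab_size)
    if len(candidates) == 0:
        raise ValueError("No candidate token ids to sample from")
    perturbed = list(token_ids)
    for pos in positions:
        perturbed[pos] = rng.choice(candidates)
    return perturbed


rng = random.Random(0)
prefix = [5, 17, 42, 8]
# Unrestricted: replacements may come from anywhere in the id range
unrestricted = perturb_tokens(prefix, positions=[1, 3], vocab_size=100, rng=rng)
# Restricted: replacements only come from a hypothetical target-language id subset
restricted = perturb_tokens(prefix, positions=[1, 3], vocab_size=100, allowed_ids=[60, 61, 62], rng=rng)
print(unrestricted, restricted)
```

Only the candidate pool changes between the two calls, so supporting a second vocabulary mostly comes down to knowing which ids belong to it.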

gsarti commented 7 months ago

Hi @xuan25 @casszhao, sorry for the delay! I had a look at this and ran it without issues on both decoder-only and encoder-decoder models (with and without target attribution), so I think we are quite close to merging now. I pushed some fixes, including:

xuan25 commented 7 months ago

> I pushed some fixes, including:
>
>   • setting NLTK as an optional dependency for inseq, matching the logic in the token sampler
>   • slight variable renaming to make some of the logic more evident to the user (`top_n` and `top_n_ratio` are now called `keep_top_n` and `keep_ratio` to make it clear that tokens are kept)

Thanks, @gsarti!
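The renamed parameters describe how many tokens survive as the rationale. As a hedged illustration (the function name, the mutual-exclusion check and the round-up behaviour for `keep_ratio` are assumptions, not taken from the inseq source), selecting the kept positions from a vector of importance scores could look like this:

```python
import math
from typing import List, Optional, Sequence


def select_kept_positions(
    importance: Sequence[float],
    keep_top_n: Optional[int] = None,
    keep_ratio: Optional[float] = None,
) -> List[int]:
    """Return the positions of the tokens to keep, most important first.

    Exactly one of `keep_top_n` (absolute count) or `keep_ratio`
    (fraction of the sequence length, rounded up here by assumption)
    must be provided.
    """
    if (keep_top_n is None) == (keep_ratio is None):
        raise ValueError("Provide exactly one of keep_top_n or keep_ratio")
    if keep_ratio is not None:
        if not 0.0 <= keep_ratio <= 1.0:
            raise ValueError("keep_ratio must be in [0, 1]")
        keep_top_n = math.ceil(keep_ratio * len(importance))
    if keep_top_n < 0:
        raise ValueError("keep_top_n must be non-negative")
    n = min(keep_top_n, len(importance))
    # Rank positions by decreasing importance; sorted() is stable, so ties keep the earlier position first
    ranked = sorted(range(len(importance)), key=lambda i: -importance[i])
    return ranked[:n]


importance = [0.1, 0.9, 0.4, 0.7]
print(select_kept_positions(importance, keep_top_n=2))    # [1, 3]
print(select_kept_positions(importance, keep_ratio=0.75))  # [1, 3, 2]
```

Both parameters resolve to the same "keep the n most important tokens" rule; the ratio form simply derives n from the sequence length.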

gsarti commented 7 months ago

We are ready for merging! 🎉 Just noting down here some points that could be improved in the current ReAGent implementation:

Since the current implementation is functional for both decoder-only and encoder-decoder models, I will proceed with the merge; any further development on these issues should be done in a dedicated PR. Thanks again @casszhao @xuan25 for your contribution! 😄

casszhao commented 7 months ago

Hi, thanks! Will promote it later on X. Cheers ~


On Sat, Apr 13, 2024 at 11:00 Gabriele Sarti wrote:

We are ready for merging! 🎉 Just noting down here three points that could be improved in the current ReAGent implementation:

  • `overlap_strict_pos` currently defaults to `True`, and the `False` case is still a TODO. If it is implemented, the purpose of this check should be made more explicit in the docstrings.
  • The `AggregateRationalizer` class currently supports only a batch size of 1, because it builds a batch of different masked examples using `num_probes`. Ideally, batching should still be allowed here, taking inspiration from the Captum Integrated Gradients implementation (https://captum.ai/api/_modules/captum/attr/_core/integrated_gradients.html#IntegratedGradients), which faces the same issue (the `internal_batch_size` there is equivalent to `num_probes`, and it is used to build the interpolation steps across all batch elements).
  • Currently, the ReAGent implementation doesn't use `attributed_fn` to specify which step function estimates token importance, and always uses the logit. It would be better to use the `attribution_model` forward instead of extracting the underlying `AutoModel`, since that would handle this automatically and allow out-of-the-box usage for, e.g., contrastive feature attribution or other user-specified step functions.
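A minimal sketch of the batching idea in the second point, in plain Python: each of the B inputs is expanded into `num_probes` perturbed copies, the resulting B × `num_probes` rows are scored in chunks of at most `internal_batch_size` rows (similar to how Captum's Integrated Gradients evaluates its expanded inputs in chunks of that size), and the scores are regrouped per input. The helper names, the fixed `mask_id` used instead of sampled replacement tokens, and `score_fn` standing in for a model forward pass are all illustrative assumptions, not the `AggregateRationalizer` code.

```python
import random
from typing import Callable, List, Sequence


def make_probes(
    seq: Sequence[int], num_probes: int, mask_ratio: float, mask_id: int, rng: random.Random
) -> List[List[int]]:
    """Build `num_probes` copies of `seq`, each with a random subset of positions masked."""
    n_mask = max(1, int(len(seq) * mask_ratio))
    probes = []
    for _ in range(num_probes):
        positions = set(rng.sample(range(len(seq)), n_mask))
        probes.append([mask_id if i in positions else tok for i, tok in enumerate(seq)])
    return probes


def score_probes_batched(
    batch: Sequence[Sequence[int]],
    score_fn: Callable[[List[List[int]]], List[float]],
    num_probes: int,
    internal_batch_size: int,
    mask_ratio: float = 0.3,
    mask_id: int = 0,
    seed: int = 0,
) -> List[List[float]]:
    """Score `num_probes` perturbations of every batch element, at most `internal_batch_size` rows at a time."""
    rng = random.Random(seed)
    # 1. Expand: B inputs -> B * num_probes rows (all probes of input 0 first, then input 1, ...)
    flat = [probe for seq in batch for probe in make_probes(seq, num_probes, mask_ratio, mask_id, rng)]
    # 2. Score the rows in chunks, one (hypothetical) forward pass per chunk
    scores: List[float] = []
    for start in range(0, len(flat), internal_batch_size):
        scores.extend(score_fn(flat[start:start + internal_batch_size]))
    # 3. Regroup the flat scores into a [B][num_probes] nested list
    return [scores[i * num_probes:(i + 1) * num_probes] for i in range(len(batch))]


calls = []


def toy_score(rows: List[List[int]]) -> List[float]:
    # Stand-in for a model forward pass: records the chunk size and counts masked tokens per row
    calls.append(len(rows))
    return [float(sum(tok == 0 for tok in row)) for row in rows]


batch = [[5, 6, 7, 8], [9, 10, 11, 12, 13, 14]]
scores = score_probes_batched(batch, toy_score, num_probes=3, internal_batch_size=4, mask_ratio=0.5)
print(calls)   # [4, 2]: 2 inputs x 3 probes = 6 rows, scored in chunks of at most 4
print(scores)  # [[2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]
```

Because the probes of each input stay contiguous in the flat list, the regrouping step needs no bookkeeping beyond `num_probes`, and the chunk size bounds memory independently of the batch size.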

Since the current implementation is functional for both decoder-only and encoder-decoder models, I will proceed with the merge; any further development on these issues should be done in a dedicated PR. Thanks again @casszhao @xuan25 for your contribution! 😄
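To illustrate the third point in the list above (`attributed_fn`), the following plain-Python sketch makes the scoring function a parameter of the perturbation loop. It is a simplified occlusion-style estimator (average drop of a step-function score over random masks), not the actual ReAGent update rule, and `logit_step`, `probability_step`, `make_contrast_step` and `toy_model` are hypothetical stand-ins for inseq's step functions and the `attribution_model` forward.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical interfaces: a "model" maps token ids to next-token logits,
# and a step function turns those logits into a scalar score for the target.
Logits = Dict[int, float]
StepFn = Callable[[Logits, int], float]


def logit_step(logits: Logits, target: int) -> float:
    """Score = raw logit of the target token."""
    return logits[target]


def probability_step(logits: Logits, target: int) -> float:
    """Score = softmax probability of the target token (numerically stabilised)."""
    m = max(logits.values())
    z = sum(math.exp(v - m) for v in logits.values())
    return math.exp(logits[target] - m) / z


def make_contrast_step(contrast: int) -> StepFn:
    """Score = logit difference between the target and a contrastive alternative."""
    def contrast_step(logits: Logits, target: int) -> float:
        return logits[target] - logits[contrast]
    return contrast_step


def perturbation_importance(
    model: Callable[[Sequence[int]], Logits],
    tokens: Sequence[int],
    target: int,
    step_fn: StepFn,
    num_probes: int = 50,
    mask_ratio: float = 0.3,
    mask_id: int = 0,
    seed: int = 0,
) -> List[float]:
    """Average score drop attributed to each position over random masked probes."""
    rng = random.Random(seed)
    base = step_fn(model(tokens), target)
    totals = [0.0] * len(tokens)
    counts = [0] * len(tokens)
    n_mask = max(1, int(len(tokens) * mask_ratio))
    for _ in range(num_probes):
        positions = rng.sample(range(len(tokens)), n_mask)
        probe = list(tokens)
        for p in positions:
            probe[p] = mask_id
        # How much the chosen score drops when these positions are masked
        drop = base - step_fn(model(probe), target)
        for p in positions:
            totals[p] += drop
            counts[p] += 1
    # Positions that were never masked get an importance of 0
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def toy_model(tokens: Sequence[int]) -> Logits:
    # Hypothetical model: target token 1 is only likely when token 7 is in the input
    return {1: 3.0 if 7 in tokens else 0.0, 2: 1.0}


tokens, target = [7, 4, 5, 6], 1
for name, fn in [("logit", logit_step), ("probability", probability_step), ("contrast", make_contrast_step(2))]:
    print(name, perturbation_importance(toy_model, tokens, target, fn))
```

Only `step_fn` changes between the three runs while the masking loop stays untouched, which is the kind of decoupling the comment asks for when it suggests routing scoring through the attribution model's forward.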
