jbloomAus / SAELens

Training Sparse Autoencoders on Language Models
https://jbloomaus.github.io/SAELens/
MIT License
193 stars 67 forks source link

fix: allow settings trust_remote_code for new huggingface version #187

Closed chanind closed 2 weeks ago

chanind commented 2 weeks ago

Description

The newest version of Huggingface transformers won't allow loading old datasets that require allowing code execution without explicitly setting trust_remote_code=True. As a result, tests are now failing on CI since it loads the most recent version of transformers. This PR adds an option dataset_trust_remote_code which the user can set to allow this remote code execution so old datasets can be loaded in new versions of transformers, and also fixes CI.

Type of change

Please delete options that are not relevant.

Checklist:

You have tested formatting, typing and unit tests (acceptance tests not currently in use)

jbloomAus commented 2 weeks ago

@chanind Why make it optional None? We should just default to False?

codecov[bot] commented 2 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 59.68%. Comparing base (227d208) to head (e873ee9). Report is 1 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #187 +/- ## ========================================== + Coverage 59.59% 59.68% +0.09% ========================================== Files 25 25 Lines 2636 2642 +6 Branches 445 446 +1 ========================================== + Hits 1571 1577 +6 Misses 987 987 Partials 78 78 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

chanind commented 1 week ago

I was worried that defaulting to False would be a breaking change, since then previously working code would no longer work if it's using an old dataset that requires setting the trust_remote_code variable. Setting it to None lets this be breaking or not based on Huggingface and not us.

jbloomAus commented 1 week ago

Ahhh I imagine setting to true avoids breaking change since if you trust remote code it will run?

On Mon, Jun 17, 2024, 10:58 PM David Chanin @.***> wrote:

I was worried that defaulting to False would be a breaking change, since then previously working code would no longer work if it's using an old dataset that requires setting the trust_remote_code variable. Setting it to None lets this be breaking or not based on Huggingface and not us.

— Reply to this email directly, view it on GitHub https://github.com/jbloomAus/SAELens/pull/187#issuecomment-2174502655, or unsubscribe https://github.com/notifications/unsubscribe-auth/AQPMYZ7256KSKJW7L26NMUTZH5LYLAVCNFSM6AAAAABJNI2STWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNZUGUYDENRVGU . You are receiving this because you modified the open/close state.Message ID: @.***>