System Info

transformers version: 4.41.0.dev0
Using distributed or parallel set-up in script?: No
Who can help?
@gante @zucchini-nlp
Information
[X] The official example scripts
[ ] My own modified scripts
Tasks
[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)
Reproduction
See #30826
`test_assisted_decoding_matches_greedy_search_0_random` is forcibly skipped for Jamba because `_supports_cache_class` had to be unset to resolve failing tests on main.
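For context, the forced skip looks roughly like this (a minimal sketch; the class name and skip reason are illustrative, not the exact contents of the Jamba test file):

```python
import unittest


class JambaModelTest(unittest.TestCase):
    @unittest.skip(
        "Jamba requires `_supports_cache_class` to be unset, which breaks assisted decoding"
    )
    def test_assisted_decoding_matches_greedy_search_0_random(self):
        # Overridden only to skip; the real test body lives in the shared
        # generation tester mixin.
        ...
```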
`test_assisted_decoding_matches_greedy_search_0_random` appears to pass for Mamba, but only because `all_generative_models` is not set in the model tester, so the test has no generative model classes to exercise and passes vacuously.
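To illustrate why that pass is vacuous, here is a hedged sketch of the tester pattern (the attribute the shared testers actually iterate is named `all_generative_model_classes` in `transformers`; the issue refers to it as `all_generative_models`):

```python
import unittest


class MambaGenerationTestSketch(unittest.TestCase):
    # Never populated for Mamba, so generation tests have nothing to iterate.
    all_generative_models = ()

    def test_assisted_decoding_matches_greedy_search_0_random(self):
        for model_class in self.all_generative_models:
            # With an empty tuple this loop body never executes, so the test
            # "passes" without exercising assisted decoding at all.
            self.fail("never reached")
```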
Expected behavior
Either `test_assisted_decoding_matches_greedy_search_0_random` can be run for both models with `_supports_cache_class` unset, or it should not be necessary to have `_supports_cache_class` unset for Jamba.
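As a quick sanity check (assuming transformers >= 4.40.0 so the Jamba and Mamba classes are importable), the flag in question can be inspected directly:

```python
from transformers import JambaForCausalLM, MambaForCausalLM

# Per this issue, `_supports_cache_class` is currently unset (False) for
# Jamba, which is what forces the skip described above.
print(JambaForCausalLM._supports_cache_class)
print(MambaForCausalLM._supports_cache_class)
```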