Closed by rahul-tuli 7 months ago
I think this is going to break recipe export and probably a lot of other things. The session is meant to persist through the lifecycle of a model, so that when we save the recipe, all previously applied modifiers persist. It was never really designed to work for multiple models at a time; if this is something that is important to support, we need to be more careful about it. Does using the session context manager in the user script not work here?
My bad, you are correct. An alternative would be to wrap each model creation call within this session context manager, but I don't think that's any better than calling `active_session().reset()`. @mgoin, do you think adding an optional flag like `reset_session` to the `from_pretrained(...)` method would be nicer? If not, I'll close this PR and we'll think of a nicer way if this is an important use case.
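For context, the pattern being discussed, resetting a process-wide session at the start of each model load via a context manager, can be sketched with a toy global session. This is illustrative Python only, not sparseml's actual implementation; `Session`, `active_session`, and `session_context_manager` here are simplified stand-ins for the real library objects:

```python
from contextlib import contextmanager


class Session:
    """Toy stand-in for the library's global session state."""

    def __init__(self):
        self.applied_recipes = []

    def reset(self):
        self.applied_recipes.clear()


_SESSION = Session()


def active_session():
    return _SESSION


@contextmanager
def session_context_manager():
    # Reset the shared session on entry so each model load starts clean;
    # equivalent in effect to calling active_session().reset() manually.
    active_session().reset()
    yield active_session()


# Load "model A": its recipe lands in the shared global session.
with session_context_manager() as session:
    session.applied_recipes.append("recipe_a")

# Load "model B": entering the context wipes model A's leftover state,
# so the second load no longer conflicts with the first.
with session_context_manager() as session:
    assert session.applied_recipes == []
    session.applied_recipes.append("recipe_b")
```

The trade-off is the one raised above: resetting between loads keeps sequential loads from colliding, but it also discards the accumulated recipe history that recipe export relies on.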
Closing this out for now, awaiting response from product!
@rahul-tuli @Satrat Loading 2 quantized/modified models is a basic usage of our library imo, and it will be easy for users to trip on this in long-lived notebook, distillation, or model comparison scenarios. At a high level I think it is reasonable to assume that `model = SparseAutoModelForCausalLM.from_pretrained(model_path)` would have no global side effects, which doesn't seem to currently be the case based on this issue. Maybe attaching the session to the model as a member variable, rather than relying on a global session, would help get past this. I don't know the whole architecture of why the session is used like this currently, but we should continue to discuss if it can't bend easily to support this use case. FYI @robertgshaw2-neuralmagic @bfineran
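A minimal sketch of that member-variable alternative (the class names here are hypothetical, not the library's real API): each model owns its own session, so loading a second model cannot clobber the first model's recipe state:

```python
class Session:
    """Per-model recipe state instead of a process-wide global."""

    def __init__(self):
        self.applied_recipes = []


class QuantizedModel:
    def __init__(self, name):
        self.name = name
        self.session = Session()  # created per instance: no global side effects

    def apply_recipe(self, recipe):
        self.session.applied_recipes.append(recipe)


model_a = QuantizedModel("model-a")
model_a.apply_recipe("quant-recipe-a")

model_b = QuantizedModel("model-b")  # untouched by model_a's recipes
assert model_b.session.applied_recipes == []
assert model_a.session.applied_recipes == ["quant-recipe-a"]
```

One implication, hinted at earlier in the thread: with a per-model session, recipe export would need to read from the model's own session rather than a global one, which is a larger architectural change than resetting between loads.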
I agree with @mgoin
Description
A bug was discovered in the main branch where loading two or more quantized models sequentially with `SparseAutoModelForCausalLM.from_pretrained(...)` led to errors. This occurred because the `active_session` was not reset between the loads, causing conflicts during subsequent recipe applications.

Proposed Fix
Implemented a fix by wrapping recipe applications within `session_context_manager()`, ensuring `active_session` is reset before each model's recipe is applied. This change isolates each model loading process, preventing the previously observed errors.

Testing
Verified the fix by successfully loading multiple quantized models sequentially, which previously resulted in errors. The use of `session_context_manager()` before each recipe application has resolved the issue.

Test script:
Before the fix:
After the fix: