**Closed**: hschaeufler closed this issue 1 month ago.
Addition: I have now set the temperature to 0 to get repeatable results and activated verbose mode. It seems that in a few cases the fine-tuned model repeats specific sentences forever. Does anyone have an idea why this might be, or how I can prevent it?
Is this due to the LoRA fine-tuning? Should I perhaps also use the chat template for my training data set? Or is it perhaps because I am LoRA fine-tuning all layers (`self_attn.q_proj`, `self_attn.v_proj`, `self_attn.k_proj`, `self_attn.o_proj`, `mlp.gate_proj`, `mlp.down_proj`, `mlp.up_proj`)?
An example of the looping output (the same `expect` block repeats indefinitely):

```dart
expect(
  find.byWidgetPredicate(
    (widget) =>
        widget is OutlinedButton &&
        widget.onPressed == null &&
        (widget.style!.side! as BorderSide).color ==
            Theme.of(find.byType(XYZButton)).disabledColor,
  ),
  findsOneWidget,
);
expect(
  find.byWidgetPredicate(
    (widget) =>
        widget is OutlinedButton &&
        widget.onPressed == null &&
        (widget.style!.side! as BorderSide).color ==
            Theme.of(find.byType(XYZButton)).disabledColor,
  ),
  findsOneWidget,
);
expect(
  find.byWidgetPredicate(
    (widget) =>
        widget is OutlinedButton &&
        widget.onPressed == null &&
        (widget.style!.side! as BorderSide).color ==
            Theme.of(find.byType(XYZButton)).disabledColor,
  ),
  findsOneWidget,
);
expect(
  find.byWidgetPredicate(
    (widget) =>
        widget is OutlinedButton &&
        widget.onPressed == null &&
        (widget.style!.side! as BorderSide).color ==
            Theme.of(find.byType(XYZButton)).disabledColor,
  ),
  findsOneWidget,
);
```
I have now set `repetition_penalty=1.1` and `max_tokens=35000`. In initial tests, this has resulted in no more endless repetitions. I'll run it over the full dataset tonight.
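For anyone who finds this later, a minimal sketch of those settings (assuming an mlx_lm 0.19.x-style `generate` that still forwards `temp` and `repetition_penalty` to the sampling step; the model path and prompt are placeholders):

```python
from mlx_lm import load, generate

# Placeholder path to the fused, fine-tuned model.
model, tokenizer = load("path/to/fused-model")

prompt = "<source of the class to generate tests for>"

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=35000,         # hard cap so a runaway generation still terminates
    temp=0.0,                 # deterministic sampling for repeatable results
    repetition_penalty=1.1,   # discourages the endless sentence loops
    verbose=True,
)
```

The `max_tokens` cap alone already guarantees termination; the repetition penalty is what actually stopped the loops in my initial tests.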
The run completed with the settings mentioned above, so I am closing the ticket. If anyone still has tips on how I can prevent this during LoRA fine-tuning, I am happy to receive suggestions.
**Describe the bug** I use MLX_LM to generate tests for different classes (entries in a data frame) using a model fine-tuned with MLX_LM. Depending on the model, the `generate_test` step hangs after a certain number of entries, or for certain entries, so that the generate method does not return an answer even after several hours. It looks as if MLX is generating text endlessly and never reaches an end token.
Is there any idea how I can avoid the problem, or is it possible to define a timeout? Or is this perhaps due to an internal cache filling up?
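Until something built-in exists, one workaround is a wall-clock timeout wrapped around the blocking call. A sketch using only the standard library (Unix/macOS only, main thread only; it can interrupt mlx_lm here because the generation steps are driven from a Python loop, so the signal handler gets a chance to run):

```python
import signal

from mlx_lm import generate

class GenerationTimeout(Exception):
    """Raised when a generation exceeds the allowed wall-clock time."""

def _on_alarm(signum, frame):
    raise GenerationTimeout("generation exceeded the time limit")

def generate_with_timeout(model, tokenizer, prompt, timeout_s=600, **kwargs):
    signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout_s)   # schedule SIGALRM in timeout_s seconds
    try:
        return generate(model, tokenizer, prompt=prompt, **kwargs)
    finally:
        signal.alarm(0)       # always cancel the pending alarm

# Usage: skip entries that hang instead of blocking for hours.
# try:
#     tests = generate_with_timeout(model, tokenizer, prompt, timeout_s=600)
# except GenerationTimeout:
#     tests = None
```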
**To Reproduce**
Include code snippet
I have to censor the output because some of it contains sensitive data, but you can see that nothing happens for almost 2 hours before I cancelled the run.
If I only generate code for the class I get a result:
**Expected behavior** I would expect a response after a few minutes or, if generation doesn't finish after a while, a timeout error.
**Desktop (please complete the following information):**
- OS Version: macOS 14.16.1
- Version: 0.19.0
**Additional context** The fused model was fine-tuned with one of the previous MLX versions and fused with the current version of MLX. I don't use a prompt template because I just want to get the code back, as in the LoRA fine-tuning set, without explanations.
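Regarding the chat-template question above: if the base model ships a chat template, it can be applied at inference via the underlying Hugging Face tokenizer. A sketch (placeholder path and message; this only helps if the training data was formatted the same way, since a format mismatch between fine-tuning and inference could plausibly make the model less likely to emit its end-of-sequence token, which would match the endless-generation symptom):

```python
from mlx_lm import load, generate

model, tokenizer = load("path/to/fused-model")  # placeholder path

messages = [
    {"role": "user", "content": "Generate widget tests for the following class: ..."},
]
# apply_chat_template is proxied from the Hugging Face tokenizer.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # append the assistant-turn marker so the model
                                 # knows where its answer starts and ends
)
response = generate(model, tokenizer, prompt=prompt, max_tokens=2048)
```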