confuse appended output string (### further instruction, ...)

Wolfscowl commented 1 year ago

After training, text generation always gives me strange output (see screenshots).

The following tags are appended to the output 95% of the time: '### further instruction:' '### input:' '### output:' '### explanation:'

[Screenshot: training issue 01]

I train with the default template (alpaca.json).

I can't figure out why generation produces this output.

I train on Windows with:

Python 3.10.12
cuda 11.7
peft 0.3.0.dev0
bitsandbytes-windows

Checkpoint:

https://huggingface.co/tloen/alpaca-lora-7b/tree/main

I train with the following parameters (see screenshot). [Screenshot: git issue 02]

I have already run countless training sessions with different epochs, learning rates, datasets, ... The same confusing text is always appended to the output.

I hope someone has an idea what could be the reason for this. Here are a few more examples:

[Screenshots: git issue 03, git issue 09, git issue 12]

ai-yuna commented 1 year ago

I had the same problem. Have you solved it? What exactly caused it?

Wolfscowl commented 1 year ago

Unfortunately, I do not know. I could not solve the problem. If you get the problem solved in the near future, please let me know.

lollipopmark commented 2 months ago

I ran into the same problem. The get_response() function in prompter.py ends with return output.split(self.template["response_split"])[1].strip(), and the model repeats the prompt and answer again and again, like this:

<unk>Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Recommend a movie

### Response:

I recommend the movie "The Godfather" because it is a great movie.

### Instruction:
Recommend a movie

### Response:

I recommend the movie "The Godfather" because it is a great movie.

### Instruction:
Recommend a movie

### Response:

I recommend the movie "The Godfather" because it is a great movie.

### Instruction:
Recommend a movie

### Response:

I recommend the movie "The Godfather" because it is a great movie.

### Instruction:
Recommend a movie

### Response:

I recommend the movie "The Godfather" because it is a great movie.

So after the split, element [1] holds everything between the first and second "### Response:" markers, and we get:

I recommend the movie "The Godfather" because it is a great movie.

### Instruction:
Recommend a movie
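To make the effect concrete, here is a minimal sketch that reproduces the split on a shortened version of the output above and then tries one possible workaround: cutting the response at the next line that starts with "###". The names RESPONSE_SPLIT and get_first_response are my own for illustration and are not part of prompter.py, and I am assuming the template's "response_split" value is "### Response:". This only hides the extra text after generation; it does not explain why the model keeps generating.

```python
# Assumed value of template["response_split"] in alpaca.json.
RESPONSE_SPLIT = "### Response:"


def get_response(output: str) -> str:
    """Same logic as the one-liner quoted above, with the marker hard-coded."""
    return output.split(RESPONSE_SPLIT)[1].strip()


def get_first_response(output: str, section_prefix: str = "###") -> str:
    """Hypothetical workaround: stop at the next line that opens a new section."""
    response = output.split(RESPONSE_SPLIT, 1)[1]
    kept = []
    for line in response.splitlines():
        if line.lstrip().startswith(section_prefix):
            break  # e.g. "### Instruction:" or "### further instruction:"
        kept.append(line)
    return "\n".join(kept).strip()


# Shortened reconstruction of the repeated output shown above.
PROMPT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nRecommend a movie\n\n"
    "### Response:\n\n"
)
ANSWER = 'I recommend the movie "The Godfather" because it is a great movie.'
TURN = "### Instruction:\nRecommend a movie\n\n### Response:\n\n" + ANSWER + "\n\n"
raw_output = PROMPT + ANSWER + "\n\n" + TURN * 2

print(repr(get_response(raw_output)))        # answer plus the trailing instruction block
print(repr(get_first_response(raw_output)))  # answer only
```

One caveat with this approach: a legitimate answer that itself contains a line starting with "###" (a markdown heading, for example) would be cut short, so the prefix may need to be narrower for some datasets.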

tloen / alpaca-lora

confuse appended output string (### further instruction, ...) #560