johnsmith0031 / alpaca_lora_4bit

MIT License
534 stars 84 forks

Other datasets #106

Closed (Ph0rk0z closed 1 year ago)

Ph0rk0z commented 1 year ago

Is there an easy way to support other datasets?

I have a big one that is just a prompt and then a response, without the "input" field.

I thought I would just add a new dataset and edit:

    # Auxiliary methods
    def generate_prompt(self, data_point, **kwargs):
        # Join the preamble and the three labeled sections into one training prompt.
        return "{0}\n\n{1}\n{2}\n\n{3}\n{4}\n\n{5}\n{6}".format(
            "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.",
            "### Instruction:",
            data_point["instruction"],
            "### Input:",
            data_point["input"],
            "### Response:",
            data_point["output"]
        )

But how can I tell if the dataset is being fed to the model correctly?

I'm hoping that using this directly is faster than training through textgen. I also find that xformers slows things down :(
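
For a dataset that has only a prompt and a response, the `generate_prompt` method quoted above could be adapted roughly as follows. This is a minimal sketch, not code from the repository: the function name `generate_prompt_no_input` is hypothetical, the record keys `instruction` and `output` are assumed to match the dataset, and the shortened preamble follows the no-input wording commonly used with Alpaca-style templates.

```python
# Hypothetical variant of generate_prompt for records that only carry
# "instruction" and "output" keys (both key names are assumptions).
PREAMBLE_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def generate_prompt_no_input(data_point):
    """Build an Alpaca-style prompt that omits the '### Input:' section."""
    return "{0}\n\n{1}\n{2}\n\n{3}\n{4}".format(
        PREAMBLE_NO_INPUT,
        "### Instruction:",
        data_point["instruction"],
        "### Response:",
        data_point["output"],
    )


# Made-up record, used only to show the rendered prompt.
example = {"instruction": "Name a primary color.", "output": "Red"}
print(generate_prompt_no_input(example))
```

Dropping the section entirely, rather than passing an empty string as the input, keeps a dangling "### Input:" header out of every training example.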

johnsmith0031 commented 1 year ago

If you want to use a customized dataset, I think the best way at the moment is to copy finetune.py to another file, such as finetune1.py, and edit that copy. Then check that the data format is correct before training.
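
One way to do such a format check before training, as a minimal sketch using only the standard library: scan the records and report any that would break prompt building. The helper name `find_bad_records` and the required keys `instruction` and `output` are assumptions for illustration, not part of finetune.py.

```python
import json

# Assumed field names; adjust to whatever the dataset actually uses.
REQUIRED_KEYS = ("instruction", "output")


def find_bad_records(records, required_keys=REQUIRED_KEYS):
    """Return (index, reason) pairs for records that would break prompt building."""
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append((i, "not a JSON object"))
            continue
        for key in required_keys:
            value = rec.get(key)
            # A usable field must be a non-blank string.
            if not isinstance(value, str) or not value.strip():
                problems.append((i, "missing or empty '{}'".format(key)))
    return problems


# Tiny in-memory stand-in for a dataset file; real data would come from json.load().
sample = json.loads("""[
    {"instruction": "Name a primary color.", "output": "Red"},
    {"instruction": "Translate 'hello' to French.", "output": ""},
    {"output": "orphan response"}
]""")

for index, reason in find_bad_records(sample):
    print("record {}: {}".format(index, reason))
```

A complementary check is to print a few fully rendered prompts and read them by eye before starting a run.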

Ph0rk0z commented 1 year ago

I think I actually figured it out and got it working.