I was able to run the fine-tuning script for the flan-t5-large model on a V100 and save the results without issues. Training was done with the example dummy conversations file with this command:
Then I loaded the new model with this command:

python3 -m fastchat.serve.cli --model-path checkpoints_flant5_large_dummy/

and got this when I tried to interact:
Human: what is your name?
Assistant: Yes,mètres I’mbling amédias languageuniversal APIlungul languagechemical modelwählt....
I was able to verify reasonable responses when using the original google/flan-t5-large model, so the environment is likely ok. I am probably running this incorrectly but cannot find any further documentation. There are some answered questions about Vicuna, but not much for Flan-T5. Does anyone know what might be wrong? Either way, it might be good to have a little more documentation about the end-to-end fine-tuning process for the Flan-T5 models.