Closed: yum-yeom closed this issue 4 months ago
Hi @yum-yeom -
so there are two ways of doing it:
1) upload the model to HF and load it from there, or
2) epochs=0, like you did, plus specifying cfg.architecture.pretrained_weights="path_to_checkpoint.pth"
I believe you rather want option 2) - could you try it that way please and report back if it works?
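Option 2) would correspond to cfg values roughly like the following (a hedged sketch; the exact key layout, e.g. `training.epochs` vs. `epochs`, may differ across LLM Studio versions, so verify against your config file):

```yaml
# Hypothetical cfg fragment; check key names against your LLM Studio version.
training:
  epochs: 0                                      # no training, evaluation only
architecture:
  pretrained_weights: "path_to_checkpoint.pth"   # checkpoint from the CLI run
```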
Hi! First of all, thank you for your answer.
I'm currently unable to upload the model to HF and use it from there, so I tried method 2 and still get the error. It probably does not pick up the cfg values I set.
So instead, I customized the Model and Tokenizer import in the LLM Studio source, saved the model as a .bin file in a separate path, imported it from there, and used it with train.py & epoch=0.
With the model saved this way, the import worked without problems and the performance seems to be reproducible.
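Roughly, the workaround described above amounts to re-saving the checkpoint's state_dict under the filename Hugging Face loaders look for. A minimal sketch with placeholder paths and a tiny stand-in state_dict (a real export would also need the matching config.json and tokenizer files in the same directory):

```python
# Hedged sketch of the workaround: re-save checkpoint weights (a plain
# state_dict) under the filename Hugging Face loaders expect
# (pytorch_model.bin). Paths and the demo tensor are placeholders
# standing in for a real CLI-trained checkpoint.
import os
import tempfile

import torch

export_dir = tempfile.mkdtemp()  # stands in for the model directory
ckpt_path = os.path.join(export_dir, "checkpoint.pth")

# Stand-in for the checkpoint written by the CLI training run:
torch.save({"linear.weight": torch.zeros(4, 4)}, ckpt_path)

# Re-save under the name from_pretrained() looks for:
state_dict = torch.load(ckpt_path, map_location="cpu")
torch.save(state_dict, os.path.join(export_dir, "pytorch_model.bin"))
```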
Please re-open in case there are still open issues.
Issue
I am trying to run inference with a model trained via the CLI in LLM Studio on a distributed (multi-GPU) environment, but I am facing some issues.
In order to run inference with the trained model on the validation set, I added the epoch=0 and evaluate_before_training=True options to the distributed_train.sh script and ran it. However, I received an error saying that the config.json file does not exist.
Queries
To Reproduce
Added the following options to distributed_train.sh:

epoch=0 &evaluate_before_training=True

With this setting, the run fails with:

OSError: {{finetuned_model_path}} does not appear to have a file named config.json. Checkout 'https://huggingface.co//{{finetuned_model_path}}//main' for available files.
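For background, the Hugging Face from_pretrained() loader treats a local path as a model directory and requires a config.json inside it, which is why a directory containing only a raw checkpoint triggers this OSError. A minimal stdlib check (the helper name is hypothetical):

```python
# Hypothetical helper: check whether a local directory looks like a
# Hugging Face model directory, i.e. whether from_pretrained() would
# find the config.json it requires.
import os

def has_hf_config(model_dir: str) -> bool:
    return os.path.isfile(os.path.join(model_dir, "config.json"))
```

A directory containing only a .pth checkpoint fails this check, matching the error above.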
Let me know if there is anything else I should share. Any help would be greatly appreciated.
LLM Studio version
v1.3.1