Closed rjtshrm closed 1 month ago
@rjtshrm this looks fine. Have your methods been successful? Happy to chat more on this.
@rjtshrm were you able to successfully complete your approach? How was the result?
@HamidShojanazeri, I'm also going to try the same approach. Did the Llama team open source the instruction dataset, so that I can download it and use it to fine-tune the model?
@HamidShojanazeri I have one question: is it mandatory that the instruction dataset (e.g. question-and-answer prompts) be built from my training dataset (my domain data)? Or can I use any instruction dataset to fine-tune the model, just to make it adapt to the instruction format?
Can you please share your thoughts?
@IamExperimenting the Llama team didn't open source the instruction dataset; I am working on an e2e recipe for a chatbot. Overall, you can use this example of a custom dataset, which uses the Open Assistant dataset. Note that if you are using llama-chat as your base model, you would need these special tokens added, as shown in the script.
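For illustration, a minimal sketch of what "adding the special tokens" usually means for llama-2-chat style fine-tuning data. The token strings below match the commonly used `[INST]`/`<<SYS>>` convention; the system prompt and helper name are my own placeholders, not taken from the script:

```python
# Hedged sketch: wrap one (system, user, answer) triple in the
# llama-2-chat special tokens. The system/user/answer strings here
# are invented placeholders; format_chat_prompt is a hypothetical helper.
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def format_chat_prompt(system: str, user: str, answer: str) -> str:
    """Return one training example with chat-model delimiters applied."""
    return f"{B_INST} {B_SYS}{system}{E_SYS}{user} {E_INST} {answer}"

example = format_chat_prompt(
    "You are a helpful domain assistant.",
    "What does our refund policy say?",
    "Refunds are issued within 30 days of purchase.",
)
print(example)
```

The base (non-chat) llama model was not trained with these delimiters, which is why they only matter when llama-chat is the starting checkpoint.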
> Is it mandatory that the instruction dataset (e.g. question-and-answer prompts) be built from my training dataset (my domain data)? Or can I use any instruction dataset to fine-tune the model, just to make it adapt to the instruction format?
If you want to build a chatbot for your specific domain, you definitely need that data, either in the form of Q&A pairs or instruction and output sets.
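For concreteness, one common shape for such domain data is an Alpaca-style instruction record, stored one JSON object per line (JSONL). The field names below follow that widespread convention; the content is invented:

```python
import json

# Hedged sketch: a single instruction-tuning record built from your own
# domain documents. Field names follow the common Alpaca-style
# instruction/input/output convention; the text is made up.
record = {
    "instruction": "Summarise the warranty terms in the policy below.",
    "input": "The warranty covers manufacturing defects for 24 months...",
    "output": "Manufacturing defects are covered for two years.",
}

# One line per record in a JSONL training file.
line = json.dumps(record)
print(line)
```

Each record pairs a prompt grounded in your documents with the answer you want the model to learn to produce.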
So far the fine-tuning examples I see cover summarisation, chatbots for specific use cases, etc. However, I want to build a chatbot based on my own private data (hundreds of PDF and Word files). How can I fine-tune on this? The approach I am thinking of is: 1) LoRA fine-tuning of the base Alpaca model on my own private data, then 2) LoRA fine-tuning of that model on some input/output prompts.
Is this a good technique for building a chatbot on private datasets? Can someone please suggest a good way of building a model based on private data?
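To make the LoRA step in the plan above concrete, here is a minimal NumPy sketch of the core idea: instead of updating the full frozen weight W, you train a low-rank update B @ A, so the effective weight is W + (alpha / r) * B @ A. All shapes and names here are illustrative, not from any particular library:

```python
import numpy as np

# Hedged sketch of the LoRA idea in plain NumPy; dimensions are toy-sized.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 16

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))               # zero init: adapter starts as a no-op

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Forward pass with the low-rank adapter added to the frozen weight."""
    return x @ (W + (alpha / r) * B @ A).T

x = rng.normal(size=(1, d_in))
# With B = 0 the adapter contributes nothing, so this matches the base model.
assert np.allclose(lora_forward(x), x @ W.T)
```

The appeal for private-data fine-tuning is that only A and B (r * (d_in + d_out) parameters) are trained, while the base weights stay frozen; libraries such as Hugging Face PEFT implement exactly this pattern for transformer layers.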