karpathy / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License

extend nanoGPT trained model to answer questions #221

Open aartivnkt opened 1 year ago

aartivnkt commented 1 year ago

Hi there, I have experimented with the code here and trained a model on a large, custom dataset (~30 GB) (see the previous issue, closed here). What I have right now is more or less a document completer. I want to take this further and do some fine-tuning for Q/A. I prepared input for this in the form of a question, followed by the context, and then the answer, curated from some ground-truth datasets. However, after fine-tuning on this, the responses still look like document completions, with no real ability to answer questions.
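
For concreteness, the examples were formatted roughly along these lines before being encoded into nanoGPT's uint16 .bin format (the field labels and the tiny sample below are only placeholders, not my exact script):

```python
# Rough sketch of the Q/A data prep (labels and sample data are placeholders).
import numpy as np
import tiktoken

enc = tiktoken.get_encoding("gpt2")  # same tokenizer nanoGPT's GPT-2 configs use

def encode_example(question, context, answer):
    text = f"Question: {question}\nContext: {context}\nAnswer: {answer}"
    ids = enc.encode_ordinary(text)
    ids.append(enc.eot_token)  # separate examples with the end-of-text token
    return ids

examples = [
    ("What is nanoGPT?", "nanoGPT is a small GPT training repo.", "A minimal GPT training repository."),
]

ids = []
for q, c, a in examples:
    ids.extend(encode_example(q, c, a))

# same uint16 on-disk format that nanoGPT's prepare.py scripts produce
np.array(ids, dtype=np.uint16).tofile("train.bin")
```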

How can I use a model trained with nanoGPT for this type of task? Is it possible to use code available in Hugging Face or elsewhere to further fine-tune my model for Q/A? Or do I need to write my own code for this type of fine-tuning? Any input on how to take this further for downstream tasks (document classification or Q/A) would be great. Thanks so much

Majdoddin commented 1 year ago

Some quick thoughts: the usual transformer architecture has an encoder and a decoder. The encoder reads the input/query (like text in language A) and the decoder generates the output (like the translation in language B). nanoGPT has just the decoder. But maybe one can still get it to do Q/A.
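
For example, the question and context can simply be the prompt and the answer the completion, much like nanoGPT's sample.py does it. A rough sketch (assuming a loaded nanoGPT `model` and the GPT-2 tokenizer; details may differ from the actual sampling script):

```python
# Sketch: decoder-only Q/A by prompting, in the style of nanoGPT's sample.py.
import torch
import tiktoken

enc = tiktoken.get_encoding("gpt2")

prompt = "Question: What does the decoder do?\nContext: <your document here>\nAnswer:"
idx = torch.tensor(enc.encode_ordinary(prompt), dtype=torch.long)[None, ...]

with torch.no_grad():
    out = model.generate(idx, max_new_tokens=100, temperature=0.8, top_k=200)

print(enc.decode(out[0].tolist()))
```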

Maybe you could also try chatGPT "in the loop", something like ReAct: it indexes your dataset, and chatGPT can make queries to it and read it. So there is no need to train or fine-tune the model (assuming chatGPT knows the language of your dataset).

Even better, OpenAI has just announced plugins, which can provide the same functionality in a more robust way.

Coriana commented 1 year ago

I would try adapting the https://github.com/tloen/alpaca-lora clone of Alpaca to nanoGPT and trying to get it to the InstructGPT-like level it seems to be at.

That being said, you might be able to convert the model and use their scripts as-is, since that seems to be what they are doing.
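
The core idea there is LoRA: freeze the pretrained weights and train small low-rank adapters on top of them. alpaca-lora itself uses the peft library, so this is only an illustrative sketch of the idea, not their code:

```python
# Illustrative LoRA adapter (alpaca-lora actually uses the peft library).
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weight
        self.lora_A = nn.Parameter(torch.zeros(r, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.scaling = alpha / r

    def forward(self, x):
        # frozen path + trainable low-rank update
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```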

Majdoddin commented 1 year ago

@Coriana Very nice point, but I think she can just fine-tune alpaca-lora on her dataset, and thus does not need nanoGPT.

chrisociepa commented 1 year ago

Check out ALLaMo (https://github.com/chrisociepa/allamo) if you want to train LLaMA-based models. You can import a LLaMA model and then train/fine-tune it further. One thing I found missing in nanoGPT is the ability to provide an attention mask.
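
For example, one way to add that would be to combine a per-sample padding mask with the causal mask when computing attention scores; a rough sketch of the idea (not actual nanoGPT or ALLaMo code, and the mask convention here is hypothetical):

```python
# Sketch: causal mask combined with a per-sample padding mask.
# attn_mask: (B, T) tensor with 1 for real tokens and 0 for padding.
import torch
import torch.nn.functional as F

def masked_attention(q, k, v, attn_mask):
    B, nh, T, hs = q.shape
    att = (q @ k.transpose(-2, -1)) / (hs ** 0.5)
    causal = torch.tril(torch.ones(T, T, device=q.device, dtype=torch.bool))
    att = att.masked_fill(~causal, float('-inf'))   # causal mask
    pad = attn_mask[:, None, None, :].bool()        # (B, 1, 1, T)
    att = att.masked_fill(~pad, float('-inf'))      # ignore padded keys
    att = F.softmax(att, dim=-1)
    return att @ v
```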