karpathy / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License

Making nano chatgpt #92

Open nebyu08 opened 1 year ago

Spiritdude commented 1 year ago

Even though it expands beyond the scope of nanoGPT, a nanoChatGPT also came to my mind: replace a website/blog search engine with a nanoChatGPT that answers questions based on a limited yet strongly focused text set or fact set.

I suggest we keep this thread/issue open so people can comment with links to "nano" ChatGPT-like projects.
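For the search-replacement idea, the core loop would be simple retrieval-augmented generation: rank passages from the focused text set against the question, then hand the top hits to a small GPT as context. A minimal sketch, assuming a tiny TF-IDF retriever; the passages and helper names are illustrative, and the generation step itself is left as a stub:

```python
# Hypothetical sketch: retrieve relevant passages from a small, focused
# text set and build a prompt for a GPT to complete. Not from any linked project.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

passages = [
    "Our shop is open Monday to Friday, 9am to 5pm.",
    "Returns are accepted within 30 days with a receipt.",
    "We ship worldwide; delivery usually takes 5-10 business days.",
]

vectorizer = TfidfVectorizer()
passage_vecs = vectorizer.fit_transform(passages)  # L2-normalized TF-IDF vectors

def build_prompt(question: str, k: int = 2) -> str:
    # Rank passages by cosine similarity to the question
    # (linear kernel == cosine here, since the vectors are unit-normalized).
    q_vec = vectorizer.transform([question])
    scores = linear_kernel(q_vec, passage_vecs).ravel()
    top = scores.argsort()[::-1][:k]
    context = "\n".join(passages[i] for i in top)
    # The prompt a small GPT would complete; generation itself is stubbed out.
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("When can I return an item?"))
```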

zhzLuke96 commented 1 year ago

ChatGPT's training process is publicly documented, but the results depend heavily on the fine-tuning datasets OpenAI uses, which are hard to reproduce.

I think we could try WebGPT instead, which is a more engineered solution:

https://arxiv.org/abs/2112.09332 https://openai.com/blog/webgpt/


Of course, these are far beyond the scope of the nanoGPT/makemore courses, but after-class exercises are really fun.

Spiritdude commented 1 year ago

Another effort at https://open-assistent.io, code at https://github.com/LAION-AI/Open-Assistant

zhzLuke96 commented 1 year ago

I saw a project related to ChatGPT and felt compelled to share it: https://github.com/hpcaitech/ColossalAI

They have made some remarkable optimizations to the training process. Reportedly, a small ChatGPT-style model can be fine-tuned on a single GPU, and the repository includes complete training-pipeline code.

sanjeevanahilan commented 1 year ago

I've started building a nanoChatGPT as a fork of Karpathy's nanoGPT. I also introduce a new idea for training: backpropagating through the reward function using the Gumbel-Softmax trick rather than a policy-gradient method (PPO).

It works for a basic example but is still very crude and far from useful at scale. Sharing it here in case anyone wants to try:

https://github.com/sanjeevanahilan/nanoChatGPT
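For anyone curious how the Gumbel-Softmax route differs from PPO in practice: rather than scoring hard samples with a policy gradient, it draws "soft" one-hot tokens that stay differentiable, so a reward on the sampled text can be backpropagated straight into the model's logits. A minimal PyTorch sketch with a toy stand-in model and reward (not the actual code from the fork above):

```python
# Illustrative sketch of reward backprop via Gumbel-Softmax; the tiny
# "model" (random logits) and toy reward are hypothetical stand-ins.
import torch
import torch.nn.functional as F

vocab_size, seq_len = 16, 4
logits = torch.randn(seq_len, vocab_size, requires_grad=True)  # stand-in LM output

def reward(soft_tokens: torch.Tensor) -> torch.Tensor:
    # Toy differentiable reward: prefer probability mass on token 0.
    return soft_tokens[:, 0].mean()

# hard=True uses the straight-through estimator: one-hot samples in the
# forward pass, soft gradients in the backward pass.
soft_tokens = F.gumbel_softmax(logits, tau=1.0, hard=True)
loss = -reward(soft_tokens)  # maximize reward = minimize its negative
loss.backward()
print(logits.grad.shape)  # gradients flow back through the sampling step
```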

Spiritdude commented 1 year ago

https://github.com/togethercomputer/OpenChatKit

PiotrNawrot commented 1 year ago

We have released nanoT5 for pre-training and evaluating T5-style (Encoder-Decoder) models. You can use it to pre-train your own model in one day on a single GPU :).
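For context, the T5-style objective nanoT5 pre-trains with is span corruption: random spans of the input are replaced with sentinel tokens, and the decoder learns to emit the hidden spans. A rough illustration on word tokens (the sentinel format follows the T5 paper; this is not nanoT5's actual preprocessing code):

```python
# Hypothetical sketch of T5 span corruption on word-level tokens.
import random

def span_corrupt(tokens, corruption_rate=0.15, mean_span_len=3, seed=0):
    # Replace random spans with sentinels <extra_id_0>, <extra_id_1>, ...;
    # the target sequence lists each sentinel followed by the span it hides.
    rng = random.Random(seed)
    inputs, targets, i, sentinel = [], [], 0, 0
    while i < len(tokens):
        # Start a span with probability rate/mean_len, so ~15% of tokens
        # end up corrupted in spans averaging mean_span_len tokens.
        if rng.random() < corruption_rate / mean_span_len:
            span_len = max(1, round(rng.gauss(mean_span_len, 1)))
            inputs.append(f"<extra_id_{sentinel}>")
            targets.append(f"<extra_id_{sentinel}>")
            targets.extend(tokens[i:i + span_len])
            sentinel += 1
            i += span_len
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets

words = "the quick brown fox jumps over the lazy dog".split()
inp, tgt = span_corrupt(words)
print(" ".join(inp))  # encoder input with sentinels in place of spans
print(" ".join(tgt))  # decoder target: each sentinel plus its hidden span
```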

VatsaDev commented 1 year ago

Huh, there's actually an issue with my project idea's name on it. I have a NanoChatGPT here. It has chat functionality; human, bot, and endOfText tokens; a conversational dataset; and more. I'm working on crude RLHF-like functionality and would love contributors.
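For anyone wiring up something similar, special tokens like these just delimit turns and conversation boundaries in the packed training stream. A minimal sketch, where the exact token strings are guesses for illustration rather than the repo's real vocabulary:

```python
# Hypothetical formatting of a chat sample with speaker and end-of-text
# tokens; check the linked repo for the actual token strings it uses.
END_OF_TEXT = "<|endoftext|>"  # GPT-2's document separator

def format_conversation(turns):
    # turns: list of (speaker, text) pairs; speaker is "human" or "bot".
    lines = [f"<{speaker}> {text}" for speaker, text in turns]
    # The end-of-text token marks the conversation boundary so samples
    # don't bleed into each other when documents are packed into one stream.
    return "\n".join(lines) + "\n" + END_OF_TEXT

sample = format_conversation([
    ("human", "What is nanoGPT?"),
    ("bot", "A minimal repository for training small GPT models."),
])
print(sample)
```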