karpathy / minbpe

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
MIT License
9.16k stars, 854 forks

Minbpe as a potential course #19

Closed: ViswanathaReddyGajjala closed this issue 8 months ago

ViswanathaReddyGajjala commented 8 months ago

I thoroughly enjoy Karpathy's YouTube content; it's consistently top-notch. I've been wondering whether it could evolve into a concise course on platforms like Coursera or edX. His videos, usually around two hours long, are substantial, and I've noticed that minbpe now includes coding exercises. Complemented with quizzes, this could become a robust course. At the same time, I understand that organizing and setting up a course takes additional effort.

A dedicated platform, akin to deeplearning.ai, might also be worth considering. I felt compelled to share this thought.

Moreover, given the reach and impact of his videos, a considerable number of people might be willing to contribute their time and effort to help build such a platform (open-source).

I'm curious to hear the thoughts of others on this matter.

ViswanathaReddyGajjala commented 8 months ago

I intend to close this issue within the next 24 hours.

karpathy commented 8 months ago

I don't know, I don't really like these platforms too much and they usually irritate me with dark patterns when I stop by. I don't want to sign up for anything, I just want the content. I don't want to "enroll" in anything, I just want the content. I don't want to be pestered with engagement or marketing email afterwards. Zero To Hero is already several "lectures" in, most have exercises in the comments, and there is a Discord community all learning together, which is a bit like the discussion boards. Forget existing platforms: what is the ideal experience you'd want to see and that you're missing?

ViswanathaReddyGajjala commented 8 months ago

I just finished watching the video, and it was amazing. I can't thank you enough for creating content on this topic. It's 4 am, and I feel like I know everything about tokenization now (lol). I plan to read and re-read all the papers and blogs you mentioned.

What I'd like to see: slides and assessments (especially for students). This is just my thinking, so please feel free to ignore it.

  1. Slides: CS231n had fantastic slides, and I used to refer to them before interviews, along with the notes I took. I miss having slides like those here; without them, a quick recap later on takes a lot more effort. Maybe we, as a community (on Discord), can take the initiative to address this.

  2. Coding assessments: I plan to make exercise.md a bit more engaging by setting up testing and grading, so learners can verify their implementations. Starter code is also useful for guiding us back to the readings and refining our thought process.

There were so many resources, and it's impressive. Back in 2020, when I first read about GPT, I couldn't find proper material for learning about byte pair encoding. Your video has been immensely helpful in filling that knowledge gap for me, 4+ years later. Thank you!

karpathy commented 8 months ago

Sure, if you can help with any of the above I'm happy to link to it. Ty!