some curriculum suggestions

zmaas commented 2 weeks ago

Hey John! Here's the curriculum that I've worked on in the past. It's a bit less focused on language models as a sole topic, and more on modern ML from a broad perspective.

Essential Concepts of Neural Networks
- Perceptrons, Activation Functions, Loss Functions
- Neural Networks from Scratch (Implementation of backpropagation)
- Ref: "Gradient-Based Learning Applied to Document Recognition" by Lecun et al., 1998
Convolutional Neural Networks (CNNs) and Object Classification
- Theory of Operation (convolutions, pooling, deep CNNs like LeNet, AlexNet, ResNet)
- Classic MNIST classification task
- Ref: "Very Deep Convolutional Networks for Large-Scale Image Recognition" by Simonyan and Zisserman, 2014
Sequence Modeling with Recurrent Neural Networks (RNNs) and LSTMs
- Understanding sequence-dependence and memory in neural computation
- Ref: "Learning to Forget: Continual Prediction with LSTM" by Gers et al., 2000
Advanced Convolutional Networks: U-Net and WaveNet
- U-Net for image segmentation, WaveNet for audio generation
- Ref: "U-Net: Convolutional Networks for Biomedical Image Segmentation" by Ronneberger et al., 2015
- Ref: "WaveNet: A Generative Model for Raw Audio" by van den Oord et al., 2016
Introduction to Autoencoders
- Autoencoders as denoising and compression.
- Implementing a one-layer autoencoder for character-level embeddings
- Ref: "Image data compression using a neural network model" by Sonehara et al. 1989
Foundations of Attention and Transformer Models
- Attention mechanisms vs recurrence
- Ref: "Attention Is All You Need" by Vaswani et al., 2017
Variational Autoencoders (VAEs) and the Probabilistic Perspective
- Linking neural networks and probability theory
- Ref: "Auto-Encoding Variational Bayes" by Kingma and Welling, 2013
Modern Word Embeddings and Language Models
- From Word2Vec to BERT
- Ref: "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" by Devlin et al., 2018
Large-Scale Models: CLIP, GPT, BERT
- Training procedures, scalability, and efficiency
- Ref: "Learning Transferable Visual Models From Natural Language Supervision" by Radford et al., 2021
An overview of Diffusion Models
- Ref: "Denoising Diffusion Probabilistic Models" by Ho et al., 2020
Graph Neural Networks (GNNs) basics
- Ref: "Semi-Supervised Classification with Graph Convolutional Networks" by Kipf and Welling, 2016
Ethical Considerations in AI
- Bias, fairness, privacy, and social impacts
Special Topics for Deepening Knowledge
- Interpretability of models and techniques
- Ref: "Curve Detectors", Cammamarta et al., 2020
- Ref: "Toy Models of Superposition", Elhage et al., 2022

zmaas commented 2 weeks ago

Also likely useful for more advanced topics is this curriculum from Waterloo (grad level CS seminar) on recent language model advances: https://cs.uwaterloo.ca/~wenhuche/teaching/cs886/

sterrettJD commented 2 weeks ago

Thanks Zach! I'm looking through these resources and trying to determine what's necessary and what would be overkill, given the audience.

I'm also looking through some of the resources this course has been assembling: https://github.com/Multiomics-Analytics-Group/course_protein_language_modeling

sterrettJD commented 2 weeks ago

In a similar vein, @casey-martin says:

Relevant topics:

context length (Mamba, Hyena, State Space Models)

multimodality (combined DNA/prot/RNA, seq-to-struct)

reinforcement learning (experimental and computational feedback)

How much background knowledge will the group have? Do they know what attention is? MLM vs autoregressive vs diffusion? Tokenizers?

I think we should probably assume that the group does not know what attention is, nor do they know about the different kinds of models or tokenizers. The primary audience is computational biologists - we'll have a good number of people from EBIO and MCDB at CU, but they aren't going go to be people who have already used language models. Once we get into the applications, I imagine we could have more CS-type folks joining.

I think we should probably have 2 seminars devoted to the basics (would be best to have someone from an NLP research group at CU come talk), then we jump into some genomic models and start to discuss drawbacks (e.g., why does context length matter) by layering in some of these concepts.

What do you two think?

sterrettJD / gpLM-reading-group

some curriculum suggestions #3