Name | Mike Izbicki (call me Mike) |
Office | Adams 216 |
Office Hours | See Issue #150 |
Zoom | See Issue #148 |
Webpage | https://izbicki.me |
Research | Machine Learning (see izbicki.me/research.html for some past projects) |
Fun facts:
General Information:
Course Content:
See the introduction Jupyter notebook.
This course is divided into three sections:
Pagerank - We will use the paper "Deeper Inside Pagerank".
You will learn:
Machine Learning - We will use the textbook Learning from Data by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin. (I will provide you with a free copy to keep.)
You will learn:
Applications - We will use a variety of papers and blog posts, focusing on text, web, and social media examples.
Other Topics:
Ethical implications of data mining
Pet peeve: You can't understand the ethics if you don't understand the technical details.
Apply data mining libraries (PyTorch, scikit-learn, GenSim, spaCy, etc.)
Prerequisite knowledge:
Grades:
Category | Points | |
---|---|---|
Projects | 2**2 -2**3 | between 4-7 projects |
Quizzes | 2**2 -2**3 | as needed |
Midterm 1 (Pagerank) | 2**5 | (approximately) 23 Sep |
Midterm 2 (Machine Learning) | 2**6 | (approximately) 01 Nov |
Final | 2**7 | |
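
The point values in the table are written as Python expressions. As a rough sketch, the snippet below evaluates the three fixed exam entries and shows each exam's share of the exam points. The names `exam_points` and `exam_total` are made up for illustration, and projects and quizzes are left out because their number varies.

```python
# Exam point values copied from the grade table (Python power expressions).
# Projects and quizzes are omitted: their count is not fixed in the syllabus.
exam_points = {
    "Midterm 1 (Pagerank)": 2**5,
    "Midterm 2 (Machine Learning)": 2**6,
    "Final": 2**7,
}

# Total points available from exams alone.
exam_total = sum(exam_points.values())

for name, pts in exam_points.items():
    share = pts / exam_total
    print(f"{name}: {pts} points ({share:.0%} of exam points)")

print(f"Exam total: {exam_total} points")
```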
Projects:
You lose 2**(i-1) points on every assignment, where i is the number of days late.
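To see how quickly this penalty grows, here is a minimal sketch of the rule. The function name `late_penalty` is hypothetical, and the sketch assumes that i is a whole number of days and that an on-time submission (i = 0) loses nothing; the policy above does not spell out either detail.

```python
def late_penalty(days_late: int) -> int:
    """Points lost for a submission that is `days_late` days late.

    Assumption: on-time (or early) submissions lose no points, and
    lateness is counted in whole days.
    """
    if days_late <= 0:
        return 0
    return 2 ** (days_late - 1)


# The penalty doubles with each additional day late.
for days in range(1, 6):
    print(f"{days} day(s) late: lose {late_penalty(days)} point(s)")
```

Under these assumptions, the penalty doubles every day, so a submission that is 4 days late loses 8 points.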
Do not expect partial credit for incomplete assignments. It is much better to submit a correct assignment late than an incorrect one on time.
I expect most people to get full credit on the assignments.
Exams:
This is a Hard Class:
The material is intrinsically hard.
There's a reason people who understand this material get paid big salaries at FAANG.
What do data scientists get paid?
You will have to read the required references.
Not all of the material will be covered in lectures. That is intentional: it forces you to practice reading research-level data mining texts.
NOTE: In all of my other courses, I include required reading/watching tasks to learn about CS/DS culture. This course doesn't have these tasks because there is already a LOT of textbook reading that you will have to complete.
Comments from previous students:
Holy fucking shit this was a hard class. I had no idea there was so much god damned fucking math involved in a CS class. You should warn students about that.
I got my job because of the data mining course. Technical interviews were super easy because of this class.
Unfortunately, I can't remove the math from this class, and I can't make the class easier. Otherwise, you wouldn't be learning the material needed to pass a technical interview / get a good job / go to grad school.
Collaboration Policy:
You are encouraged to discuss all projects with other students, subject to the following constraints:
You may use any online resources you like (including ChatGPT) without restriction.
Basically, I'm trusting you all to be adults. You are ultimately responsible for ensuring you learn the material! So do what will help you learn best.
I've tried to design the course to be as accessible as possible for all students. If you need any further accommodations---even if you don't have an officially recognized disability---please ask.
I want you to succeed and I'll make every effort to ensure that you can.