Name | Mike Izbicki (call me Mike) |
Email | mizbicki@cmc.edu |
Office | Adams 216 |
Webpage | izbicki.me |
Research | Machine Learning (see izbicki.me/research.html for some past projects) |
Fun facts:
What is big data?
Depends entirely on the person who is talking.
Who should take this course?
This course is designed for data science majors, not computer science majors. I'm happy to have CS majors in this course (and I think you'll find this course fun), but know that:
Prerequisites:
Discrete math: CSCI055 or MATH055
Foundations of data science: CSCI 036, ECON 122, or ECON 160
Data structures: CSCI046 or CSCI70 (Mudd) or CSCI62 (Pomona)
`flask` library
`requests` and `bs4` libraries
Relation to other CS courses:
One purpose of this course is to provide DS majors with an overview of CS concepts. Therefore, there is a lot of material in this course that is covered in other upper division CS courses required for CS majors.
Overlapping concepts
Concepts we don't cover from CSCI133 Databases
Big data concepts from a CS perspective that we will not cover:
Textbook:
Big data is a rapidly changing field, and all currently printed textbooks are both incomplete and already out of date. Therefore, we won't be using a textbook. Instead, we will be using online documentation. The main references we will use are given below, but I will provide more specific links each week.
Assignments:
(`2**1` points)
(`2**2` or `2**3` or `2**4` points)
(`2**3` or `2**4` or `2**5` points)
(`2**6` points each)
All assignments are explicitly designed to help you get a good job after graduation. They will help build your GitHub "portfolio" and give you cool things to talk about during interviews. These assignments are all very practical, and not "leetcode" or "mathy".
You will receive extra credit for pull requests to this repo or any submodule.
Late Work Policy:
You lose `2**(i-1)` points on every assignment, where `i` is the number of days late.
Do not expect partial credit for incomplete assignments. It is much better to submit a correct assignment late than an incorrect one on time.
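
To make the late penalty concrete, here is a minimal sketch of the `2**(i-1)` formula in Python. The function name `late_penalty` is hypothetical, and treating on-time work (`i <= 0`) as a penalty of 0 is an assumption; the policy above only defines the penalty for late work.

```python
def late_penalty(days_late: int) -> int:
    """Points lost on one assignment submitted `days_late` days late."""
    if days_late <= 0:
        # Assumption: the formula only applies once an assignment is late.
        return 0
    return 2 ** (days_late - 1)


# The penalty doubles with every additional day.
for i in range(1, 8):
    print(f"{i} day(s) late: lose {late_penalty(i)} point(s)")
```

Under this reading, one day late costs 1 point, while a full week late costs 64 points, so a short delay to finish a correct submission is cheap but a long delay is not.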
Grade Schedule:
Your final grade will be computed according to the following standard table, with the caveats described below.
If your grade satisfies | then you earn |
---|---|
95 ≤ grade | A |
90 ≤ grade < 95 | A- |
87 ≤ grade < 90 | B+ |
83 ≤ grade < 87 | B |
80 ≤ grade < 83 | B- |
77 ≤ grade < 80 | C+ |
73 ≤ grade < 77 | C |
70 ≤ grade < 73 | C- |
67 ≤ grade < 70 | D+ |
63 ≤ grade < 67 | D |
60 ≤ grade < 63 | D- |
60 > grade | F |
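
The table can also be read as a simple lookup, sketched below in Python. The names `GRADE_CUTOFFS` and `letter_grade` are hypothetical, and the sketch assumes grades are compared exactly as given, with no rounding and before any caveats are applied.

```python
# (lower bound, letter) pairs copied from the table, highest cutoff first.
GRADE_CUTOFFS = [
    (95, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"), (60, "D-"),
]


def letter_grade(grade: float) -> str:
    """Return the letter for a numeric grade, ignoring the caveat tasks."""
    for cutoff, letter in GRADE_CUTOFFS:
        if grade >= cutoff:
            return letter
    return "F"  # anything below 60


print(letter_grade(91.5))
print(letter_grade(59.9))
```

Because each row's lower bound is inclusive, a grade of exactly 90 earns an A- rather than a B+.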
Caveats:
There are 2 "caveat tasks" in this course. These tasks should be easy, and everyone will get full credit on the task just for completing the task. If you don't complete one of the tasks, however, your grade (from the table above) will be docked 10%. (For example, an A- grade would become a B- grade.) You have the entire semester (until I submit grades) to complete these tasks.
You can find the details about the caveat tasks at:
Technology Policy:
You MAY use any AI tool without restriction.
You MUST complete all programming assignments on the lambda server.
You MUST use either vim or emacs for all text editing.
In particular, you MAY NOT use the GitHub text editor, VSCode, IDLE, or PyCharm for any reason.
You MAY NOT share your lambda server credentials with anyone else.
Collaboration Policy:
There are no restrictions on what you can post to GitHub Issues. In particular, you are highly encouraged to post detailed questions/answers/comments with lots of code.
You are highly encouraged to collaborate with students in class/lab, in the QCL, and in office hours.
I trust you all to be reasonable and to ensure that collaboration benefits your learning and is not merely copying work.
You MAY NOT collaborate with students in any other context.
You MAY NOT look at another student's code on GitHub.
All projects are developed as open source projects, and so the code is published openly online. The benefits of this model include: (1) you actually learn how to develop/contribute to open source projects; (2) future employers see you have GitHub activity. Please do not abuse this privilege.
I've tried to design the course to be as accessible as possible for people with disabilities. (We'll talk a bit about how to design accessible software in class too!) If you need any further accommodations, please ask.
I want you to succeed and I'll make every effort to ensure that you can.