gvwilson closed this issue 5 months ago.
There's a lot of good stuff here already! I think I would start even "easier":
I wonder if any of these would fit in with where you're going: tmux or screen to manage long-running commands on remote systems, the time command to accurately track the execution time of long-running commands, and commands like htop to watch the memory and processor load of particular tasks. pip-audit, which scans installed packages for known vulnerabilities, might be a good complement to the suggested material on virtual environments and requirements.txt.
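To make concrete what time reports, here is a minimal Python sketch that runs a command and collects wall-clock time plus the CPU time and peak memory of child processes. It is an illustration, not something proposed in the thread: the helper name time_command is hypothetical, and it relies on the Unix-only resource module.

```python
import resource
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of strings) and return timing and memory figures.

    Hypothetical helper: a rough Python analogue of the shell's `time`.
    Unix-only, because it uses the resource module.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    completed = subprocess.run(cmd, check=False)  # waits for the child to exit
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "returncode": completed.returncode,
        "wall_seconds": wall,
        # CPU time used by children that finished during this call.
        "user_seconds": after.ru_utime - before.ru_utime,
        "system_seconds": after.ru_stime - before.ru_stime,
        # Peak resident set size of the largest child seen so far
        # (kilobytes on Linux, bytes on macOS); not a per-call delta.
        "max_rss": after.ru_maxrss,
    }


if __name__ == "__main__":
    # Time a short Python child process that sleeps for half a second.
    stats = time_command([sys.executable, "-c", "import time; time.sleep(0.5)"])
    for key, value in stats.items():
        print(f"{key}: {value}")
```

For interactive monitoring of a job that is still running, htop remains the better tool; a wrapper like this is mainly useful for logging resource use alongside results.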
Many of the topics I wanted to highlight are already covered.
Stuff that I think deserves some additional time:
System package managers vs. pip (global and/or local pip installs) vs. conda: which to use, when to use each, and how to stop them from trampling one another.
A brief intro to the idea of reproducible computing: why it isn't attainable in real-life circumstances at the moment, and how to get closer to the ideal without going overboard.
Applying the 80/20 Pareto principle to system administration: prioritizing the 20% of the work that produces 80% of the results is good enough.
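One low-effort step in the spirit of the first two items is to have each analysis record which interpreter and environment it ran in, and which package versions were installed. The sketch below is hypothetical (describe_environment and package_snapshot are made-up names) and uses only the standard library. Its environment detection is a heuristic: sys.prefix differs from sys.base_prefix inside a venv or virtualenv, and conda environments contain a conda-meta directory.

```python
import json
import os
import platform
import sys
from importlib import metadata


def describe_environment():
    """Heuristically classify the environment the running interpreter belongs to."""
    if sys.prefix != sys.base_prefix:
        kind = "venv/virtualenv"
    elif os.path.isdir(os.path.join(sys.prefix, "conda-meta")):
        # conda environments keep their package records in conda-meta/.
        kind = "conda"
    else:
        kind = "system or other interpreter"
    return {
        "kind": kind,
        "python": platform.python_version(),
        "executable": sys.executable,
        "prefix": sys.prefix,
        "platform": platform.platform(),
    }


def package_snapshot():
    """Return {distribution name: version} for distributions visible here."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that have no Name field
            versions[name] = dist.version
    # Sort case-insensitively so snapshots are easy to diff over time.
    return dict(sorted(versions.items(), key=lambda item: item[0].lower()))


if __name__ == "__main__":
    snapshot = {"environment": describe_environment(), "packages": package_snapshot()}
    print(json.dumps(snapshot, indent=2))
```

Saving this JSON next to each set of results is far short of full reproducibility, but it answers "what was installed when this ran?" cheaply, which fits the 80/20 framing.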
Side topic
Focusing specifically on "data science sysadmins", I'd want to cover the following:
That's all that comes to mind right now, but I'm sure I've missed lots.
Suppose you had a room full of data scientists for a day. What would you teach them about systems administration, package management, DevOps, and everything else they need to know to get from "works for me on my machine" to "works for everyone else a year from now"?
Sean Aubin
Kristine Willis
tyx
nikkid
Mike Spencer
Miles McBain
Lluís Revilla
Noam Ross
Jarek Bryk