smangham opened this issue 2 weeks ago (status: Open)
From my perspective, if a learner has never seen the shell before, let alone heard about scripting, then introducing shell scripts early would greatly increase their cognitive load. Learners are increasingly used to cloud-based solutions where the filesystem is abstracted away, so without a mental model of it, writing scripts in early exercises would be a big ask in my experience. By using simple and abstract exercises, a learner doesn't have to focus on the data itself but can focus on running the commands, getting used to typing at the prompt, and interpreting the output.
Similarly, the multiple-choice exercises attempt to provide formative assessment for the instructor and learners.
Again, I feel the goal of the lesson is to introduce novice learners to what is often a completely alien environment, not to attempt to get them writing shell scripts from the outset.
Thanks for your feedback. A more coherent narrative could be helpful, though using data from a variety of fields is good, because the Software Carpentry curriculum helps people develop software in a variety of fields. Shell scripts are useful, but transitioning to a command-line editor takes a bit of time, and so would make learning more challenging. `wc` and `sort` are helpful in processing data; `sed` would require more introduction to regular expressions.
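As a concrete sketch of that point (using made-up sample files, not data from the lesson), `wc` and `sort` combine naturally in a pipeline without any regular expressions:

```shell
# Create two small sample files (hypothetical, for illustration only)
printf 'alpha\nbeta\ngamma\n' > sample1.txt
printf 'delta\n' > sample2.txt

# wc -l counts lines per file; sort -n orders the counts numerically,
# so the smallest file appears first and the "total" line last
wc -l sample1.txt sample2.txt | sort -n
```

By contrast, even a simple `sed` substitution (`sed 's/a/b/'`) already asks the learner to understand pattern/replacement syntax, which is why it carries more introductory overhead.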
Possibly relevant further reading is *Data Science at the Command Line*.
How could the content be improved?
The lesson's introduced, conceptually, as a realistic research project analysing data files. However, it then almost immediately pivots into fairly abstract and arbitrary work on `thesis.txt`, extracts of *Little Women*, random gene sequences of fictional creatures... and these files are then scattered across a bunch of subdirectories. The lesson makes very little use of the actual data. The exercises are also quite abstract, and focus heavily on multiple-choice questions based on "look at this example directory tree", rather than making use of the actual directory trees in the data we have learners download.
I think it'd flow a lot better if it used:

- `grep` to extract a particular ID/date/time of record from that file
- `cut` to select a particular column

There's a lot of use of `wc`, `sort`, `head -n` and `tail -n`, but I don't think they're that likely to be part of real pipelines. If selecting specific lines is required then `sed -n` is the realistic option, whilst `head` and `tail` should be introduced for their typical uses of peeking at files.
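To illustrate the kind of pipeline being suggested (using a hypothetical `records.csv` made up here, not a file from the lesson's data):

```shell
# Hypothetical record file: comma-separated date, ID, value
printf '2014-03-01,A173,12\n2014-03-02,B924,7\n2014-03-02,A173,30\n' > records.csv

# grep extracts the records for one ID; cut selects the value column
grep 'A173' records.csv | cut -d ',' -f 3

# sed -n prints only specific lines (here, line 2 of the file)
sed -n '2p' records.csv

# head and tail in their typical role: peeking at the start and end of a file
head -n 1 records.csv
tail -n 1 records.csv
```

The `grep | cut` step is the sort of selection-and-extraction task that shows up in real data-processing pipelines, while `head`/`tail` are shown in their everyday inspection role rather than as line selectors.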