chendaniely opened this issue 9 years ago
From the Campbell and Nehm paper: A Critical Analysis of Assessment Quality in Genomics and Bioinformatics Education Research [1]
[It is] important to emphasize that, in terms of education testing, "reliability" and "validity" do not refer to the properties of a test, but rather the inferences derived from the scores that the test produces... Tests themselves do not carry the properties of reliability or validity wherever and whenever they are used; rather, the contexts of test use, coupled with inferences about how the test functions to produce scores, are central aspects of validity and reliability.
A foundational source for established perspectives on validity and reliability is Educational Measurement.
To come up with an assessment, we would first have to develop questions for each of the categories we want to assess. We should follow the order of topics under validity: first, mass-brainstorm questions and debate whether they cover everything we want to ask. We can then run a series of pilots to make sure the questions are asking what we intend, and gradually test external validity and generalizability.
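One quick reliability check we could run on each pilot is Cronbach's alpha over the items in a category, to see whether they hang together as a scale. A minimal sketch in Python, assuming (purely for illustration) that pilot responses sit in a NumPy array with one row per respondent and one column per item:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) response matrix."""
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical pilot data: 5 respondents answering 4 Likert items (1-5).
pilot = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
])
print(f"alpha = {cronbach_alpha(pilot):.2f}")
```

An alpha around 0.7 or higher is the usual rule of thumb for acceptable internal consistency, though it should not be the only criterion for keeping or dropping items.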
pinging @chakwong87 since he will be helping us with the analysis and assessment creation
Aargh! So many confounding variables.
What if we offered a small incentive—stickers or something—for completion of a six-month post-workshop survey? That should be within even our modest resources.
Some thoughts, based on (1) experience running survey and interview studies in general, (2) almost a decade of teaching and tutoring intro programming at many levels, and (3) a series of interviews with 15 past SWC participants, primarily in the natural sciences, almost half of them women. I spoke to each participant 3 times over the course of several months (about 3-4 months in summer 2014, depending on schedules) after they attended Software Carpentry. The interviews covered motivations, what they learned, how they incorporated it into their work, and what they felt the benefits of SWC to be.
Thought 1: Overall survey design should be iterative and aim for minimalism
Thought 2: Important not to conflate "using the skill" with "benefitting from the skill"
(Note that I think these are too many & too complicated, but this is my brain-dump first draft.)
What was your reason for attending the software carpentry event? (Check all that apply) [see footnote]
What is your technical background, prior to attending the software carpentry event? (Check all that apply) (Would be nice to brainstorm some more here, as well as improve wording and examples :)
After the software carpentry event, please check all the activities that you currently engage in as part of your typical workflow: (Same options as above)
Did you use version control prior to SWC? (Check all that apply)
After SWC, do you use version control? (Check all that apply) (Same answers as above)
Are there technical skills that you learned at the SWC bootcamp that you COULD use if your work required it? Check all the reasons why you do NOT use those skills.
Please check all that apply. (I really dislike most of this wording, esp this “software skills” thing - ideas? I do want to include the whole range of skills that SWC covers, which is not just programming. IMO a good way to get it better would be to just ask people who represent the target demographic of this survey for feedback…)
Open-ended question: What do you think is the biggest impact, if any, positive or negative, that SWC had on the way you do your work?
Where these questions came from:
Previously-identified areas for assessment: (1) attitudinal, (2) skill, (3) declarative knowledge.
My revised areas for assessment:
Background information:
Some half-baked hypotheses off the top of my head:
Footnote: The options are based on the Aragon & Williams categories of collaborative creativity, which was a useful way to categorize motivations in the interviews so far. I'm not married to it, just something to start with. These are the "stages of collaborative creativity":
Focus – trying to figure out where the group stands / what it represents
Frame – growing group cohesion; creating a "supportive affective" environment that encourages creativity
Create – open communication for building upon an idea
Complete – the emergent idea is evaluated and, where appropriate, elaborated, cycling back into the create phase for as long as necessary
@uiuc-cse yes, adjusting for various confounding variables may or may not be problematic. We definitely need to run the survey through numerous workshops before we get a large enough N to start adjusting for variables. Which ones did you have in mind?
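(For what adjustment might eventually look like once N is large enough, here is a sketch using statsmodels. Every column name below is a hypothetical placeholder, not an actual survey variable we have settled on.)

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical pooled responses across workshops; column names are placeholders.
df = pd.read_csv("pooled_survey_responses.csv")

# OLS with potential confounders entered as covariates, so the
# attended_swc coefficient is adjusted for gender and prior experience.
model = smf.ols(
    "skill_gain ~ attended_swc + C(gender) + C(prior_experience)",
    data=df,
).fit()
print(model.summary())
```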
@gvwilson do we need some kind of IRB approval if we ask questions about gender or any personal information?
@katerena what are your thoughts regarding Likert questions? Did you find them effective in your previous work?
@chendaniely on Likert questions: they are fine, as long as there are not too many. I find that when there are a lot of questions and a lot of things for participants to look at, they get distracted and start answering basically at random.

There is a paper I recall from CHI09 or CHI10 about how Likert questions are terrible because some people tend to veer toward extreme answers and some toward mild answers, so the extent of agreeing or disagreeing is more of an individual difference than a meaningful signal about the question. In practice, I think the fatigue/frustration effect is more pronounced: people drop out of the survey or just answer later Likert questions with the same value, even if you mix up the questions.

Likert questions make sense for really massive surveys and when you have well-tested Likert instruments - surveys that someone designed to have the right amount of redundancy - so you end up trumping all those individual issues with the sheer force of N. I tend to think that a survey ought to be interesting and meaningful with a few hundred responses, especially if the respondent demographic is really, really niche.

Note that because I do so many interviews/observations, I am very sensitive to things like how the quality of human-subjects data depends on how that particular human is feeling and reacting to the question. I have generally found it useful to keep these concerns in mind for surveys as well.
tl;dr version - they are fine if there are maybe 5 or 10 of them, but if we want a survey that covers all the necessary topics while minimizing the time needed to complete it, Likerts are not the best way to get there.
We were actually thinking of creating a survey where we go through exploratory factor analysis (EFA), confirmatory factor analysis (CFA), and eventually item response theory (IRT) to make sure we have a consistent and valid survey with the fewest number of questions. But we first need a large pool of questions to filter down from.
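To make the pipeline concrete, here is a minimal sketch of what the EFA stage could look like in Python with the factor_analyzer package. The input file, the three-factor choice, and the 0.4 loading cutoff are all illustrative assumptions, not decisions we have made:

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_kmo

# Hypothetical pool of numeric item responses, one column per candidate question.
responses = pd.read_csv("question_pool_responses.csv")

# Sampling adequacy: an overall KMO below ~0.6 suggests the data are not factorable.
_, kmo_overall = calculate_kmo(responses)
print(f"KMO = {kmo_overall:.2f}")

# EFA with a guessed 3 factors (e.g. attitude, skill, knowledge).
fa = FactorAnalyzer(n_factors=3, rotation="varimax")
fa.fit(responses)

# Items that load weakly (< 0.4) on every factor are candidates for removal.
loadings = pd.DataFrame(fa.loadings_, index=responses.columns)
weak_items = loadings[loadings.abs().max(axis=1) < 0.4].index.tolist()
print("Candidates to drop:", weak_items)
```

CFA and IRT would then be run on the surviving items, ideally against held-out responses, but that is a later step.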
Hey everyone,
Sorry for the late response.
I mentioned this to Daniel, but I figured that I keep everyone else in the loop as well...
For next semester, I don't think I'll be able to because of school-related stuff, but I should be able to around Spring 2016 - is it possible to participate then?
Sincerely, Chak
Lab-related pushback is something I heard about a lot as well. The flipside to PI pushback - probably less common, though I don't know - is overly vague enthusiasm, like sending all lab members to SWC to learn things that the participants do not recognize as applicable to their work. In some cases, people were quite surprised to find they had learned useful skills after all.
In any case, I strongly agree that asking a question about the social context / lab aspect is a good idea!
On Mon, Jun 8, 2015 at 7:48 PM, John Pellman notifications@github.com wrote:
As a former participant of a Software Carpentry workshop, I think that it would be a good idea to gauge social network effects (I suppose this would fall under the 'attitudinal changes' category), with questions such as "Do you think that other members of your lab would benefit from Software Carpentry?" or "How difficult do you believe it is to convince your supervisor to let more members of your lab attend Software Carpentry?"
My experience was that many people in my lab would have benefited from basic shell scripting skills, but did not attend SWC because (in the fast-paced world of publish-or-perish) my PI was reluctant to reallocate time for training (as opposed to article writing) and was not adequately aware of the long-term value that increased programming literacy would have for our lab.
Perhaps this is something that might be kept in mind for outreach as well - it would be nice for SWC participants to bring back some sort of literature that could be used to persuade their PI to invest more resources in training.
--Potentially tangential 2 cents
Best, Katie
First pass at the general outcomes we want to assess for our learners.
We can prune the list later, but this will be the basis for the specific questions we will ask in the post-workshop surveys.
3 categories: